sanger-pathogens / ariba

Antimicrobial Resistance Identification By Assembly
http://sanger-pathogens.github.io/ariba/
Other
167 stars 53 forks source link

test data seem to need refresh with bowtie2 2.5.0 #329

Closed emollier closed 11 months ago

emollier commented 1 year ago

Hi,

While trying to keep ariba in the upcoming release of Debian, I noticed at some point during discussion in Debian bug #1021675 that part of the test suite might need refresh with the newer release of bowtie2 2.5.0. Here below are the relevant errors from the test suite:

_________________________ TestMapping.test_run_bowtie2 _________________________
self = <ariba.tests.mapping_test.TestMapping testMethod=test_run_bowtie2>
    def test_run_bowtie2(self):
        '''Test run_bowtie2 unsorted'''
        self.maxDiff = None
        ref = os.path.join(data_dir, 'mapping_test_bowtie2_ref.fa')
        reads1 = os.path.join(data_dir, 'mapping_test_bowtie2_reads_1.fq')
        reads2 = os.path.join(data_dir, 'mapping_test_bowtie2_reads_2.fq')
        out_prefix = 'tmp.out.bowtie2'
        mapping.run_bowtie2(
            reads1,
            reads2,
            ref,
            out_prefix,
            bowtie2=extern_progs.exe('bowtie2'),
            bowtie2_version=extern_progs.version('bowtie2'),
        )
        expected = get_sam_columns(os.path.join(data_dir, 'mapping_test_bowtie2_unsorted.bam'))
        got = get_sam_columns(out_prefix + '.bam')
>       self.assertListEqual(expected, got)
E       AssertionError: Lists differ: [('1'[508 chars]5', 99, 'ref', 0, [(4, 2), (0, 23)], 'CCTCCACA[346 chars]AT')] != [('1'[508 chars]5', 97, 'ref', 0, [(4, 2), (0, 23)], 'CCTCCACA[346 chars]AT')]
E       
E       First differing element 8:
E       ('5', 99, 'ref', 0, [(4, 2), (0, 23)], 'CCTCCACAGGATGGTGGTATACCTG')
E       ('5', 97, 'ref', 0, [(4, 2), (0, 23)], 'CCTCCACAGGATGGTGGTATACCTG')
E       
E         [('1', 99, 'ref', 0, [(4, 5), (0, 20)], 'AGCCCTCCACAGGATGGTGGTATAC'),
E          ('1', 147, 'ref', 30, [(0, 25)], 'AGGATACAGATCTTGTGGGAAAGGT'),
E          ('2', 99, 'ref', 124, [(0, 25)], 'TAATGTTCTTAGGGCTTACCATAGA'),
E          ('2', 147, 'ref', 170, [(0, 20), (4, 5)], 'TCCACCTTAGCTAAGCGCAGACTCG'),
E          ('3', 73, 'ref', 86, [(0, 25)], 'TCGGGTCTGTACAAGGACGGATGGT'),
E          ('3', 133, None, 86, [], 'CGTACTGACTGACTGACGTACTGCA'),
E          ('4', 99, 'ref', 55, [(0, 25)], 'CCGCCGGGAAGTCCTTCTGTCGTGC'),
E          ('4', 147, 'ref', 136, [(0, 25)], 'GGCTTACCATAGAGGTACACTAAAA'),
E       -  ('5', 99, 'ref', 0, [(4, 2), (0, 23)], 'CCTCCACAGGATGGTGGTATACCTG'),
E       ?         ^
E       
E       +  ('5', 97, 'ref', 0, [(4, 2), (0, 23)], 'CCTCCACAGGATGGTGGTATACCTG'),
E       ?         ^
E       
E       -  ('5', 147, 'ref', 166, [(0, 24), (4, 1)], 'TTCATCCACCTTAGCTAAGCGCAGA'),
E       ?          ^
E       
E       +  ('5', 145, 'ref', 166, [(0, 24), (4, 1)], 'TTCATCCACCTTAGCTAAGCGCAGA'),
E       ?          ^
E       
E          ('6', 77, None, -1, [], 'CAGTTGCATGACGTCATGCAGTCAT'),
E          ('6', 141, None, -1, [], 'AATGAGTATGATGAGTAATGGTATG'),
E       -  ('7', 99, 'ref', 56, [(4, 1), (0, 23), (4, 1)], 'ACGCCGGGAAGTCCTTCTGTCGTGT'),
E       ?         ^
E       
E       +  ('7', 97, 'ref', 56, [(4, 1), (0, 23), (4, 1)], 'ACGCCGGGAAGTCCTTCTGTCGTGT'),
E       ?         ^
E       
E       -  ('7', 147, 'ref', 136, [(0, 24), (4, 1)], 'GGCTTACCATAGAGGTACACTAAAT')]
E       ?          ^
E       
E       +  ('7', 145, 'ref', 136, [(0, 24), (4, 1)], 'GGCTTACCATAGAGGTACACTAAAT')]
E       ?          ^
ariba/tests/mapping_test.py:62: AssertionError
----------------------------- Captured stderr call -----------------------------
[E::idx_find_and_load] Could not retrieve index file for '/builds/med-team/ariba/debian/output/source_dir/.pybuild/cpython3_3.10_ariba/build/ariba/tests/data/mapping_test_bowtie2_unsorted.bam'
[E::idx_find_and_load] Could not retrieve index file for 'tmp.out.bowtie2.bam'
____________________ TestMapping.test_run_bowtie2_and_sort _____________________
self = <ariba.tests.mapping_test.TestMapping testMethod=test_run_bowtie2_and_sort>
    def test_run_bowtie2_and_sort(self):
        '''Test run_bowtie2 sorted'''
        ref = os.path.join(data_dir, 'mapping_test_bowtie2_ref.fa')
        reads1 = os.path.join(data_dir, 'mapping_test_bowtie2_reads_1.fq')
        reads2 = os.path.join(data_dir, 'mapping_test_bowtie2_reads_2.fq')
        out_prefix = 'tmp.out.bowtie2'
        mapping.run_bowtie2(
            reads1,
            reads2,
            ref,
            out_prefix,
            sort=True,
            bowtie2=extern_progs.exe('bowtie2'),
            bowtie2_version=extern_progs.version('bowtie2'),
        )
        expected = get_sam_columns(os.path.join(data_dir, 'mapping_test_bowtie2_sorted.bam'))
        got = get_sam_columns(out_prefix + '.bam')
>       self.assertListEqual(expected, got)
E       AssertionError: Lists differ: [('1'[67 chars]5', 99, 'ref', 0, [(4, 2), (0, 23)], 'CCTCCACA[787 chars]TG')] != [('1'[67 chars]5', 97, 'ref', 0, [(4, 2), (0, 23)], 'CCTCCACA[787 chars]TG')]
E       
E       First differing element 1:
E       ('5', 99, 'ref', 0, [(4, 2), (0, 23)], 'CCTCCACAGGATGGTGGTATACCTG')
E       ('5', 97, 'ref', 0, [(4, 2), (0, 23)], 'CCTCCACAGGATGGTGGTATACCTG')
E       
E       Diff is 1363 characters long. Set self.maxDiff to None to see it.
ariba/tests/mapping_test.py:105: AssertionError
----------------------------- Captured stderr call -----------------------------
[E::idx_find_and_load] Could not retrieve index file for '/builds/med-team/ariba/debian/output/source_dir/.pybuild/cpython3_3.10_ariba/build/ariba/tests/data/mapping_test_bowtie2_sorted.bam'

My guess is the reference test data might need update, and disable these specific test items for the time being. I believe you might like to be aware of the issue. Please if there is a bigger deal than just test data refresh, then don't hesitate to ping.

Have a nice day, :) Étienne.

martinghunt commented 1 year ago

Thanks for this. What's the timeline for Debian?

This looks to me like bowtie2 behaviour has changed in an unexpected way. Those changes of 99->97 and 147->145 are significant and may well break parts of ariba. I'd need to look into it in more detail to figure out the effects, which I don't really have time to do until January.

martinghunt commented 1 year ago

... although after a brief look ... is this the only failing test? It's using artificially short reads, which you wouldn't normally get with real data. So even if bowtie2 has now changed, maybe it doesn't matter. The end to end tests where bowtie2 gets run are using longer reads. So maybe for now it is ok to disable that failing test if it is the only one that's failing.

emollier commented 1 year ago

Thanks for this. What's the timeline for Debian?

Debian 12 gradual freeze is due to start on January 12th 2023 and will reach hard freeze on March 12th (after which point, full freeze and actual release dates will still be yet to be announced), but if there is any serious bug in ariba, it will be still possible to make targeted changes.

is this the only failing test?

I reviewed through previous existing Debian patches to check for potentially other failing tests and, in addition to the issue at play, I see:

emollier commented 1 year ago
  • from the same patch, samtools_variants_test.py had test_make_vcf_and_depths_files and test_get_depths_at_position skipped with another case of off-by-two, so if there is a relevant test failure, that would probably be there; thinking twice about this one, I begin to question whether I recall correctly it appeared during the htslib transition, or later possibly at the same moment the bowtie2 tests started to fail;

Hmn, nevermind that, I badly labelled the skip in the patch, mixing up with the bowtie2 error. I found these test failures as they were before the bowtie2 tests failures happened in the build log attached to my comment in issue #327.

Sorry for my confusion.

emollier commented 11 months ago

Working on updating ariba to version 2.14.7, the affected tests are now back on tracks.