pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License

Several tests fail: 2 failed, 155 errors #1284

Open yurivict opened 4 months ago

yurivict commented 4 months ago
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

directory = '/usr/ports/biology/py-pysam/work-py39/pysam-0.22.1/tests/tabix_data'

    def make_data_files(directory):
        what = None
        try:
            if not os.path.exists(os.path.join(directory, "all.stamp")):
                subprocess.check_output(["make", "-C", directory], stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError as e:
            what = "Making test data in '%s' failed:\n%s" % (directory, force_str(e.output))

        if what is not None:
>           raise RuntimeError(what)
E           RuntimeError: Making test data in '/usr/ports/biology/py-pysam/work-py39/pysam-0.22.1/tests/tabix_data' failed:
E           make: Entering directory '/usr/ports/biology/py-pysam/work-py39/pysam-0.22.1/tests/tabix_data'
E           tabix -p bed empty.bed.gz
E           [tabix] the index file exists. Please use '-f' to overwrite.
E           make: *** [Makefile:37: empty.bed.gz.tbi] Error 1
E           make: Leaving directory '/usr/ports/biology/py-pysam/work-py39/pysam-0.22.1/tests/tabix_data'

TestUtils.py:250: RuntimeError
_________________________________________________________________ ERROR at setup of TestBackwardsCompatibility.testVCF0v23 __________________________________________________________________
[gw2] freebsd14 -- Python 3.9.18 /usr/local/bin/python3.9

    def setUpModule():
>       make_data_files(TABIX_DATADIR)

tabix_test.py:20: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

directory = '/usr/ports/biology/py-pysam/work-py39/pysam-0.22.1/tests/tabix_data'

    def make_data_files(directory):
        what = None
        try:
            if not os.path.exists(os.path.join(directory, "all.stamp")):
                subprocess.check_output(["make", "-C", directory], stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError as e:
            what = "Making test data in '%s' failed:\n%s" % (directory, force_str(e.output))

        if what is not None:
>           raise RuntimeError(what)
E           RuntimeError: Making test data in '/usr/ports/biology/py-pysam/work-py39/pysam-0.22.1/tests/tabix_data' failed:
E           make: Entering directory '/usr/ports/biology/py-pysam/work-py39/pysam-0.22.1/tests/tabix_data'
E           tabix -p bed empty.bed.gz
E           [tabix] the index file exists. Please use '-f' to overwrite.
E           make: *** [Makefile:37: empty.bed.gz.tbi] Error 1
E           make: Leaving directory '/usr/ports/biology/py-pysam/work-py39/pysam-0.22.1/tests/tabix_data'

TestUtils.py:250: RuntimeError
____________________________________________________________________ ERROR at setup of VCFFromVCFTest_24.testConnecting _____________________________________________________________________

========================================================================================= FAILURES ==========================================================================================
____________________________________________________________________________________ TestIO.testSAM2SAM _____________________________________________________________________________________
[gw1] freebsd14 -- Python 3.9.18 /usr/local/bin/python3.9

self = <AlignmentFile_test.TestIO testMethod=testSAM2SAM>

    def testSAM2SAM(self):
>       self.checkEcho("ex2.sam",
                       "ex2.sam",
                       "tmp_ex2.sam",
                       "r", "wh")

AlignmentFile_test.py:455: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <AlignmentFile_test.TestIO testMethod=testSAM2SAM>, input_filename = 'ex2.sam', reference_filename = 'ex2.sam', output_filename = 'tmp_ex2.sam', input_mode = 'r', output_mode = 'wh'
sequence_filename = None, use_template = True, checkf = <function checkBinaryEqual at 0x306eb534a820>, kwargs = {}, infile = <pysam.libcalignmentfile.AlignmentFile object at 0x306eb86430d0>
outfile = <pysam.libcalignmentfile.AlignmentFile object at 0x306eb8643160>, iter = <pysam.libcalignmentfile.IteratorRowAll object at 0x306eb7bb7dc0>
x = <pysam.libcalignedsegment.AlignedSegment object at 0x306eb7bb7f40>

    def checkEcho(self,
                  input_filename,
                  reference_filename,
                  output_filename,
                  input_mode,
                  output_mode,
                  sequence_filename=None,
                  use_template=True,
                  checkf=checkBinaryEqual,
                  **kwargs):
        '''iterate through *input_filename* writing to
        *output_filename* and comparing the output to
        *reference_filename*.

        The files are opened according to the *input_mode* and
        *output_mode*.

        If *use_template* is set, the header is copied from infile
        using the template mechanism, otherwise target names and
        lengths are passed explicitly.

        The *checkf* is used to determine if the files are
        equal.
        '''

        with pysam.AlignmentFile(
                os.path.join(BAM_DATADIR, input_filename),
                input_mode) as infile:

            if "b" in input_mode:
                self.assertTrue(infile.is_bam)
                self.assertFalse(infile.is_cram)
            elif "c" in input_mode:
                self.assertFalse(infile.is_bam)
                self.assertTrue(infile.is_cram)
            else:
                self.assertFalse(infile.is_cram)
                self.assertFalse(infile.is_bam)

            if use_template:
                outfile = pysam.AlignmentFile(
                    output_filename,
                    output_mode,
                    reference_filename=sequence_filename,
                    template=infile, **kwargs)
            else:
                outfile = pysam.AlignmentFile(
                    output_filename,
                    output_mode,
                    reference_names=infile.references,
                    reference_lengths=infile.lengths,
                    reference_filename=sequence_filename,
                    add_sq_text=False,
                    **kwargs)

            iter = infile.fetch()

            for x in iter:
                outfile.write(x)

            outfile.close()

        self.assertTrue(checkf(
            os.path.join(BAM_DATADIR, reference_filename),
            output_filename),
            "files %s and %s are not the same" %
            (reference_filename,
             output_filename))

>       os.unlink(output_filename)
E       FileNotFoundError: [Errno 2] No such file or directory: 'tmp_ex2.sam'

AlignmentFile_test.py:452: FileNotFoundError
____________________________________________________________________________________ TestIO.testBAM2CRAM ____________________________________________________________________________________
[gw2] freebsd14 -- Python 3.9.18 /usr/local/bin/python3.9

self = <AlignmentFile_test.TestIO testMethod=testBAM2CRAM>

    def testBAM2CRAM(self):
        # ignore header (md5 sum)
>       self.checkEcho("ex2.bam",
                       "ex2.cram",
                       "tmp_ex2.cram",
                       "rb", "wc",
                       sequence_filename=os.path.join(BAM_DATADIR, "ex1.fa"),
                       checkf=partial(
                           check_samtools_view_equal,
                           without_header=True))

AlignmentFile_test.py:502: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <AlignmentFile_test.TestIO testMethod=testBAM2CRAM>, input_filename = 'ex2.bam', reference_filename = 'ex2.cram', output_filename = 'tmp_ex2.cram', input_mode = 'rb'
output_mode = 'wc', sequence_filename = '/usr/ports/biology/py-pysam/work-py39/pysam-0.22.1/tests/pysam_data/ex1.fa', use_template = True
checkf = functools.partial(<function check_samtools_view_equal at 0x3953a9f4a940>, without_header=True), kwargs = {}
infile = <pysam.libcalignmentfile.AlignmentFile object at 0x3953ac7715e0>, outfile = <pysam.libcalignmentfile.AlignmentFile object at 0x3953ac771670>
iter = <pysam.libcalignmentfile.IteratorRowAllRefs object at 0x3953ac7595f0>, x = <pysam.libcalignedsegment.AlignedSegment object at 0x3953ac73e340>

    def checkEcho(self,
                  input_filename,
                  reference_filename,
                  output_filename,
                  input_mode,
                  output_mode,
                  sequence_filename=None,
                  use_template=True,
                  checkf=checkBinaryEqual,
                  **kwargs):
        '''iterate through *input_filename* writing to
        *output_filename* and comparing the output to
        *reference_filename*.

        The files are opened according to the *input_mode* and
        *output_mode*.

        If *use_template* is set, the header is copied from infile
        using the template mechanism, otherwise target names and
        lengths are passed explicitly.

        The *checkf* is used to determine if the files are
        equal.
        '''

        with pysam.AlignmentFile(
                os.path.join(BAM_DATADIR, input_filename),
                input_mode) as infile:

            if "b" in input_mode:
                self.assertTrue(infile.is_bam)
                self.assertFalse(infile.is_cram)
            elif "c" in input_mode:
                self.assertFalse(infile.is_bam)
                self.assertTrue(infile.is_cram)
            else:
                self.assertFalse(infile.is_cram)
                self.assertFalse(infile.is_bam)

            if use_template:
                outfile = pysam.AlignmentFile(
                    output_filename,
                    output_mode,
                    reference_filename=sequence_filename,
                    template=infile, **kwargs)
            else:
                outfile = pysam.AlignmentFile(
                    output_filename,
                    output_mode,
                    reference_names=infile.references,
                    reference_lengths=infile.lengths,
                    reference_filename=sequence_filename,
                    add_sq_text=False,
                    **kwargs)

            iter = infile.fetch()

            for x in iter:
                outfile.write(x)

            outfile.close()

        self.assertTrue(checkf(
            os.path.join(BAM_DATADIR, reference_filename),
            output_filename),
            "files %s and %s are not the same" %
            (reference_filename,
             output_filename))

>       os.unlink(output_filename)
E       FileNotFoundError: [Errno 2] No such file or directory: 'tmp_ex2.cram'

AlignmentFile_test.py:452: FileNotFoundError

Version: 0.22.1 Python-3.9 FreeBSD 14.0

jmarshall commented 3 months ago

When making the test data, tabix -p bed empty.bed.gz failed because empty.bed.gz.tbi already exists. The subsequent errors occur because the other test data files were never created after this make failure.

You do not say what you have done to investigate why empty.bed.gz.tbi already exists or why the Makefile rules do not avoid rebuilding it. This code has not changed substantially for the last several releases, and it works for me on FreeBSD 14.

You do not say whether you have run these tests successfully previously or whether this is a newly encountered problem.

The main scenario I can think of in which this file could exist when tabix is run but not exist when make evaluates dependencies is if you are running the Python tests in parallel and multiple makes are trying to create the test data at once. You do not say whether you are running tests in parallel.

43c10664834cf3914000a744e6057826c6a6fa65 adds --force to avoid this particular problem, but there are surely other race conditions in these makefiles. It would be possible to recode test data creation to create the final outputs atomically, or to prevent parallel makes by using a file system mutex, but IMHO that's not really worthwhile: each subdirectory's test data creation is simply not parallelisable.
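
For illustration only, here is a rough sketch of what the file system mutex option could look like. It is not what pysam's test suite does. The names directory_lock and make_data_files_locked, and the lock file name .make.lock, are made up for this example; the all.stamp check mirrors the make_data_files helper shown in the traceback above. It relies on fcntl.flock, so it assumes a POSIX system and a file system on which advisory locks work.

```python
import fcntl
import os
import subprocess
from contextlib import contextmanager


@contextmanager
def directory_lock(directory, name=".make.lock"):
    """Hold an exclusive advisory lock on a file inside *directory*.

    The lock file name is a hypothetical choice for this sketch.
    """
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        # Blocks until every other process holding the lock releases it.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield path
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def make_data_files_locked(directory, make="make"):
    """Build the test data in *directory*, one process at a time."""
    with directory_lock(directory):
        # Check the stamp only after acquiring the lock, so a process that
        # waited for another build to finish does not run make again.
        if not os.path.exists(os.path.join(directory, "all.stamp")):
            subprocess.check_output(
                [make, "-C", directory], stderr=subprocess.STDOUT)
```

The point of the design is that the existence check and the make run happen under the same lock, which closes the window in which two workers both decide the data is missing.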

If the Python tests are to be run in parallel, the recommended approach is to create the test data beforehand by running make once for each directory:

make -C tests/pysam_data  # or gmake as appropriate
make -C tests/cbcf_data
make -C tests/tabix_data
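
If it is more convenient to drive this from Python (for example from a port's test target), the same three commands can be wrapped in a small helper. This is only a sketch: build_test_data and its parameters are hypothetical names, not part of pysam, and the paths assume it is run from the top of the source tree.

```python
import subprocess

# The three test data directories named in the commands above.
TEST_DATA_DIRS = ["tests/pysam_data", "tests/cbcf_data", "tests/tabix_data"]


def build_test_data(directories=TEST_DATA_DIRS, make="make",
                    run=subprocess.check_call):
    """Run make once per directory, serially.

    With the default runner, subprocess.check_call raises
    CalledProcessError on the first failing make, so later
    directories are not attempted.
    """
    for directory in directories:
        run([make, "-C", directory])
```

Calling build_test_data(make="gmake") once before starting the parallel test run would then leave every directory's data in place, so no worker needs to invoke make itself.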