pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License
782 stars 274 forks source link

Failures on Cython 3 #1179

Closed SoapGentoo closed 1 year ago

SoapGentoo commented 1 year ago

@jmarshall We've been testing the new Cython 3, and are getting failures with pysam. Some of these errors look pretty scary:

Error compiling Cython file:
------------------------------------------------------------
...

            src = self._delegate

            # as the sequence is stored in half-bytes, the total length (sequence
            # plus quality scores) is (l+1)/2 + l
            nbytes_new = (l + 1) / 2 + l
                                     ^
------------------------------------------------------------

pysam/libcalignedsegment.pyx:1426:37: Cannot assign type 'double' to 'Py_ssize_t'

Do you think getting a fix in for 0.21.0 would be possible?

jmarshall commented 1 year ago

Pysam has only recently dropped Python 2.7, and has not yet taken full advantage of that. In particular, the Cython code still does not declare language_level so until now has defaulted to 2. (I have been waiting until a new release cycle to start that transition, and your log suggests this has been wise.)

I see with this Cython 3 beta the language_level now defaults to a variant of 3. All the Cython errors I spotted in your log are similar, and can likely be fixed by using // division instead of old-style integer /. So this may not be too hard to fix. (But there may be other more subtle incompatibilities.)

Nonetheless Cython 3 has not yet been released, and there are several groups of users for whom pysam is currently non-functional who are awaiting this fixed release. So my opinion FWIW (I am not the lead maintainer) is that we should get this release out now and start this *.pyx code language_level migration destabilisation in the next release cycle. It may be appropriate to follow up with a fairly quick 0.21.1 release containing the results of that code migration, but it will be wise to also have a recent pre-migration release.

SoapGentoo commented 1 year ago

Ignore my noise, I just saw it was fixed in 8676a8cd16b73359cc43ab914b07da9cf5038f14 already

jmarshall commented 1 year ago

Or I suppose we could explicitly set language level 2 for now, so that Cython 3 would interpret the source code the same way as earlier Cythons, getting rid of these compilation failures.

@AndreasHeger: It may be worth slipping that into this release.

SoapGentoo commented 1 year ago

https://github.com/pysam-developers/pysam/commit/8676a8cd16b73359cc43ab914b07da9cf5038f14#diff-981173f36fd5aaf7f64991fde616adf4f933323f7a981919ba53ac28ebd711e2R1 :wink:

jmarshall commented 1 year ago

Ah, we’ve already bedded it in during this release cycle then. That’s good news, with any luck.

SoapGentoo commented 1 year ago

I've added 0.21.0 and happened to test it with Cython 3.0.0_beta2 and got one failure:

========================================================================================== FAILURES ==========================================================================================
__________________________________________________________________________________ test_set_sample_alleles ___________________________________________________________________________________

vcf_header = <pysam.libcbcf.VariantHeader object at 0x7f2c69e1b3d0>

    def test_set_sample_alleles(vcf_header):
        vcf_header.formats.add('GT',1,'String',"Genotype") # id, number, type, description
        record = vcf_header.new_record(
            contig="1",
            start=20,
            stop=21,
            alleles=('A','T')
            )

        record.samples['sample1'].alleles = ('T', 'A')
        assert record.samples['sample1'].alleles  == ('T','A')

        # Empty record:
        record.samples['sample1'].alleles = (None, )
        assert record.samples['sample1'].alleles   == tuple()
>       record.samples['sample1'].alleles = None
E       TypeError: Argument 'value' has incorrect type (expected tuple, got NoneType)

record     = <pysam.libcbcf.VariantRecord object at 0x7f2c69dad9d0>
vcf_header = <pysam.libcbcf.VariantHeader object at 0x7f2c69e1b3d0>

tests/VariantRecord_test.py:85: TypeError

which doesn't happen with Cython 0.29.34.

jmarshall commented 1 year ago

This is a type checking error; apparently beta2 propagates type annotations more carefully or at least differently.

This type annotation was added in #1122 (appearing in 0.20.0) and is not quite correct — it should be Optional[tuple] or so. For …reasons…, we have type annotations in separate *.pyi files and the annotation there is correct. And this is the only type annotation directly in the *.pyx code. So please try this with Cython beta2 with this patch:

diff --git a/pysam/libcbcf.pyx b/pysam/libcbcf.pyx
index 8c088af..8ecfe5f 100644
--- a/pysam/libcbcf.pyx
+++ b/pysam/libcbcf.pyx
@@ -3479,7 +3479,7 @@ cdef class VariantRecordSample(object):
         return bcf_format_get_alleles(self)

     @alleles.setter
-    def alleles(self, value: tuple):
+    def alleles(self, value):
         # Sets the genotype, supply a tuple of alleles to set.
         # The supplied alleles need to be defined in the correspoding pysam.libcbcf.VariantRecord
         # The genotype is reset when an empty tuple, None or (None,) is supplied
SoapGentoo commented 1 year ago

Thanks, can confirm this patch works both under 0.29.34 and 3.0.0_beta2.