sgkit-dev / sgkit

Scalable genetics toolkit
https://sgkit-dev.github.io/sgkit
Apache License 2.0

ImportError: dlopen[...]: symbol not found in flat namespace '_bcf_float_missing' - sgkit-vcf requirements are not installed #1220

Closed: simonharnqvist closed this issue 1 month ago

simonharnqvist commented 1 month ago

Hi,

I'm trying to migrate from scikit-allel to sgkit, but I'm running into issues with the sgkit.io.vcf module. It looks like a cyvcf2 bug (or some C compiler config problem with my environments?), but raising the issue here first. Please let me know if there's anything else I can do to troubleshoot on my end.

Installed from PyPI

When I try to import e.g. from sgkit.io.vcf import vcf_to_zarr or just from sgkit.io import vcf, I get the following error:

    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    File ~/mambaforge/envs/sgkit/lib/python3.11/site-packages/sgkit/io/vcf/__init__.py:4
          3 try:
    ----> 4     from .vcf_partition import partition_into_regions
          5     from .vcf_reader import (
          6         FloatFormatFieldWarning,
          7         MaxAltAllelesExceededWarning,
       (...)
         12         zarr_array_sizes,
         13     )

    File ~/mambaforge/envs/sgkit/lib/python3.11/site-packages/sgkit/io/vcf/vcf_partition.py:6
          5 import numpy as np
    ----> 6 from cyvcf2 import VCF
          8 from sgkit.io.vcf.csi import CSI_EXTENSION, read_csi

    File ~/mambaforge/envs/sgkit/lib/python3.11/site-packages/cyvcf2/__init__.py:1
    ----> 1 from .cyvcf2 import (VCF, Variant, Writer, r_ as r_unphased, par_relatedness,
          2                      par_het)
          3 Reader = VCFReader = VCF

    ImportError: dlopen(/Users/s2341012/mambaforge/envs/sgkit/lib/python3.11/site-packages/cyvcf2/cyvcf2.cpython-311-darwin.so, 0x0002): symbol not found in flat namespace '_bcf_float_missing'

    The above exception was the direct cause of the following exception:

    ImportError                               Traceback (most recent call last)
    Cell In[12], line 1
    ----> 1 from sgkit.io import vcf

    File ~/mambaforge/envs/sgkit/lib/python3.11/site-packages/sgkit/io/vcf/__init__.py:40
         34 else:
         35     msg = (
         36         "sgkit-vcf requirements are not installed.\n\n"
         37         "Please install them via pip :\n\n"
         38         "  pip install 'sgkit[vcf]'"
         39     )
    ---> 40 raise ImportError(str(e) + "\n\n" + msg) from e

    ImportError: dlopen(/Users/s2341012/mambaforge/envs/sgkit/lib/python3.11/site-packages/cyvcf2/cyvcf2.cpython-311-darwin.so, 0x0002): symbol not found in flat namespace '_bcf_float_missing'

    sgkit-vcf requirements are not installed.

    Please install them via pip :

      pip install 'sgkit[vcf]'
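Because sgkit re-raises with raise ... from e, the message it prints wraps the original cyvcf2 failure. A minimal sketch for isolating the culprit (root_cause and import_root_cause are hypothetical helpers, not part of sgkit): it follows the explicit exception chain down to the innermost error, which in this report should be the dlopen failure from cyvcf2's compiled extension. Running plain import cyvcf2 on its own is the quickest way to see the same thing.

```python
import importlib


def root_cause(exc):
    """Follow explicit exception chaining (raise ... from e) to the innermost error."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


def import_root_cause(module_name):
    """Import module_name; return None on success, else the innermost chained exception."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return root_cause(exc)
    return None


if __name__ == "__main__":
    # If the diagnosis in this thread is right, both imports should report the same
    # dlopen error, which points at cyvcf2 rather than at sgkit itself.
    for name in ("cyvcf2", "sgkit.io.vcf"):
        cause = import_root_cause(name)
        print(name, "->", "OK" if cause is None else f"{type(cause).__name__}: {cause}")
```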

I installed sgkit from PyPI. Versions of key packages:

cyvcf2                    0.30.28                  pypi_0    pypi
numpy                     1.26.4                   pypi_0    pypi
python                    3.11.9          h932a869_0_cpython    conda-forge
sgkit                     0.8.0                    pypi_0    pypi

The sgkit-vcf requirements are in fact already installed: pip install 'sgkit[vcf]' has no effect.
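The message in this case attributes the failure to missing requirements, although cyvcf2 is present and only its compiled extension fails to load. A minimal sketch of telling those two situations apart (classify_import is a hypothetical helper, not sgkit code, and it only handles top-level module names):

```python
import importlib
import importlib.util


def classify_import(module_name):
    """Return "missing", "broken" or "ok" for a top-level module name."""
    # find_spec only locates the module on sys.path; it does not execute it,
    # so a compiled extension with unresolved symbols is not loaded at this point.
    if importlib.util.find_spec(module_name) is None:
        return "missing"
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Installed, but importing fails, e.g. dlopen cannot resolve a symbol.
        return "broken"
    return "ok"


if __name__ == "__main__":
    for name in ("cyvcf2", "numpy"):
        print(name, classify_import(name))
```

In the environment described above this would be expected to print "broken" for cyvcf2, which is exactly the case where reinstalling the extra cannot help.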

Installed from conda-forge

I overwrote the previous env and installed with mamba create -n sgkit sgkit. This did not install cyvcf2, which I had to install via pip. Once installed, I got the exact same error as above. Packages:

cyvcf2                    0.30.28                  pypi_0    pypi
numpy                     1.26.4          py310hd45542a_0    conda-forge
python                    3.10.14         h2469fbe_0_cpython    conda-forge
sgkit                     0.8.0              pyhd8ed1ab_0    conda-forge
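In both environments cyvcf2 came from PyPI (the pypi_0 build string in the conda list output), even when everything else came from conda-forge. Since the code runs in a Jupyter kernel inside VSCode, it can also help to confirm which interpreter and versions the kernel itself sees. A minimal sketch using only the standard library (report_versions is a hypothetical helper):

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def report_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # sys.executable shows which environment the notebook kernel is running in.
    print("python:", sys.executable)
    for name, ver in report_versions(["sgkit", "cyvcf2", "numpy"]).items():
        print(f"{name:8} {ver or 'not installed'}")
```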

Other details

Hardware: Apple MacBook Pro M2 (2022)

OS: macOS Sonoma 14.5

Running through a Jupyter notebook (jupyter_core 5.7.2) in VSCode 1.89.1.

jeromekelleher commented 1 month ago

I'm having the same issue with cyvcf2 - I'm pretty sure it's a problem with the packaging of the cyvcf2 binary wheels.
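On macOS, C symbols carry a leading underscore in the binary, so '_bcf_float_missing' refers to the C global bcf_float_missing, which htslib declares in its vcf.h header. The error therefore suggests the cyvcf2 extension expects htslib to provide that symbol at load time and nothing loaded does, which fits the wheel-packaging hypothesis. A minimal sketch for checking whether a given shared library exports a symbol, assuming you can locate an htslib library to test (exports_symbol is a hypothetical helper, and the find_library("hts") lookup is an assumption that may return None inside a conda environment):

```python
import ctypes
import ctypes.util


def exports_symbol(library_path, symbol):
    """Return True if the shared library at library_path exports the C name `symbol`.

    Pass the C name without the Mach-O underscore; dlsym adds it on macOS.
    library_path=None checks the symbols already visible in the current process.
    """
    lib = ctypes.CDLL(library_path)
    # Attribute lookup goes through dlsym and raises AttributeError if the symbol is absent.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Assumption: htslib may not be on the default search path, in which case this is None.
    path = ctypes.util.find_library("hts")
    if path is None:
        print("no htslib found on the default library search path")
    else:
        print(path, "exports bcf_float_missing:", exports_symbol(path, "bcf_float_missing"))
```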

jeromekelleher commented 1 month ago

Note: the VCF parsing code here in sgkit will be deprecated soon in favour of vcf2zarr, which is almost ready for its first public release.

simonharnqvist commented 1 month ago

Sounds like I'll stick with scikit-allel for a while longer then - thanks @jeromekelleher for the quick response. I'll close this given impending deprecation.

jeromekelleher commented 1 month ago

Note that it's only the VCF import we're going to deprecate, @simonharnqvist; everything downstream of that will work the same. It would be great to get some input from a scikit-allel user's perspective.