pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License
773 stars 274 forks source link

Prebuilt wheels do not work on FIPS-compliant systems #1276

Open shaze opened 4 months ago

shaze commented 4 months ago

On systems which are FIPS compliant, hashlib.md5 has to have the explicit optional argument useforscurity=False otherwise pysam cannot be imported.

Please could the calls on lines 173 and 175 of devtools/import.py be amended by adding this, e.g. nos that the code reads,

            md5_old = hashlib.md5(
                "".join(open(old_file, "r", encoding="utf-8").readlines()).encode(),usedforsecurity=False).digest()
            md5_new = hashlib.md5(
        "".join(open(src, "r", encoding="utf-8").readlines()).encode(),usedforsecurity=False).digest()
jmarshall commented 4 months ago

The clue is in the name: devtools/import.py is used only during pysam development, it is not used when building pysam, and it does not exist in the sdist or built wheels. So I do not understand how text inside this file could affect the importation of pysam when installed from e.g. an sdist or a wheel.

That comparison code is largely pointless — it is unused in the usual way that import.py gets used — and will eventually disappear as import.py is refactored. If your FIPS-compliant system would like that to happen more quickly, you'll have to explain why unused code in a non-installed infrastructure source file makes a difference.

shaze commented 4 months ago

I can't answer your question why the error manifests, but it does and arguably a call to md5 which is not for security purposes should explicitly say so.,

Apologies for not including this error trace before. I'm using python3.9 here since its the default python3 on Rocky 9 but I found the same in other versions of python

First to show that it's a standard installation of python3 and pysam

scott@n01 ~ % python3
Python 3.9.18 (main, Jan  4 2024, 00:00:00)
[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pysam
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pysam'
>>>

So install using pip

scott@n01 ~ % pip3.9 install pysam
Defaulting to user installation because normal site-packages is not writeable
Collecting pysam
  Using cached pysam-0.22.0-cp39-cp39-manylinux_2_28_x86_64.whl (24.2 MB)
Installing collected packages: pysam
Successfully installed pysam-0.22.0

and now try to run and import

scott@n01 ~ % python3
Python 3.9.18 (main, Jan  4 2024, 00:00:00)
[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pysam
crypto/fips/fips.c:154: OpenSSL internal error: FATAL FIPS SELFTEST FAILURE
[1]    64924 IOT instruction (core dumped)  python3

Ahashlib.md5 call without the usedforsecurity flag explicitly set causes this error . I then cloned the repo, made the changes I suggested run setup.py and installed and it works

scott@n01 ~ % /opt/exp_soft/python310-14/bin/python3
Python 3.10.14 (main, Apr  2 2024, 12:59:02) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pysam
>>>
>>>

I perhaps should have rephrased my request -- since I have a solution that works for me, this is certainly not an urgent issue for me -- feel free to close or prioritise as you see fit.

jmarshall commented 4 months ago

The devtools/import.py change turns out to be irrelevant. The reason why the last one worked is that you built it yourself against the FIPS-enabled libraries installed on your Rocky Linux machine. If you built from the pristine repo with unchanged scripts, I am certain that that would work too.

The problem with the previous installation was that you installed from a binary wheel that was not built on your system:

Collecting pysam
  Using cached pysam-0.22.0-cp39-cp39-manylinux_2_28_x86_64.whl (24.2 MB)

Wheels of Cython projects such as pysam contain copies of the system shared libraries that they were compiled against, so that they stand a chance of working on a diverse range of Linuxes. (I myself am unconvinced that this is a great solution in general, but this is how wheels work.)

The FATAL FIPS SELFTEST FAILURE error is caused by the non-FIPS …/pysam-libs/lib{crypto,ssl}* libraries distributed within the wheel. If you install from the wheel and then replace those libraries (and …/pysam-libs/libcurl*) with links to your system ones, the wheel-based pysam installation will work. (But I would not recommend using such a chimera for production purposes!)

Hence this issue is much the same as #1097. Like that one, the problem is really with the way wheels work and is not specific to pysam.

It may be possible to build a wheel against a FIPS-compliant manylinux-esque image and thus provide a workable prebuilt wheel for such systems. Someone interested in this may wish to investigate the possibilities.

Otherwise I think the recommended way to install pysam on a FIPS-compliant system would be to ensure that you build it locally from source, generally with

pip install --no-binary :all: pysam
shaze commented 4 months ago

Thank you for your very detailed and thoughtful explanation -- that's very helpful. This problem is definitely not specific to pysam -- I came to pysam because of a dependancy from a downstream package that I tried to install.

I can confirm that if rebuild from scratch without the changes, it works. I think I'd assumed that the fix I suggested was needed elsewhere it was also needed here.

Thanks again