levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

Pandas Version Checking Fails #118

Closed by jmmitc06 10 months ago

jmmitc06 commented 10 months ago

I am encountering this issue with pyteomics indirectly through matchms. In short, it appears that the version check performed for pandas in /pyteomics/auxiliary/patch.py is no longer valid for my pandas version (2.1.0).

The following error is produced when importing matchms:

Traceback (most recent call last):
  File "/Users/mitchjo/Projects/PythonCentricPipelineForMetabolomics-1/./matchms_test.py", line 1, in <module>
    from matchms.importing import load_from_mgf, load_from_mzml
  File "/opt/homebrew/lib/python3.11/site-packages/matchms/__init__.py", line 1, in <module>
    from . import exporting, filtering, importing, networking, plotting, similarity
  File "/opt/homebrew/lib/python3.11/site-packages/matchms/exporting/__init__.py", line 9, in <module>
    from .save_as_mgf import save_as_mgf
  File "/opt/homebrew/lib/python3.11/site-packages/matchms/exporting/save_as_mgf.py", line 2, in <module>
    import pyteomics.mgf as py_mgf
  File "/opt/homebrew/lib/python3.11/site-packages/pyteomics/mgf.py", line 76, in <module>
    from . import auxiliary as aux
  File "/opt/homebrew/lib/python3.11/site-packages/pyteomics/auxiliary/__init__.py", line 6, in <module>
    from .patch import Version as _Version
  File "/opt/homebrew/lib/python3.11/site-packages/pyteomics/auxiliary/patch.py", line 14, in <module>
    pv = pd.version.version
         ^^^^^^^^^^
AttributeError: module 'pandas' has no attribute 'version'

I believe this is because, as of at least pandas 2.1.0, there is no longer a module-level _version attribute storing the version information, nor is there a version attribute. An appropriate replacement appears to be pd.__version__; however, I'm not sure whether this is valid for older versions of pandas, and I'm not sure how far back you want to remain backwards compatible.
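As a rough illustration of the lookup described above, here is a minimal sketch that reads a package's version without relying on private attributes. The helper name get_version is hypothetical and not part of pyteomics or pandas; it assumes Python 3.8+ so that importlib.metadata is available.

```python
from importlib import metadata


def get_version(dist_name, module=None):
    """Return the version string for a package, or None if it cannot be found.

    Hypothetical helper for illustration only.
    """
    # Prefer the public __version__ attribute when a module object is given.
    if module is not None:
        version = getattr(module, "__version__", None)
        if version is not None:
            return version
    # Otherwise ask the installed distribution metadata.
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

With pandas installed, get_version("pandas", pd) would return pd.__version__ when that attribute exists and fall back to the distribution metadata otherwise.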

I'm happy to provide a fix for this, since I will have to fix it on my end to complete my project and it should be straightforward. Once I have my workaround in place, I would be happy to submit a PR.

levitsky commented 10 months ago

Thank you for reporting!

This check is for pandas older than 0.17, which is now 8 years old, so I guess we can just make sure pandas is newer than that through setup.py and not support older versions.
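For illustration, a minimum-version constraint of this kind could be declared roughly as follows. This is only a sketch: the extra name "dataframes" is a hypothetical placeholder, and it assumes pandas stays an optional dependency declared through extras_require.

```python
# Hypothetical setup.py fragment; the extra name "dataframes" is illustrative only.
MIN_PANDAS = "0.17"

extras_require = {
    # Anyone installing this extra gets a pandas new enough to have sort_values.
    "dataframes": ["pandas>=" + MIN_PANDAS],
}

# In setup.py this dictionary would be passed along as:
#     setup(..., extras_require=extras_require)
```

With such a constraint in place, the runtime patch for DataFrame.sort_values would no longer be needed.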

I can make the change shortly.

jmmitc06 commented 10 months ago

Sounds good. Yeah, that version of pandas is ancient. I'm surprised it hasn't been a problem previously.

Due to permissions, I could not create the PR but here is what I did locally to fix the problem:

from .structures import PyteomicsError

try:
    from packaging.version import Version
except ImportError:
    # Fall back to distutils when packaging is unavailable
    from distutils.version import LooseVersion as Version

try:
    import pandas as pd
except ImportError:
    pd = None
else:
    # Locate the version string; where it lives differs between pandas releases
    if hasattr(pd, '_version'):
        pv = pd._version.get_versions()['version']
    elif hasattr(pd, 'version'):
        pv = pd.version.version
    elif hasattr(pd, '__version__'):
        pv = pd.__version__
    else:
        raise PyteomicsError('Could not determine the pandas version.')
    # DataFrame.sort_values was added in pandas 0.17; alias it on older versions
    if Version(pv) < Version('0.17'):
        pd.DataFrame.sort_values = pd.DataFrame.sort

levitsky commented 10 months ago

The current master version shouldn't do that check at all. (I need to resolve unrelated issues with the test suite but it should work.)