matchms / ms2deepscore

Deep learning similarity measure for comparing MS/MS spectra with respect to their chemical similarity
Apache License 2.0

Issues installing in M1 conda environment #219

Closed j-berg closed 2 months ago

j-berg commented 4 months ago

Hi all,

I don't know if this is only occurring on my machine, but I wanted to make you aware of an issue I came across while trying to install ms2deepscore on my M1 Mac.

What doesn't work

README environment install method # 1

conda create --name ms2deepscore python=3.9
conda activate ms2deepscore
pip install ms2deepscore --no-cache  #added no-cache just to be safe
Building wheels for collected packages: lxml, pubchempy
  Building wheel for lxml (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [121 lines of output]
      Building lxml version 4.9.4.
      /private/var/folders/c9/qw2ksz8x7335flmfxnq0kmkh0000gn/T/pip-install-wsxiw9ba/lxml_afdc6f0fae4d42abbc864b8b66c168b4/setup.py:67: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
        import pkg_resources
      Building without Cython.
...
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/jberg/miniconda3/envs/ms2deepscore4/include -arch arm64 -I/Users/jberg/miniconda3/envs/ms2deepscore4/include -fPIC -O2 -isystem /Users/jberg/miniconda3/envs/ms2deepscore4/include -arch arm64 -DCYTHON_CLINE_IN_TRACEBACK=0 -Isrc -Isrc/lxml/includes -I/Users/jberg/miniconda3/envs/ms2deepscore4/include/python3.9 -c src/lxml/etree.c -o build/temp.macosx-11.1-arm64-cpython-39/src/lxml/etree.o -w -flat_namespace
      In file included from src/lxml/etree.c:96:
      /Users/jberg/miniconda3/envs/ms2deepscore4/include/python3.9/Python.h:25:10: fatal error: 'stdio.h' file not found
         25 | #include <stdio.h>
            |          ^~~~~~~~~
      1 error generated.
      Compile failed: command '/opt/homebrew/opt/llvm/bin/clang' failed with exit code 1
      creating var
      creating var/folders
      creating var/folders/c9
      creating var/folders/c9/qw2ksz8x7335flmfxnq0kmkh0000gn
      creating var/folders/c9/qw2ksz8x7335flmfxnq0kmkh0000gn/T
      cc -I/usr/include/libxml2 -c /var/folders/c9/qw2ksz8x7335flmfxnq0kmkh0000gn/T/xmlXPathInitlruittu1.c -o var/folders/c9/qw2ksz8x7335flmfxnq0kmkh0000gn/T/xmlXPathInitlruittu1.o
      cc var/folders/c9/qw2ksz8x7335flmfxnq0kmkh0000gn/T/xmlXPathInitlruittu1.o -lxml2 -o a.out
      error: command '/opt/homebrew/opt/llvm/bin/clang' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for lxml
  Running setup.py clean for lxml
  Building wheel for pubchempy (setup.py) ... done
  Created wheel for pubchempy: filename=PubChemPy-1.0.4-py3-none-any.whl size=13820 sha256=1ab57eb3c9e70d7f361ec2339eabda8f277be82dc990453f111cb0c1450660c9
  Stored in directory: /private/var/folders/c9/qw2ksz8x7335flmfxnq0kmkh0000gn/T/pip-ephem-wheel-cache-eh0n_d9b/wheels/84/45/0e/b597debba098119b642eaf728ae1883d23ad8ea2a9366f2ded
Successfully built pubchempy
Failed to build lxml
ERROR: Could not build wheels for lxml, which is required to install pyproject.toml-based projects
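
The log above shows the lxml build invoking `/opt/homebrew/opt/llvm/bin/clang` and then failing to find `stdio.h`. One possible reading, which is an assumption and not confirmed anywhere in this thread, is that a Homebrew LLVM `clang` on `PATH` is being picked up without the macOS SDK headers. A minimal sketch for checking which compiler a build would find; `describe_compiler` is a hypothetical helper, not part of pip, conda, or ms2deepscore:

```python
import shutil


def describe_compiler(name="clang", which=shutil.which):
    """Report where `name` resolves on PATH and flag Homebrew installs.

    `which` is injectable so the function can be exercised without
    depending on the local machine's PATH.
    """
    path = which(name)
    if path is None:
        return f"{name}: not found on PATH"
    if path.startswith("/opt/homebrew/"):
        # Prefix taken from the failing command in the log above.
        return f"{name}: {path} (Homebrew build)"
    return f"{name}: {path}"


if __name__ == "__main__":
    for compiler in ("clang", "cc"):
        print(describe_compiler(compiler))
```

If this reports a Homebrew path, that only narrows down where to look; it does not by itself prove the compiler is the cause of the failure.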

README environment install method # 3

conda create --name ms2deepscore python=3.9
conda activate ms2deepscore
conda install --channel bioconda --channel conda-forge matchms
pip install ms2deepscore --no-cache
...
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for lxml
  Running setup.py clean for lxml
Failed to build lxml
ERROR: Could not build wheels for lxml, which is required to install pyproject.toml-based projects

README environment install method # 1 with no Python version specified

conda create --name ms2deepscore python
conda activate ms2deepscore
pip install ms2deepscore --no-cache

A slightly different error:

Collecting numba (from ms2deepscore)
  Downloading numba-0.57.1.tar.gz (2.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 32.9 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/private/var/folders/c9/qw2ksz8x7335flmfxnq0kmkh0000gn/T/pip-install-yf5hqyqn/numba_eeb33b711ed649fda9f076ecdabb2167/setup.py", line 51, in <module>
          _guard_py_ver()
        File "/private/var/folders/c9/qw2ksz8x7335flmfxnq0kmkh0000gn/T/pip-install-yf5hqyqn/numba_eeb33b711ed649fda9f076ecdabb2167/setup.py", line 48, in _guard_py_ver
          raise RuntimeError(msg.format(cur_py, min_py, max_py))
      RuntimeError: Cannot install on Python version 3.12.4; only versions >=3.8,<3.12 are supported.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
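
The failure above comes from numba 0.57.1 refusing to build on Python 3.12.4, since its error message states that only versions `>=3.8,<3.12` are supported. A minimal sketch of checking the interpreter against that range before running `pip install`; the bounds are copied from the error message and will differ for other numba releases, and `python_is_supported` is a hypothetical helper name:

```python
import sys

# Range copied from the numba 0.57.1 error message: ">=3.8,<3.12".
MIN_PY = (3, 8)   # inclusive lower bound
MAX_PY = (3, 12)  # exclusive upper bound


def python_is_supported(version=None, min_py=MIN_PY, max_py=MAX_PY):
    """Return True if (major, minor) lies in the half-open range [min_py, max_py)."""
    if version is None:
        version = sys.version_info[:2]
    return min_py <= tuple(version[:2]) < max_py


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "inside" if python_is_supported() else "outside"
    print(f"Python {major}.{minor} is {status} the range quoted by numba 0.57.1")
```

This matches what the thread reports: a Python 3.11 environment installs, while an unpinned environment that resolves to Python 3.12 does not.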

Along with several other combinations.

What does work

README environment install method # 1 with Python == 3.11

conda create --name ms2deepscore python=3.11
conda activate ms2deepscore
pip install ms2deepscore --no-cache

python3.11_environment.yml

No set Python version (v3.12 works here) and a custom setup.py

conda create --name ms2deepscore python 
conda install -c bioconda -c conda-forge matchms

And then I specifically need to download the repository (@v2.0.0) and remove any forced versioning for dependencies:

python_requires='>=3.9',
    install_requires=[
        "matchms",
        "numba",
        "numpy",
        "pandas",
        "scikit-learn",
        "torch",
        "tqdm",
        "matplotlib"
    ],

And then install via: pip install . (from within the local copy of the repo). I've attached the exported conda environment from my working setup: python3.12_environment.yml. However, this method leads to issues later with numpy compilation, because matchms also forces certain versions of its dependencies.

Conclusion

It seems like the environment is very sensitive to different forced version requirements on certain dependencies.
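
When comparing a working and a failing environment, it can help to print the versions that were actually installed. A minimal sketch using only the standard library; the package names are taken from the `install_requires` list quoted earlier plus ms2deepscore itself, and `installed_versions` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version

# Names from the install_requires list quoted in this issue, plus ms2deepscore.
PACKAGES = [
    "ms2deepscore", "matchms", "numba", "numpy", "pandas",
    "scikit-learn", "torch", "tqdm", "matplotlib",
]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:15s} {ver or 'not installed'}")
```

Running this in both environments and diffing the output shows which pinned dependencies ended up at different versions.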

niekdejonge commented 4 months ago

Thanks for the extensive issue! We have had quite a few problems making everything compatible with the M1 chip. In fact, that is partly why we transitioned from TensorFlow to PyTorch (since 2.0). However, I was not aware that there were also installation issues with PyTorch on M1.

Until recently it was impossible to automatically test installation on macOS with an M1 chip using GitHub Actions. However, I think support for that was released earlier this year, so I will try to implement it. If I get that up and running, I will add this to the README and change the recommended Python version to 3.11.

niekdejonge commented 4 months ago

Trying to implement in #221 based on https://github.com/actions/runner-images/issues/9254

niekdejonge commented 4 months ago

Weirdly enough, Python 3.11 seems to fail for me now... Did your tests run correctly after the installation? @j-berg

j-berg commented 4 months ago

Yeah, while all dependencies installed, I have been running into some finicky behavior downstream. I am trying to run:

from matchms import calculate_scores
from matchms.importing import load_from_msp
from ms2deepscore import MS2DeepScore
from ms2deepscore.models import load_model

# Import data
references = load_from_msp("./GNPS-LIBRARY.msp")
queries = load_from_msp("./ms2_spectra.msp")

# Load pretrained model
model = load_model("./ms2deepscore_model.pt")

similarity_measure = MS2DeepScore(model)

# Calculate scores and get matchms.Scores object
scores = calculate_scores(references, queries, similarity_measure)

But I am having some issues with the msp file formatting. I'm not sure whether that is just my input files, though, since some of them are custom generated. I'm currently trying to resolve it and will let you know if I figure out a solution.

j-berg commented 4 months ago

I ran the tests and they seemed to work fine in my Python 3.11 environment (from what I can tell, just warnings but no errors):

(ms2deepscore) jberg@Jordans-Laptop ms2deepscore-2.0.0 % pytest
================================================= test session starts ==================================================
platform darwin -- Python 3.11.9, pytest-8.2.2, pluggy-1.5.0
rootdir: /Users/jberg/repositories/ms2deepscore-2.0.0
plugins: anyio-4.2.0
collected 106 items                                                                                                    

tests/test_MetadataFeatureGenerator.py ..........                                                                [  9%]
tests/test_SettingsMS2deepscore.py .....                                                                         [ 14%]
tests/test_calculate_tanimoto_scores_for_plotting.py ...                                                         [ 16%]
tests/test_data_generators.py .....                                                                              [ 21%]
tests/test_embedding_evaluator.py ........                                                                       [ 29%]
tests/test_loss_functions.py .........                                                                           [ 37%]
tests/test_ms2deepscore.py ....                                                                                  [ 41%]
tests/test_ms2deepscore_evaluated.py ...                                                                         [ 44%]
tests/test_ms2deepscoremontecarlo.py ....                                                                        [ 48%]
tests/test_plotting.py ......                                                                                    [ 53%]
tests/test_plotting_wrapper_functions.py ..                                                                      [ 55%]
tests/test_siamese_spectra_model.py ....                                                                         [ 59%]
tests/test_spectrum_pair_selection.py ..............                                                             [ 72%]
tests/test_train_ms2deepscore.py ..                                                                              [ 74%]
tests/test_training_wrapper_function.py .                                                                        [ 75%]
tests/test_utils.py ..                                                                                           [ 77%]
tests/test_validation_and_test_split.py ...                                                                      [ 80%]
tests/test_validation_loss_calculator.py ..                                                                      [ 82%]
tests/test_vector_operations.py ...................                                                              [100%]

=================================================== warnings summary ===================================================
../../miniconda3/envs/ms2deepscore/lib/python3.11/site-packages/sparsestack/StackedSparseArray.py:4
  /Users/jberg/miniconda3/envs/ms2deepscore/lib/python3.11/site-packages/sparsestack/StackedSparseArray.py:4: DeprecationWarning: Please use `get_index_dtype` from the `scipy.sparse` namespace, the `scipy.sparse.sputils` namespace is deprecated.
    from scipy.sparse.sputils import get_index_dtype

tests/test_data_generators.py::test_DataGeneratorPytorch
tests/test_data_generators.py::test_generator_initialization
tests/test_data_generators.py::test_batch_generation
tests/test_data_generators.py::test_epoch_end_functionality
tests/test_embedding_evaluator.py::test_train_embedding_evaluator
tests/test_train_ms2deepscore.py::test_train_ms2ds_model
tests/test_train_ms2deepscore.py::test_too_little_spectra
  /Users/jberg/miniconda3/envs/ms2deepscore/lib/python3.11/site-packages/matchms/Metadata.py:124: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
    metadata_filtered = {k:v for k,v in metadata_filtered.items() if v not in invalid_entries}

tests/test_embedding_evaluator.py::test_train_embedding_evaluator
  /Users/jberg/repositories/ms2deepscore-2.0.0/ms2deepscore/models/EmbeddingEvaluatorModel.py:157: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
    evaluations = self(torch.tensor(embeddings).reshape(-1, 1, embedding_dim).to(device, dtype=torch.float32))

tests/test_training_wrapper_function.py::test_train_wrapper_ms2ds_model
  /Users/jberg/repositories/ms2deepscore-2.0.0/ms2deepscore/benchmarking/plot_stacked_histogram.py:95: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). Consider using `matplotlib.pyplot.close()`.
    _, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, nr_of_bins),

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================== 106 passed, 10 warnings in 22.60s ===========================================

j-berg commented 4 months ago

I had to make some modifications to my input msp files and convert the generators to lists, but it seems to be working in my local environment.

Attachments: Screenshot 2024-06-26 at 9 40 23 AM, env.txt
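
On converting the generators to lists: a Python generator can be consumed only once, so any code that iterates over the spectra twice, or needs their length, sees nothing on the second pass. A minimal sketch of that behaviour using a hypothetical stand-in loader (`fake_loader` is not part of matchms), so it runs without any data files:

```python
def fake_loader(n):
    """Hypothetical stand-in for a lazy file loader: yields items one at a time."""
    for i in range(n):
        yield f"spectrum_{i}"


# A generator is exhausted after a single pass.
lazy = fake_loader(3)
first_pass = list(lazy)
second_pass = list(lazy)  # nothing left to yield, so this is empty

# Materialising into a list up front allows repeated iteration and len().
spectra = list(fake_loader(3))
print(first_pass, second_pass, len(spectra))
```

Applied to the script earlier in this thread, the equivalent change would presumably be wrapping each call, for example `references = list(load_from_msp("./GNPS-LIBRARY.msp"))`, at the cost of holding all spectra in memory.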