openvax / mhcflurry

Peptide-MHC I binding affinity prediction
http://openvax.github.io/mhcflurry/
Apache License 2.0

"mhcflurry-downloads fetch" under Windows does not work as expected #127

Closed saskra closed 5 years ago

saskra commented 6 years ago

I am testing the workflow from the README.md, but it gets stuck at the third command although installation and download worked well. (btw: No difference between Windows and Linux.) Here is what I get:

C:\>activate mhcflurry

(mhcflurry) C:\>pip install mhcflurry
Requirement already satisfied: mhcflurry in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (1.0.0)
Requirement already satisfied: tensorflow>=1.1.0 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from mhcflurry) (1.8.0)
Requirement already satisfied: pyyaml in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from mhcflurry) (3.12)
Requirement already satisfied: six in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from mhcflurry) (1.11.0)
Requirement already satisfied: appdirs in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from mhcflurry) (1.4.3)
Requirement already satisfied: Keras>=2.0.9 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from mhcflurry) (2.2.0)
Requirement already satisfied: scikit-learn in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from mhcflurry) (0.19.1)
Requirement already satisfied: pandas>=0.20.3 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from mhcflurry) (0.23.1)
Requirement already satisfied: numpy>=1.11 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from mhcflurry) (1.15.0rc2)
Requirement already satisfied: mhcnames in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from mhcflurry) (0.4.8)
Requirement already satisfied: absl-py>=0.1.6 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from tensorflow>=1.1.0->mhcflurry) (0.2.2)
Requirement already satisfied: protobuf>=3.4.0 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from tensorflow>=1.1.0->mhcflurry) (3.6.0)
Requirement already satisfied: tensorboard<1.9.0,>=1.8.0 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from tensorflow>=1.1.0->mhcflurry) (1.8.0)
Requirement already satisfied: termcolor>=1.1.0 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from tensorflow>=1.1.0->mhcflurry) (1.1.0)
Requirement already satisfied: wheel>=0.26 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from tensorflow>=1.1.0->mhcflurry) (0.31.1)
Requirement already satisfied: astor>=0.6.0 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from tensorflow>=1.1.0->mhcflurry) (0.7.1)
Requirement already satisfied: gast>=0.2.0 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from tensorflow>=1.1.0->mhcflurry) (0.2.0)
Requirement already satisfied: grpcio>=1.8.6 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from tensorflow>=1.1.0->mhcflurry) (1.13.0)
Requirement already satisfied: scipy>=0.14 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from Keras>=2.0.9->mhcflurry) (1.1.0)
Requirement already satisfied: h5py in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from Keras>=2.0.9->mhcflurry) (2.8.0)
Requirement already satisfied: keras_applications==1.0.2 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from Keras>=2.0.9->mhcflurry) (1.0.2)
Requirement already satisfied: keras_preprocessing==1.0.1 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from Keras>=2.0.9->mhcflurry) (1.0.1)
Requirement already satisfied: pytz>=2011k in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from pandas>=0.20.3->mhcflurry) (2018.5)
Requirement already satisfied: python-dateutil>=2.5.0 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from pandas>=0.20.3->mhcflurry) (2.7.3)
Requirement already satisfied: setuptools in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from protobuf>=3.4.0->tensorflow>=1.1.0->mhcflurry) (40.0.0)
Requirement already satisfied: html5lib==0.9999999 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow>=1.1.0->mhcflurry) (0.9999999)
Requirement already satisfied: bleach==1.5.0 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow>=1.1.0->mhcflurry) (1.5.0)
Requirement already satisfied: werkzeug>=0.11.10 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow>=1.1.0->mhcflurry) (0.14.1)
Requirement already satisfied: markdown>=2.6.8 in c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow>=1.1.0->mhcflurry) (2.6.11)

(mhcflurry) C:\>mhcflurry-downloads fetch
Fetching 0/10 downloads from release 1.2.0
DOWNLOAD NAME                             ALREADY DOWNLOADED?    WILL DOWNLOAD NOW?    URL
models_class1                             YES                    NO                    https://github.com/openvax/mhcflurry/releases/download/pre-1.2/models_class1.20180225.tar.bz2
models_class1_selected_no_mass_spec       YES                    NO                    https://github.com/openvax/mhcflurry/releases/download/pre-1.2/models_class1_selected_no_mass_spec.20180225.tar.bz2
models_class1_unselected                  NO                     NO                    https://github.com/openvax/mhcflurry/releases/download/pre-1.2/models_class1_unselected.20180221.tar.bz2
models_class1_trained_with_mass_spec      YES                    NO                    https://github.com/openvax/mhcflurry/releases/download/pre-1.2.1/models_class1_trained_with_mass_spec.20180228.tar.bz2
models_class1_unselected_with_mass_spec   NO                     NO                    https://github.com/openvax/mhcflurry/releases/download/pre-1.2.1/models_class1_unselected_with_mass_spec.20180227.tar.bz2
models_class1_minimal                     NO                     NO                    https://github.com/openvax/mhcflurry/releases/download/pre-1.2/models_class1_minimal.20180226.tar.bz2
data_iedb                                 NO                     NO                    https://github.com/openvax/mhcflurry/releases/download/pre-1.0/data_iedb.tar.bz2
data_published                            NO                     NO                    http://github.com/openvax/mhcflurry/releases/download/pre-1.1/data_published.tar.bz2
data_systemhcatlas                        NO                     NO                    http://github.com/openvax/mhcflurry/releases/download/pre-1.1/data_systemhcatlas.tar.bz2
data_curated                              YES                    NO                    https://github.com/openvax/mhcflurry/releases/download/pre-1.2/data_curated.20180219.tar.bz2

(mhcflurry) C:\>mhcflurry-predict --alleles HLA-A0201 HLA-A0301 --peptides SIINFEKL SIINFEKD SIINFEKQ --out /tmp/predictions.csv
Traceback (most recent call last):
  File "C:\Users\XXX\AppData\Local\Continuum\anaconda3\envs\mhcflurry\Scripts\mhcflurry-predict-script.py", line 11, in <module>
    load_entry_point('mhcflurry==1.2.2', 'console_scripts', 'mhcflurry-predict')()
  File "c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages\mhcflurry-1.2.2-py3.6.egg\mhcflurry\predict_command.py", line 148, in run
  File "c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages\mhcflurry-1.2.2-py3.6.egg\mhcflurry\class1_affinity_predictor.py", line 396, in load
  File "c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages\pandas\io\parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "c:\users\XXX\appdata\local\continuum\anaconda3\envs\mhcflurry\lib\site-packages\pandas\io\parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'C:\\Users\\XXX\\AppData\\Local\\mhcflurry\\mhcflurry\\4\\1.2.0\\models_class1\\models\\manifest.csv' does not exist

(mhcflurry) C:\>

The error message is accurate: the directory exists, but the file does not. Where should the file come from? Was it part of the download, or should it have been created during the prediction?

timodonnell commented 6 years ago

It should be part of the download. It's a file in the tarball that is downloaded and unpacked. What are the contents of the directory? Maybe there's an issue with how we're unpacking the file on Windows.

> (btw: No difference between Windows and Linux.)

Are you hitting this issue on Linux as well?

saskra commented 6 years ago

> What are the contents of the directory?

The directory "C:\Users\XXX\AppData\Local\mhcflurry\mhcflurry\4\1.2.0\models_class1\models" is empty.

> Are you hitting this issue on Linux as well?

Yes, but the download was done via Windows as my Linux is currently offline.

> Maybe there's an issue with how we're unpacking the file on Windows.

This might actually be the problem: manually downloading and unpacking the file helped. At least it led to the next error:

(mhcflurry) C:\>mhcflurry-predict --alleles HLA-A0201 HLA-A0301 --peptides SIINFEKL SIINFEKD SIINFEKQ --out /tmp/predictions.csv
Traceback (most recent call last):
  File "XXX\Scripts\mhcflurry-predict-script.py", line 11, in <module>
    load_entry_point('mhcflurry==1.2.2', 'console_scripts', 'mhcflurry-predict')()
  File "XXX\lib\site-packages\mhcflurry-1.2.2-py3.6.egg\mhcflurry\predict_command.py", line 206, in run
  File "XXX\lib\site-packages\mhcflurry-1.2.2-py3.6.egg\mhcflurry\class1_affinity_predictor.py", line 996, in predict_to_dataframe
  File "XXX\lib\site-packages\mhcflurry-1.2.2-py3.6.egg\mhcflurry\class1_neural_network.py", line 774, in predict
  File "XXX\lib\site-packages\mhcflurry-1.2.2-py3.6.egg\mhcflurry\class1_neural_network.py", line 239, in network
  File "XXX\lib\site-packages\mhcflurry-1.2.2-py3.6.egg\mhcflurry\class1_neural_network.py", line 321, in load_weights
  File "XXX\lib\site-packages\mhcflurry-1.2.2-py3.6.egg\mhcflurry\class1_affinity_predictor.py", line 1066, in load_weights
  File "XXX\lib\site-packages\numpy\lib\npyio.py", line 384, in load
    fid = open(file, "rb")
OSError: [Errno 22] Invalid argument: 'C:\\Users\\XXX\\AppData\\Local\\mhcflurry\\mhcflurry\\4\\1.2.0\\models_class1\\models\\weights_HLA-A*02:01-0-bff74107e39ddcc1.npz'
(mhcflurry) C:\

(Same on Linux.)

Actually, in this directory there is a file called "weights_HLA-A_02_01-0-bff74107e39ddcc1.npz" which seems to be meant. How can I make mhcflurry recognize that one? (Shall I rename the issue or open a new one?)

timodonnell commented 6 years ago

Ah I guess unpacking on Windows may have removed the special characters like * and : from the filenames? That is kind of bad on our part to have characters like that in the filenames. We can fix that in the next models release but that may not be for some time. For now one thing you can try is modifying manifest.csv to remove the '*' and ':' characters from the model names, so they match the file names.
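
The manifest edit suggested above could be scripted roughly as follows. This is only a sketch under assumptions: the column name `model_name` is a guess at how manifest.csv labels the model names, and `sanitize_name` and `rewrite_manifest` are hypothetical helpers, not part of mhcflurry. Because the file reported on disk uses underscores in place of `*` and `:`, the sketch substitutes `_` rather than deleting the characters.

```python
import csv
import re

# Characters that Windows does not allow in file names.
WINDOWS_RESERVED = re.compile(r'[<>:"/\\|?*]')


def sanitize_name(name):
    """Replace Windows-reserved characters with underscores."""
    return WINDOWS_RESERVED.sub("_", name)


def rewrite_manifest(src_path, dst_path, column="model_name"):
    """Copy a manifest CSV, sanitizing one column.

    The default column name is an assumption; check the header of the
    actual manifest.csv before relying on it.
    """
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src)
        rows = list(reader)
        fieldnames = reader.fieldnames
    for row in rows:
        row[column] = sanitize_name(row[column])
    with open(dst_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Writing to a separate output path keeps the original manifest.csv available as a backup until the renamed entries are confirmed to match the files on disk.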

One word of warning though: as you can probably tell, we don't test on Windows. Happy to help you when possible, but I don't have a good way of testing on Windows, so you may keep hitting various issues. Docker may be a better way to go.

saskra commented 6 years ago

> we don't test on windows

I only wanted to use Windows for the download part because my Linux machine is offline. Unfortunately, downloading seems to be linked with unpacking, and unpacking does not work as expected on Windows. Manually downloading on Windows and manually unpacking on Linux is inconvenient, but it works.
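
The manual download step described above could be scripted with the Python standard library alone, without involving mhcflurry. A minimal sketch, assuming the release URLs listed by `mhcflurry-downloads fetch` are directly downloadable; `filename_from_url` and `download_only` are hypothetical helper names, and the archive still has to be unpacked into the directory mhcflurry expects afterwards.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path component of a URL."""
    return os.path.basename(urlparse(url).path)


def download_only(url, dest_dir="."):
    """Save the file at `url` into `dest_dir` without unpacking it."""
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, filename_from_url(url))
    # Stream the response to disk so large archives are not held in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Example call (performs a network request), using a URL from the table above:
# download_only(
#     "https://github.com/openvax/mhcflurry/releases/download/pre-1.2/"
#     "models_class1.20180225.tar.bz2",
#     dest_dir="downloads",
# )
```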

saskra commented 6 years ago

Is there a way to use the download script for "download only"? It even crashes when trying to unpack.

Extracting:   0%|                                                                                                                                                                 | 0/1659 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "XXX\Scripts\mhcflurry-downloads-script.py", line 11, in <module>
    load_entry_point('mhcflurry==1.2.2', 'console_scripts', 'mhcflurry-downloads')()
  File "XXX\lib\site-packages\mhcflurry-1.2.2-py3.6.egg\mhcflurry\downloads_command.py", line 106, in run
  File "XXX\lib\site-packages\mhcflurry-1.2.2-py3.6.egg\mhcflurry\downloads_command.py", line 222, in fetch_subcommand
  File "XXX\lib\tarfile.py", line 2008, in extractall
    numeric_owner=numeric_owner)
  File "XXX\lib\tarfile.py", line 2050, in extract
    numeric_owner=numeric_owner)
  File "XXX\lib\tarfile.py", line 2120, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "XXX\lib\tarfile.py", line 2161, in makefile
    with bltn_open(targetpath, "wb") as target:
OSError: [Errno 22] Invalid argument: 'C:\\Users\\XXX\\AppData\\Local\\mhcflurry\\mhcflurry\\4\\1.2.0\\models_class1_selected_no_mass_spec\\models\\weights_HLA-A*26:01-11-a27e2cd3e6978963.npz'

timodonnell commented 5 years ago

Sorry we couldn't help with this. I don't have a Windows machine to debug these issues on, but we'd certainly consider PRs to improve Windows compatibility.
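
For anyone picking this up, one possible direction for such a PR is to rename archive members during extraction so that characters Windows rejects (such as `*` and `:`) never reach the file system. The sketch below uses only the standard `tarfile` module; `extract_with_safe_names` is a hypothetical helper, not mhcflurry code. As the second traceback above suggests, mhcflurry builds the weights file name from the model name, so the model names in manifest.csv would presumably need the same substitution for the files to be found.

```python
import os
import re
import tarfile

# Characters Windows rejects in file names. "/" is left alone because
# tar archives use it as the path separator.
WINDOWS_RESERVED = re.compile(r'[<>:"|?*]')


def extract_with_safe_names(archive_path, dest_dir):
    """Extract a tar archive, replacing reserved characters in member names.

    Returns the list of (sanitized) member names that were extracted.
    """
    extracted = []
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            # The extraction target path is derived from member.name,
            # so renaming here changes where the file is written.
            member.name = WINDOWS_RESERVED.sub("_", member.name)
            tar.extract(member, path=dest_dir)
            extracted.append(member.name)
    return extracted
```

Only archives from a trusted source should be extracted this way, since the sketch does not check member paths for entries that point outside `dest_dir`.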