nf-core / mhcquant

Identify and quantify MHC eluted peptides from mass spectrometry raw data
https://nf-co.re/mhcquant
MIT License

Add cache management for `ms2rescore` #317

Closed MajoroMask closed 5 months ago

MajoroMask commented 7 months ago

Description of feature

I'm running the dev branch of mhcquant under the test profile on my local server:

git clone --branch dev --single-branch git@github.com:nf-core/mhcquant.git mhcquant
nextflow run ./main.nf -profile test,docker --outdir test

I keep running into the same error:

Apr-25 13:01:55.598 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE (HepG2_1)'

Caused by:
  Process `NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE (HepG2_1)` terminated with an error exit status (1)

Command executed:

  ms2rescore_cli.py \
      --psm_file HepG2_1.idXML \
      --spectrum_path . \
      --output_path HepG2_1_ms2rescore.idXML \
      --processes 2 \
      --ms2_tolerance 0.02 --ms2pip_model Immuno-HCD --rescoring_engine percolator --feature_generators deeplc,ms2pip

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE":
      MS²Rescore: $(echo $(ms2rescore --version 2>&1) | grep -oP 'MS²Rescore \(v\K[^\)]+' ))
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
    File "/usr/local/lib/python3.10/socket.py", line 824, in create_connection
      for res in getaddrinfo(host, port, 0, SOCK_STREAM):
    File "/usr/local/lib/python3.10/socket.py", line 955, in getaddrinfo
      for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
  socket.gaierror: [Errno -3] Temporary failure in name resolution

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/data2023/suna/proj/mhcquant/bin/ms2rescore_cli.py", line 175, in <module>
      sys.exit(main())
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
    File "/data2023/suna/proj/mhcquant/bin/ms2rescore_cli.py", line 171, in main
      rescore_idxml(kwargs["psm_file"], kwargs["output_path"], config)
    File "/data2023/suna/proj/mhcquant/bin/ms2rescore_cli.py", line 81, in rescore_idxml
      rescore(config, psm_list)
    File "/usr/local/lib/python3.10/site-packages/ms2rescore/core.py", line 80, in rescore
      fgen.add_features(psm_list)
    File "/usr/local/lib/python3.10/site-packages/ms2rescore/feature_generators/ms2pip.py", line 190, in add_features
      ms2pip_results = correlate(
    File "/usr/local/lib/python3.10/site-packages/ms2pip/core.py", line 178, in correlate
      ms2pip_parallelized = _Parallelized(
    File "/usr/local/lib/python3.10/site-packages/ms2pip/core.py", line 383, in __init__
      validate_requested_xgb_model(
    File "/usr/local/lib/python3.10/site-packages/ms2pip/_utils/xgb_models.py", line 21, in validate_requested_xgb_model
      _download_model(model_file, xgboost_model_hashes[model_file], model_dir)
    File "/usr/local/lib/python3.10/site-packages/ms2pip/_utils/xgb_models.py", line 98, in _download_model
      urllib.request.urlretrieve(
    File "/usr/local/lib/python3.10/urllib/request.py", line 241, in urlretrieve
      with contextlib.closing(urlopen(url, data)) as fp:
    File "/usr/local/lib/python3.10/urllib/request.py", line 216, in urlopen
      return opener.open(url, data, timeout)
    File "/usr/local/lib/python3.10/urllib/request.py", line 519, in open
      response = self._open(req, data)
    File "/usr/local/lib/python3.10/urllib/request.py", line 536, in _open
      result = self._call_chain(self.handle_open, protocol, protocol +
    File "/usr/local/lib/python3.10/urllib/request.py", line 496, in _call_chain
      result = func(*args)
    File "/usr/local/lib/python3.10/urllib/request.py", line 1377, in http_open
      return self.do_open(http.client.HTTPConnection, req)
    File "/usr/local/lib/python3.10/urllib/request.py", line 1351, in do_open
      raise URLError(err)
  urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

In my case, when the pipeline module calls the ms2rescore_cli.py script, ms2rescore.rescore tries to download an XGBoost model file (about 300 MB in this case) into the Docker container. The download fails when the network connection is unreliable, and the process exits with an error.

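My request: could we have an option for ms2rescore to use a cache that can be prepared in advance, so the model does not have to be downloaded inside the container on every run?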
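
To illustrate the behaviour I have in mind, here is a minimal sketch of a "check the cache first, download only on a miss" helper. It is not the ms2pip implementation. The function names, the `fetch` parameter, the use of SHA-256, and the idea of pointing `cache_dir` at a directory mounted into the container are all my assumptions; the traceback only shows that ms2pip validates a model file against a hash in some model directory before downloading it.

```python
import hashlib
import urllib.request
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a ~300 MB model is never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_model(filename, expected_sha256, cache_dir, base_url,
                 fetch=urllib.request.urlretrieve) -> Path:
    """Return the path to a verified model file, downloading only on a cache miss.

    Hypothetical helper: `cache_dir` is assumed to be a directory that was
    filled ahead of time (for example on a machine with a stable connection).
    """
    cache_dir = Path(cache_dir)
    target = cache_dir / filename

    # Cache hit: the file exists and its hash matches, so no network access is needed.
    if target.is_file() and file_sha256(target) == expected_sha256:
        return target

    # Cache miss: download to a temporary name so an interrupted transfer
    # never leaves a truncated file under the final name.
    cache_dir.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    fetch(f"{base_url.rstrip('/')}/{filename}", str(partial))

    if file_sha256(partial) != expected_sha256:
        partial.unlink()
        raise ValueError(f"Hash mismatch for downloaded file {filename}")

    partial.replace(target)
    return target
```

With something like this, the cache directory could be populated once and then reused by every task, and an offline run would succeed as long as the expected file is already present and passes the hash check.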