Closed yannic-chen closed 5 months ago
Hi @yannic-chen! Thanks for reporting this issue. I could reproduce it locally with a small test set. That reminds me to include tests for multiple variable or fixed modifications.
There won't be a fix for the deeplc and ms2pip scripts in the next release, since we want to use MS2Rescore to handle all kinds of configurations. So I just merged a PR that removed the deeplc and ms2pip modules and implemented an ms2rescore module. Could you try the dev branch with -r dev? That worked for me. Here is a quick outline of how to use it:
You can call e.g. ms2pip and deeplc via a new CLI flag, --feature_generators deeplc,ms2pip,... There is no need for use_deeplc and use_ms2pip anymore. You can read up on further feature generators in the ms2rescore docs.
If you only want psm_level_fdrs, you can also specify a new CLI flag, --rescoring_engine mokapot, instead of percolator (the default) to get a handy ms2rescore report in the multiqc folder. peptide_level_fdrs is currently only applicable with percolator, since ms2rescore+mokapot currently only reports PSM-level FDR estimates.
Let me know if that works for you as well!
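To make the flag combination above concrete, here is a minimal sketch that assembles such a call and rejects the one combination described as unsupported (mokapot together with peptide_level_fdrs). The helper name build_mhcquant_command and its parameters are hypothetical and only for illustration; the flags themselves (--input, --outdir, --fasta, --fdr_level, --feature_generators, --rescoring_engine, -r dev, -profile docker) are the ones that appear in this thread.

```python
import shlex


def build_mhcquant_command(samplesheet, outdir, fasta, feature_generators,
                           rescoring_engine="percolator",
                           fdr_level="psm_level_fdrs"):
    """Assemble an nf-core/mhcquant dev-branch call (hypothetical helper)."""
    # Per the outline above: ms2rescore+mokapot only reports PSM-level FDR,
    # so peptide-level FDR is only possible with percolator.
    if rescoring_engine == "mokapot" and fdr_level != "psm_level_fdrs":
        raise ValueError("mokapot only supports psm_level_fdrs; use percolator")

    args = [
        "nextflow", "run", "nf-core/mhcquant",
        "-r", "dev",
        "-profile", "docker",
        "--input", samplesheet,
        "--outdir", outdir,
        "--fasta", fasta,
        # Feature generators are passed as one comma-separated value.
        "--feature_generators", ",".join(feature_generators),
        "--rescoring_engine", rescoring_engine,
        "--fdr_level", fdr_level,
    ]
    # shlex.join quotes arguments where the shell would need it.
    return shlex.join(args)


if __name__ == "__main__":
    print(build_mhcquant_command(
        "samplesheet.tsv", "results", "db.fasta",
        ["deeplc", "ms2pip"], rescoring_engine="mokapot",
    ))
```

The file names in the usage example are placeholders, not files from this issue.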
Hi @jonasscheid, thank you for the quick reply. I followed your instructions, which led me to the following error:
Pipeline completed with errors
~ Error executing process > 'NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE (WT_A)'
Caused by:
Process NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE (WT_A) terminated with an error exit status (1)
Command executed:
ms2rescore_cli.py \
--psm_file WT_A.idXML \
--spectrum_path . \
--output_path WT_A_ms2rescore.idXML \
--processes 12 \
--ms2_tolerance 0.06 --ms2pip_model Immuno-HCD --rescoring_engine mokapot --feature_generators deeplc,ms2pip
cat <<-END_VERSIONS > versions.yml
"NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE":
MS²Rescore: $(echo $(ms2rescore --version 2>&1) | grep -oP 'MS²Rescore \(v\K[^\)]+' ))
END_VERSIONS
Command exit status:
1
Command output:
Command error:
2024-01-16 13:21:13,284 WARNING Could not add the following atom: Se
2024-01-16 13:21:13,289 WARNING Skipping the following (not in library): 4 ('U', None)
2024-01-16 13:21:13,335 WARNING Could not add the following value: pos 1 for atom Se with value 1
2024-01-16 13:21:13,335 WARNING Could not add the following atom: Se
2024-01-16 13:21:13,335 WARNING Skipping the following (not in library): 1 ('U', None)
2024-01-16 13:21:20,487 WARNING Could not add the following value: pos 4 for atom Se with value 1
2024-01-16 13:21:20,487 WARNING Could not add the following atom: Se
2024-01-16 13:21:20,491 WARNING Skipping the following (not in library): 4 ('U', None)
2024-01-16 13:21:32,355 WARNING Could not add the following value: pos 4 for atom Se with value 1
2024-01-16 13:21:32,355 WARNING Could not add the following atom: Se
2024-01-16 13:21:32,355 WARNING Skipping the following (not in library): 4 ('U', None)
2024-01-16 13:24:14,538 WARNING Removed 105 PSMs that were missing one or more rescoring feature(s), {'abs_diff_Q1_norm', 'iony_abs_diff_Q1', 'ionb_spearman', 'dotprod_ionb_norm', 'spec_pearson', 'iony_min_abs_diff_norm', 'spec_pearson_norm', 'dotprod_iony', 'ionb_max_abs_diff_norm', 'ionb_abs_diff_Q3_norm', 'ionb_abs_diff_Q1', 'iony_mean_abs_diff_norm', 'iony_mse', 'spec_spearman', 'iony_max_abs_diff_norm', 'iony_abs_diff_Q2', 'min_abs_diff', 'ionb_pearson_norm', 'ionb_abs_diff_Q1_norm', 'ionb_mse_norm', 'iony_abs_diff_Q3_norm', 'spec_mse_norm', 'ionb_mean_abs_diff', 'dotprod_ionb', 'iony_abs_diff_Q3', 'cos_iony', 'cos_norm', 'mean_abs_diff_norm', 'min_abs_diff_iontype', 'min_abs_diff_norm', 'ionb_min_abs_diff_norm', 'ionb_abs_diff_Q2', 'cos', 'abs_diff_Q3_norm', 'abs_diff_Q1', 'iony_std_abs_diff', 'ionb_mean_abs_diff_norm', 'iony_spearman', 'cos_iony_norm', 'ionb_abs_diff_Q2_norm', 'iony_std_abs_diff_norm', 'cos_ionb', 'ionb_abs_diff_Q3', 'spec_mse', 'std_abs_diff', 'ionb_std_abs_diff', 'ionb_mse', 'abs_diff_Q3', 'max_abs_diff_iontype', 'iony_mean_abs_diff', 'max_abs_diff_norm', 'iony_pearson_norm', 'iony_min_abs_diff', 'ionb_pearson', 'abs_diff_Q2', 'mean_abs_diff', 'ionb_max_abs_diff', 'dotprod', 'dotprod_norm', 'iony_pearson', 'abs_diff_Q2_norm', 'iony_abs_diff_Q2_norm', 'cos_ionb_norm', 'max_abs_diff', 'std_abs_diff_norm', 'iony_mse_norm', 'iony_abs_diff_Q1_norm', 'ionb_std_abs_diff_norm', 'dotprod_iony_norm', 'ionb_min_abs_diff', 'iony_max_abs_diff'}.
2024-01-16 13:34:38,542 INFO Identified 99077 (187.10%) more PSMs at 1% FDR after rescoring.
2024-01-16 13:34:38,542 INFO Writing output to WT_A_ms2rescore.idXML.psms.tsv...
2024-01-16 13:36:12,847 INFO Collecting files...
2024-01-16 13:36:12,847 INFO ✅ Found PSMs: 'WT_A_ms2rescore.idXML.psms.tsv'
2024-01-16 13:36:12,847 INFO ✅ Found configuration: 'WT_A_ms2rescore.idXML.full-config.json'
2024-01-16 13:36:12,847 INFO ✅ Found feature names: 'WT_A_ms2rescore.idXML.feature_names.tsv'
2024-01-16 13:36:12,848 INFO ✅ Found feature weights: 'WT_A_ms2rescore.idXML.mokapot.weights.tsv'
2024-01-16 13:36:12,848 WARNING ❌ log: 'WT_A_ms2rescore.idXML.log.txt'
2024-01-16 13:36:12,848 INFO Recalculating confidence estimates...
Traceback (most recent call last):
File "/root/.nextflow/assets/nf-core/mhcquant/bin/ms2rescore_cli.py", line 175, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/root/.nextflow/assets/nf-core/mhcquant/bin/ms2rescore_cli.py", line 171, in main
rescore_idxml(kwargs["psm_file"], kwargs["output_path"], config)
File "/root/.nextflow/assets/nf-core/mhcquant/bin/ms2rescore_cli.py", line 81, in rescore_idxml
rescore(config, psm_list)
File "/usr/local/lib/python3.10/site-packages/ms2rescore/core.py", line 151, in rescore
generate.generate_report(
File "/usr/local/lib/python3.10/site-packages/ms2rescore/report/generate.py", line 87, in generate_report
confidence_before, confidence_after = get_confidence_estimates(psm_list, fasta_file)
File "/usr/local/lib/python3.10/site-packages/ms2rescore/report/utils.py", line 40, in get_confidence_estimates
score_before = pd.DataFrame.from_records(psm_list["provenance_data"])[
File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 2335, in from_records
arrays, columns = to_arrays(data, columns)
File "/usr/local/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 867, in to_arrays
arr, columns = _list_of_dict_to_arrays(data, columns)
File "/usr/local/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 954, in _list_of_dict_to_arrays
content = lib.dicts_to_array(data, list(columns))
File "pandas/_libs/lib.pyx", line 447, in pandas._libs.lib.dicts_to_array
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 4.42 TiB for an array with shape (964370, 629314) and data type object
Work dir:
/home/chyannic/YC/test_mhcquant/work/02/34682d0057994344f573816d3feed8
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
-- Check '.nextflow.log' file for details
I used the following command:
sudo nextflow run nf-core/mhcquant --input samplesheet_HLA1.tsv --outdir OUTDIR_HLA1_PTM_featureGenerators --fasta human_20365_conts_validated_conversion.fasta -profile docker -r dev --feature_generators deeplc,ms2pip --rescoring_engine mokapot --digest_mass_range '500:2500' --activation_method CID --prec_charge '1:4' --fdr_threshold 0.01 --fdr_level psm_level_fdrs --number_mods 3 --precursor_mass_tolerance 15 --fragment_mass_tolerance 0.03 --num_hits 1 --peptide_min_length 7 --peptide_max_length 25 --variable_mods 'Oxidation (M),Acetyl (Protein N-term),Cysteinyl (C)' -resume
The --feature_generators default for ms2pip is "ms2pip": {"model": "HCD", "ms2_tolerance": 0.02}. How could I change the model to, for example, "CID"?
Thank you for the good work.
Unable to allocate 4.42 TiB for an array with shape (964370, 629314) and data type object
That is a big input file you have right there! Looks like you ran out of memory. By default the MS2RESCORE module uses process_high, see specs here. That's sufficient for most requests. If you need to allocate more, you can provide a custom config file, passed via -c <name>.config, containing e.g.
process { withName: 'MS2RESCORE'{ memory = 5500.GB } }
Fingers crossed that your setup has these specs
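For reference, the 4.42 TiB figure in the error can be reproduced from the reported array shape. This is a small sketch assuming 8 bytes per element, which is the size of one object pointer on a 64-bit build; the Python objects the pointers refer to would need additional memory on top of that. The function name is made up for illustration.

```python
TIB = 2 ** 40  # bytes per tebibyte


def object_array_bytes(rows, cols, pointer_size=8):
    """Bytes needed for the pointer table of a (rows, cols) object array."""
    return rows * cols * pointer_size


# Shape taken from the error message above.
rows, cols = 964_370, 629_314
size = object_array_bytes(rows, cols)

if __name__ == "__main__":
    print(f"{size} bytes = {size / TIB:.2f} TiB")
```

So the request grows with the product of both dimensions rather than with the size of the input files, which is why the estimate is so far above the raw data volume.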
The --feature_generators for ms2pip is by default "ms2pip": {"model": "HCD", "ms2_tolerance": 0.02}. How could I change the model to, for example "CID"?
That is the ms2rescore default, but in mhcquant the default is overwritten to Immuno-HCD; see your error message:
--ms2_tolerance 0.06 --ms2pip_model Immuno-HCD --rescoring_engine mokapot --feature_generators deeplc,ms2pip
You can use the ms2pip model names documented in their GitHub repo.
Let me know if that helped. You can also use our Slack channel if you want to discuss in more depth.
Thanks for the quick response.
The input was 12 files, which are less than 40 GB in total.
I have now redone the analysis with only 2 input files, which gives me a different error.
executor > local (8)
[3d/1a427d] process > NFCORE_MHCQUANT:MHCQUANT:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_HLA1_single.tsv) [100%] 1 of 1 ✔
[33/ebe7a0] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_DECOYDATABASE (human_20365_conts_validated_conversion) [100%] 1 of 1 ✔
[- ] process > NFCORE_MHCQUANT:MHCQUANT:THERMORAWFILEPARSER -
[ff/7ea3d6] process > NFCORE_MHCQUANT:MHCQUANT:TDF2MZML (1) [100%] 2 of 2, cached: 2 ✔
[a5/fa2671] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_COMETADAPTER (2) [100%] 2 of 2 ✔
[7d/58b0df] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_PEPTIDEINDEXER (2) [100%] 2 of 2 ✔
[c7/3b6599] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_IDMERGER (WT_A) [100%] 1 of 1 ✔
[43/3d8aab] process > NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE (WT_A) [100%] 1 of 1, failed: 1 ✘
[- ] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_IDSCORESWITCHER -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_IDFILTER_Q_VALUE -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:QUANT:OPENMS_IDRIPPER -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:QUANT:PYOPENMS_IDFILTER -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:QUANT:MAP_ALIGNMENT:OPENMS_MAPALIGNERIDENTIFICATION -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:QUANT:MAP_ALIGNMENT:OPENMS_MAPRTTRANSFORMERMZML -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:QUANT:MAP_ALIGNMENT:OPENMS_MAPRTTRANSFORMERIDXML -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:QUANT:OPENMS_IDMERGER_QUANT -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:QUANT:PROCESS_FEATURE:OPENMS_FEATUREFINDERIDENTIFICATION -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:QUANT:PROCESS_FEATURE:OPENMS_FEATURELINKERUNLABELEDKD -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:QUANT:PROCESS_FEATURE:OPENMS_IDCONFLICTRESOLVER -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_TEXTEXPORTER -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_MZTABEXPORTER -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:CUSTOM_DUMPSOFTWAREVERSIONS -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:MULTIQC -
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/mhcquant] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE (WT_A)'
Caused by:
Process `NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE (WT_A)` terminated with an error exit status (1)
Command executed:
ms2rescore_cli.py \
--psm_file WT_A.idXML \
--spectrum_path . \
--output_path WT_A_ms2rescore.idXML \
--processes 12 \
--ms2_tolerance 0.06 --ms2pip_model CID --rescoring_engine mokapot --feature_generators deeplc,ms2pip
cat <<-END_VERSIONS > versions.yml
"NFCORE_MHCQUANT:MHCQUANT:MS2RESCORE":
MS²Rescore: $(echo $(ms2rescore --version 2>&1) | grep -oP 'MS²Rescore \(v\K[^\)]+' ))
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/root/.nextflow/assets/nf-core/mhcquant/bin/ms2rescore_cli.py", line 171, in main
rescore_idxml(kwargs["psm_file"], kwargs["output_path"], config)
File "/root/.nextflow/assets/nf-core/mhcquant/bin/ms2rescore_cli.py", line 81, in rescore_idxml
rescore(config, psm_list)
File "/usr/local/lib/python3.10/site-packages/ms2rescore/core.py", line 76, in rescore
fgen.add_features(psm_list)
File "/usr/local/lib/python3.10/site-packages/ms2rescore/feature_generators/ms2pip.py", line 190, in add_features
ms2pip_results = correlate(
File "/usr/local/lib/python3.10/site-packages/ms2pip/core.py", line 178, in correlate
ms2pip_parallelized = _Parallelized(
File "/usr/local/lib/python3.10/site-packages/ms2pip/core.py", line 331, in __init__
validate_requested_xgb_model(
File "/usr/local/lib/python3.10/site-packages/ms2pip/_utils/xgb_models.py", line 21, in validate_requested_xgb_model
_download_model(model_file, xgboost_model_hashes[model_file], model_dir)
File "/usr/local/lib/python3.10/site-packages/ms2pip/_utils/xgb_models.py", line 98, in _download_model
urllib.request.urlretrieve(
File "/usr/local/lib/python3.10/urllib/request.py", line 241, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/usr/local/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.10/urllib/request.py", line 525, in open
response = meth(req, response)
File "/usr/local/lib/python3.10/urllib/request.py", line 634, in http_response
response = self.parent.error(
File "/usr/local/lib/python3.10/urllib/request.py", line 557, in error
result = self._call_chain(*args)
File "/usr/local/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.10/urllib/request.py", line 749, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/local/lib/python3.10/urllib/request.py", line 525, in open
response = meth(req, response)
File "/usr/local/lib/python3.10/urllib/request.py", line 634, in http_response
response = self.parent.error(
File "/usr/local/lib/python3.10/urllib/request.py", line 563, in error
return self._call_chain(*args)
File "/usr/local/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.10/urllib/request.py", line 643, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Work dir:
/home/chyannic/YC/test_mhcquant/work/43/3d8aab086da85dc731af723bc436e3
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
-- Check '.nextflow.log' file for details
Is it expected to require around 100x the total filesize in RAM for ms2rescore?
PS: It seems the Slack channel you linked requires a specific e-mail address extension to work.
urllib.error.HTTPError: HTTP Error 403: Forbidden
Do you run offline?
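As a quick way to narrow this down: in Python's urllib, an HTTPError such as the 403 above means a server answered and refused the request, whereas a plain URLError means no server could be reached at all. A minimal sketch of that distinction follows; the function name is hypothetical and this is not how ms2pip handles the download internally.

```python
import urllib.error


def classify_fetch_error(exc):
    """Describe a download failure raised by urllib (hypothetical helper)."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return f"server reachable, request refused (HTTP {exc.code})"
    if isinstance(exc, urllib.error.URLError):
        return f"server not reachable ({exc.reason})"
    return f"unrelated error ({type(exc).__name__})"


if __name__ == "__main__":
    # Example URL is a placeholder, not the real model location.
    err = urllib.error.HTTPError(
        "https://example.invalid/model", 403, "Forbidden", None, None
    )
    print(classify_fetch_error(err))
```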
Is it expected to require around 100x the total filesize in RAM for ms2rescore?
Normally not. Do you mind sharing your small test dataset so that I can run some tests on it?
PS: Apologies, maybe that one helps https://nf-co.re/join
EDIT: I can reproduce your error. It is a Docker-related issue. I'll update you as soon as I have a fix.
Fixed in the 2.6.0dev version by #302
Description of the bug
I have problems using MHCquant with multiple variable modifications.
The whole pipeline works when no modifications are specified. With multiple modifications, the pipeline only works when DEEPLC and MS2PIP are not included.
Command used and terminal output
Relevant files
nextflow.log
System information
Freshly installed everything last week, so should be the newest version