griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan' #990

Closed PierreLaplante closed 1 year ago

PierreLaplante commented 1 year ago

Installation Type

Standalone

pVACtools Version / Docker Image

4.0.0

Python Version

3.9.1

Operating System

CentOS 7

Describe the bug

When I launch the command below (see "How to reproduce this bug"), I get this error message at the prediction step:

Making binding predictions on Allele H-2-Dd and Epitope Length 8 with Method NetMHCpanEL - File /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.netmhcpan_el.H-2-Dd.8.tsv_1-200
Traceback (most recent call last):
  File "/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py", line 520, in <module>
    Prediction().main()
  File "/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py", line 511, in main
    self.commandline_input(args)
  File "/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py", line 135, in commandline_input
    mhc_scores = mhc_predictor.predict(input.input_protein.as_amino_acid_text())
  File "/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/seqpredictor.py", line 903, in predict
    scores.append(predictor.predict_sequence(sequence,pred))
  File "/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/seqpredictor.py", line 363, in predict_sequence
    results = predict_netmhcpan(input_sequence_list, [(allele_name_or_sequence, self.length)], el=True)
  File "/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/__init__.py", line 79, in predict_many
    process = Popen(cmd, stdout=PIPE)
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/subprocess.py", line 947, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/subprocess.py", line 1819, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan'
CRITICAL:pymp:An exception occured in thread 0: (<class 'subprocess.CalledProcessError'>, Command '['/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/python3.9', '/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py', 'netmhcpan_el', 'H-2-Dd', '8', '/mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200']' returned non-zero exit status 1.).
Traceback (most recent call last):
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/pipeline.py", line 356, in call_iedb
    pvactools.lib.call_iedb.main(arguments)
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/call_iedb.py", line 44, in main
    raise err
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/call_iedb.py", line 39, in main
    (response_text, output_mode) = prediction_class_object.predict(args.input_file, args.allele, args.epitope_length, args.iedb_executable_path, args.iedb_retries, tmp_dir=args.tmp_dir)
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/prediction_class.py", line 58, in predict
    response = run(arguments, stdout=response_fh, check=True)
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/python3.9', '/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py', 'netmhcpan_el', 'H-2-Dd', '8', '/mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/pvacseq", line 8, in <module>
    sys.exit(main())
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/tools/pvacseq/main.py", line 123, in main
    args[0].func.main(args[1])
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/tools/pvacseq/run.py", line 138, in main
    pipeline.execute()
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/pipeline.py", line 451, in execute
    self.call_iedb(chunks)
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/pipeline.py", line 357, in call_iedb
    p.print("Making binding predictions on Allele %s and Epitope Length %s with Method %s - File %s - Completed" % (a, epl, method, filename))
  File "/home/p_laplante@intra.igr.fr/.local/lib/python3.9/site-packages/pymp/__init__.py", line 148, in __exit__
    raise exc_t(exc_val)
TypeError: __init__() missing 1 required positional argument: 'cmd'

I noticed that someone reported the same error message in issue #772, but the discussion then moved on to a different subject.

However, the "missing" file /mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan is present when I look for it.
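Here is one possible explanation, which is not confirmed anywhere in this thread. On Linux, if a script exists but the interpreter named on its `#!` line is missing, `subprocess.Popen` raises `FileNotFoundError` and names the script itself. Whether the `netMHCpan` wrapper in the IEDB package starts with such a line (for example `#!/bin/tcsh -f`) is an assumption you would need to verify on your system. The minimal sketch below checks three things: whether the file exists, whether it has the executable bit, and whether its shebang interpreter can be found. `parse_shebang` and `diagnose_executable` are hypothetical helper names.

```python
import os
import shutil


def parse_shebang(first_line):
    """Return the interpreter named on a '#!' line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # "#!/usr/bin/env tcsh": the program that actually has to exist is "tcsh"
    if os.path.basename(parts[0]) == "env" and len(parts) > 1:
        return parts[1]
    return parts[0]


def diagnose_executable(path):
    """Collect common reasons why Popen([path]) can fail with FileNotFoundError."""
    report = {
        "exists": os.path.isfile(path),
        "executable": os.access(path, os.X_OK),
        "interpreter": None,
        "interpreter_found": None,
    }
    if report["exists"]:
        with open(path, "rb") as fh:
            first_line = fh.readline().decode("utf-8", errors="replace").strip()
        interpreter = parse_shebang(first_line)
        if interpreter is not None:
            report["interpreter"] = interpreter
            # shutil.which checks absolute paths directly and searches PATH otherwise
            report["interpreter_found"] = shutil.which(interpreter) is not None
    return report


if __name__ == "__main__":
    netmhcpan = (
        "/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/../method/"
        "netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan"
    )
    print(diagnose_executable(netmhcpan))
```

Suppose `exists` and `executable` are both true but `interpreter_found` is false. Then the fix would be to install that interpreter, not to move the file.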

How to reproduce this bug

pvacseq run \
/mnt/beegfs/scratch/p_laplante/VEP_annotated_vcf/2_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz \
msh2ko_2 \
H-2-Kd,H-2-Dd,H-2-Ld \
NetMHCpanEL \
/mnt/beegfs/scratch/p_laplante/output_pvactools/ \
--iedb-install-directory /home/p_laplante@intra.igr.fr/ \
--net-chop-method cterm \
--netmhc-stab \
--run-reference-proteome-similarity --blastp-path $HOME/ncbi-blast-2.10.1+/bin --blastp-db refseq_select_prot

Input files

No response

Log output

inputs_log.txt

Output files

No response

susannasiebert commented 1 year ago

This error usually indicates that you ran out of memory, which is pretty common with NetMHCpan. I would suggest having a look at the msh2ko_2.fasta file that should be present in the /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I output directory. Are there any sequences in that file that are very long? If so, you might benefit from reducing the --downstream-sequence-length parameter to a smaller number like 100. Another option to save memory would be to reduce the --fasta-size parameter. If these don't work, is there a different machine you could use with more memory available?
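Below is a minimal Python sketch of the check suggested above. It counts how many sequences of each length the FASTA file contains and lists those above a cutoff. The 100-residue default only mirrors the --downstream-sequence-length value mentioned above. It is an illustrative threshold, not a pVACtools rule. The script name in the comment is hypothetical.

```python
import sys
from collections import Counter


def read_fasta_lengths(lines):
    """Yield (header, sequence_length) for each record in an iterable of FASTA lines."""
    header, length = None, 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header, length = line[1:], 0
        else:
            # sequences may be wrapped over several lines
            length += len(line)
    if header is not None:
        yield header, length


def summarize(lines, long_threshold=100):
    """Return a length histogram and the records longer than long_threshold."""
    records = list(read_fasta_lengths(lines))
    histogram = Counter(length for _, length in records)
    too_long = [(header, length) for header, length in records if length > long_threshold]
    return histogram, too_long


if __name__ == "__main__" and len(sys.argv) > 1:
    # hypothetical usage: python fasta_lengths.py .../MHC_Class_I/msh2ko_2.fasta
    with open(sys.argv[1]) as handle:
        histogram, too_long = summarize(handle)
    for length, count in sorted(histogram.items()):
        print(f"{length} aa: {count} sequence(s)")
    for header, length in too_long:
        print(f"longer than threshold: {header} ({length} aa)")
```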

PierreLaplante commented 1 year ago

I have tried running the tool with the maximum memory available (180 GB), without success. Regarding the FASTA file, my sequences are between 21 and 114 amino acids long, and the vast majority are 21. The FASTA file itself is 1.55 MB.

What would you recommend? I'm not sure limiting --downstream-sequence-length will do anything (unless I didn't fully understand it). What exactly do you mean by reducing --fasta-size? What value should I reduce it to, and what will it do?

susannasiebert commented 1 year ago

Hm, that seems like plenty of memory. We batch the peptide FASTA into smaller chunks (200 peptides by default). Reducing --fasta-size reduces the number of peptides per chunk. Each prediction call then makes fewer predictions at once, so it needs less memory.
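The batching described above can be illustrated with a short Python sketch. It is not pVACtools' implementation. It only shows how a --fasta-size-style limit caps the number of entries each prediction call receives. The `start-end` labels imitate the tmp file names visible in the tracebacks, such as `msh2ko_2.8.fa.split_1-200`. `chunk_entries` is a hypothetical name.

```python
def chunk_entries(entries, fasta_size=200):
    """Split entries into consecutive batches of at most fasta_size items.

    Returns (label, batch) pairs, where label is a 1-based "start-end" range.
    """
    if fasta_size < 1:
        raise ValueError("fasta_size must be positive")
    chunks = []
    for start in range(0, len(entries), fasta_size):
        batch = entries[start:start + fasta_size]
        # the last batch may be shorter, so the label uses its real end
        label = f"{start + 1}-{start + len(batch)}"
        chunks.append((label, batch))
    return chunks


if __name__ == "__main__":
    demo = [f"peptide_{i}" for i in range(1, 451)]
    for size in (200, 100):
        print(size, [label for label, _ in chunk_entries(demo, size)])
```

With 450 entries, a size of 200 gives three calls and a size of 100 gives five. Each of those five calls handles at most 100 entries.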

Does it fail on the first prediction call, or do some of them succeed before it errors out? I actually wonder if the "@" in your path is causing problems.

Are you able to execute this command directly:

/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/python3.9 /home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py netmhcpan_el H-2-Dd 8 /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200

(This assumes that the /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200 tmp file still exists.)

PierreLaplante commented 1 year ago

(Everything from here on was run with 180 GB of RAM.) I do get the same error message when running:

/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/python3.9 /home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py netmhcpan_el H-2-Dd 8 /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200

Traceback (most recent call last):
  File "/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py", line 520, in <module>
    Prediction().main()
  File "/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py", line 511, in main
    self.commandline_input(args)
  File "/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py", line 135, in commandline_input
    mhc_scores = mhc_predictor.predict(input.input_protein.as_amino_acid_text())
  File "/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/seqpredictor.py", line 903, in predict
    scores.append(predictor.predict_sequence(sequence,pred))
  File "/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/seqpredictor.py", line 363, in predict_sequence
    results = predict_netmhcpan(input_sequence_list, [(allele_name_or_sequence, self.length)], el=True)
  File "/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/__init__.py", line 79, in predict_many
    process = Popen(cmd, stdout=PIPE)
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/subprocess.py", line 947, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/subprocess.py", line 1819, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan'

I actually wonder if the "@" in your path is causing problems.

That has already happened to me with a Perl script I tried to run. I moved the files from my home directory to a working directory without "@" in its path, so the command now looks like this:

pvacseq run \
/mnt/beegfs/scratch/p_laplante/VEP_annotated_vcf/2_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz \
msh2ko_2 \
H-2-Kd,H-2-Dd,H-2-Ld \
NetMHCpanEL \
/mnt/beegfs/scratch/p_laplante/output_pvactools/ \
--iedb-install-directory /mnt/beegfs/scratch/p_laplante/ \
--net-chop-method cterm \
--netmhc-stab \
--run-reference-proteome-similarity --blastp-path /mnt/beegfs/scratch/p_laplante/ncbi-blast-2.14.1+/bin --blastp-db refseq_select_prot

And I get this error message:

Generating Variant Peptide FASTA and Key Files - Epitope Length 8 - Entries 23401-23452
Generating Variant Peptide FASTA and Key Files - Epitope Length 9 - Entries 23401-23452
Generating Variant Peptide FASTA and Key Files - Epitope Length 10 - Entries 23401-23452
Generating Variant Peptide FASTA and Key Files - Epitope Length 11 - Entries 23401-23452
Completed
Making binding predictions on Allele H-2-Dd and Epitope Length 8 with Method NetMHCpanEL - File /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.netmhcpan_el.H-2-Dd.8.tsv_1-200
Traceback (most recent call last):
  File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/predict_binding.py", line 520, in <module>
    Prediction().main()
  File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/predict_binding.py", line 511, in main
    self.commandline_input(args)
  File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/predict_binding.py", line 135, in commandline_input
    mhc_scores = mhc_predictor.predict(input.input_protein.as_amino_acid_text())
  File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/seqpredictor.py", line 903, in predict
    scores.append(predictor.predict_sequence(sequence,pred))
  File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/seqpredictor.py", line 363, in predict_sequence
    results = predict_netmhcpan(input_sequence_list, [(allele_name_or_sequence, self.length)], el=True)
  File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/__init__.py", line 79, in predict_many
    process = Popen(cmd, stdout=PIPE)
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/subprocess.py", line 947, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/subprocess.py", line 1819, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/beegfs/scratch/p_laplante/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan'
CRITICAL:pymp:An exception occured in thread 0: (<class 'subprocess.CalledProcessError'>, Command '['/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/python3.9', '/mnt/beegfs/scratch/p_laplante/mhc_i/src/predict_binding.py', 'netmhcpan_el', 'H-2-Dd', '8', '/mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200']' returned non-zero exit status 1.).
Traceback (most recent call last):
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/pipeline.py", line 356, in call_iedb
    pvactools.lib.call_iedb.main(arguments)
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/call_iedb.py", line 44, in main
    raise err
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/call_iedb.py", line 39, in main
    (response_text, output_mode) = prediction_class_object.predict(args.input_file, args.allele, args.epitope_length, args.iedb_executable_path, args.iedb_retries, tmp_dir=args.tmp_dir)
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/prediction_class.py", line 58, in predict
    response = run(arguments, stdout=response_fh, check=True)
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/python3.9', '/mnt/beegfs/scratch/p_laplante/mhc_i/src/predict_binding.py', 'netmhcpan_el', 'H-2-Dd', '8', '/mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/pvacseq", line 8, in <module>
    sys.exit(main())
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/tools/pvacseq/main.py", line 123, in main
    args[0].func.main(args[1])
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/tools/pvacseq/run.py", line 138, in main
    pipeline.execute()
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/pipeline.py", line 451, in execute
    self.call_iedb(chunks)
  File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/pipeline.py", line 357, in call_iedb
    p.print("Making binding predictions on Allele %s and Epitope Length %s with Method %s - File %s - Completed" % (a, epl, method, filename))
  File "/home/p_laplante@intra.igr.fr/.local/lib/python3.9/site-packages/pymp/__init__.py", line 148, in __exit__
    raise exc_t(exc_val)
TypeError: __init__() missing 1 required positional argument: 'cmd'

What do you make of all this?

By the way, I was wondering why the script looks for "netMHCpan" in a somewhat convoluted way: "/mnt/beegfs/scratch/p_laplante/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan". It goes into src/, then back up and into method/. Do you think this could play a role?
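The `src/../` segment is ordinary relative-path syntax. The `..` steps back up from `src/` to `mhc_i/`, so the path names `mhc_i/method/...`, and the detour by itself should not cause the error. Python's `os.path.normpath` shows the collapsed form. Note that it works purely on the string, so it would not reflect the real target if `src` were a symbolic link.

```python
import os.path

raw_path = (
    "/mnt/beegfs/scratch/p_laplante/mhc_i/src/../method/"
    "netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan"
)
# normpath removes "src/.." lexically, without consulting the filesystem
normalized = os.path.normpath(raw_path)
print(normalized)
# /mnt/beegfs/scratch/p_laplante/mhc_i/method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan
```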

PierreLaplante commented 1 year ago

I have now tried reinstalling Miniconda in beegfs/scratch, to remove any "@" from the paths, and running with --fasta-size 100, to no avail:

Making binding predictions on Allele H-2-Dd and Epitope Length 8 with Method NetMHCpanEL - File /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.netmhcpan_el.H-2-Dd.8.tsv_1-100
Traceback (most recent call last):
  File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/predict_binding.py", line 520, in <module>
    Prediction().main()
  File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/predict_binding.py", line 511, in main
    self.commandline_input(args)
  File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/predict_binding.py", line 135, in commandline_input
    mhc_scores = mhc_predictor.predict(input.input_protein.as_amino_acid_text())
  File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/seqpredictor.py", line 903, in predict
    scores.append(predictor.predict_sequence(sequence,pred))
  File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/seqpredictor.py", line 363, in predict_sequence
    results = predict_netmhcpan(input_sequence_list, [(allele_name_or_sequence, self.length)], el=True)
  File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/__init__.py", line 79, in predict_many
    process = Popen(cmd, stdout=PIPE)
  File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/lib/python3.10/subprocess.py", line 1847, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/beegfs/scratch/p_laplante/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan'
CRITICAL:pymp:An exception occured in thread 0: (<class 'subprocess.CalledProcessError'>, Command '['/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/bin/python3.10', '/mnt/beegfs/scratch/p_laplante/mhc_i/src/predict_binding.py', 'netmhcpan_el', 'H-2-Dd', '8', '/mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-100']' returned non-zero exit status 1.).
Traceback (most recent call last):
  File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/lib/python3.10/site-packages/pvactools/lib/pipeline.py", line 356, in call_iedb
    pvactools.lib.call_iedb.main(arguments)
  File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/lib/python3.10/site-packages/pvactools/lib/call_iedb.py", line 44, in main
    raise err
  File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/lib/python3.10/site-packages/pvactools/lib/call_iedb.py", line 39, in main
    (response_text, output_mode) = prediction_class_object.predict(args.input_file, args.allele, args.epitope_length, args.iedb_executable_path, args.iedb_retries, tmp_dir=args.tmp_dir)
  File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/lib/python3.10/site-packages/pvactools/lib/prediction_class.py", line 58, in predict
    response = run(arguments, stdout=response_fh, check=True)
  File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/bin/python3.10', '/mnt/beegfs/scratch/p_laplante/mhc_i/src/predict_binding.py', 'netmhcpan_el', 'H-2-Dd', '8', '/mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-100']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/bin/pvacseq", line 8, in <module>
    sys.exit(main())
  File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/lib/python3.10/site-packages/pvactools/tools/pvacseq/main.py", line 123, in main
    args[0].func.main(args[1])
  File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/lib/python3.10/site-packages/pvactools/tools/pvacseq/run.py", line 138, in main
    pipeline.execute()
  File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/lib/python3.10/site-packages/pvactools/lib/pipeline.py", line 451, in execute
    self.call_iedb(chunks)
  File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/lib/python3.10/site-packages/pvactools/lib/pipeline.py", line 348, in call_iedb
    with pymp.Parallel(self.n_threads) as p:
  File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/lib/python3.10/site-packages/pymp/__init__.py", line 148, in __exit__
    raise exc_t(exc_val)
TypeError: CalledProcessError.__init__() missing 1 required positional argument: 'cmd'
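The final `TypeError` is a secondary error that hides the real one. The traceback line `raise exc_t(exc_val)` in pymp's `__init__.py` suggests how this happens. pymp appears to re-raise a worker's exception by calling its class with the original exception as the only argument. `subprocess.CalledProcessError` requires both `returncode` and `cmd`, so that call fails. The sketch below reproduces this mechanism outside pymp; the helper names are hypothetical. In practice, the useful error is the `FileNotFoundError` printed earlier, not the `TypeError` at the end.

```python
import subprocess


def reraise_like_pymp(exc):
    """Rebuild exc from its class, as the traceback line `raise exc_t(exc_val)` does."""
    exc_t, exc_val = type(exc), exc
    raise exc_t(exc_val)


def secondary_error_message():
    """Return the message of the error raised while re-raising a CalledProcessError."""
    original = subprocess.CalledProcessError(1, ["python3", "predict_binding.py"])
    try:
        reraise_like_pymp(original)
    except TypeError as err:
        # CalledProcessError.__init__ needs both `returncode` and `cmd`, so passing
        # only the old exception fails, and the original error is replaced.
        return str(err)
    return None


if __name__ == "__main__":
    print(secondary_error_message())
```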

susannasiebert commented 1 year ago

Since this seems to be an installation issue with IEDB rather than with pVACtools itself, I would recommend opening a ticket with the IEDB help desk (help.iedb.org). Mention the error you get when you run the standalone IEDB tool directly, i.e. the /home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/python3.9 /home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py netmhcpan_el H-2-Dd 8 /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200 command.

You might also want to consider switching to our Docker container to see if that fixes the error.

PierreLaplante commented 1 year ago

I ended up using the Docker container (through Singularity), which resolved my problem. I tried a run with one sample, which apparently succeeded.

The problem I face now is that not all of the output files seem to have been generated.

I am missing the .all_epitopes.aggregated.tsv, .all_epitopes.aggregated.metrics.json, ui.R, app.R, server.R, styling.R, anchor_and_helper_functions.R and www (directory).

Would you happen to know why this happened?

Here is a screenshot of my output directory: [image]

Note that I tried running NetChop for stability data, and this is what happened:

[W::hts_idx_load3] The index file is older than the data file: /home/p_laplante@intra.igr.fr/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz.tbi
[W::hts_idx_load3] The index file is older than the data file: /home/p_laplante@intra.igr.fr/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz.tbi
[W::hts_idx_load3] The index file is older than the data file: /home/p_laplante@intra.igr.fr/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz.tbi
[W::hts_idx_load3] The index file is older than the data file: /home/p_laplante@intra.igr.fr/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz.tbi
[W::hts_idx_load3] The index file is older than the data file: /home/p_laplante@intra.igr.fr/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz.tbi
Warning: Proximal variant is not a missense mutation and will be skipped: X 113452690
Warning: Proximal variant is not a missense mutation and will be skipped: X 113452690
Warning: Proximal variant is not a missense mutation and will be skipped: X 113452690
Warning: Proximal variant is not a missense mutation and will be skipped: X 113452690
Warning: Proximal variant is not a missense mutation and will be skipped: X 113452690
Warning: Proximal variant is not a missense mutation and will be skipped: X 113452690
Warning: Proximal variant is not a missense mutation and will be skipped: X 126629127
Warning: Proximal variant is not a missense mutation and will be skipped: X 126629127
Completed
Generating Variant Peptide FASTA and Key File
Completed
Parsing the Variant Peptide FASTA and Key File
Completed
Calculating Manufacturability Metrics
Completed
Calculating Manufacturability Metrics
Completed
Running Coverage Filters
Completed
Running Transcript Support Level Filter
Complete
Submitting remaining epitopes to NetChop
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /cgi-bin/webface2.cgi?jobid=64C41D010000179CE299EC52&wait=20
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='services.healthtech.dtu.dk', port=443): Read timed out. (read timeout=10)")': /cgi-bin/webface2.cgi?jobid=64C43A6E000054B3C2AFEE55&wait=20
Completed

Done: Pipeline finished successfully. File /home/p_laplante@intra.igr.fr/output_pvactools/MHC_Class_I/msh2ko_2.filtered.tsv contains list of filtered putative neoantigens.

Do you think that the connection error to NetChop ended up terminating the pipeline too early?

I am attaching my inputs.yml (as .txt) for further information.

inputs.txt

PierreLaplante commented 1 year ago

I forgot to add the command I used for this run:

pvacseq run \
$HOME/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz \
msh2ko_5 \
H-2-Kd,H-2-Dd,H-2-Ld \
NetMHCpanEL \
$HOME/output_pvactools/ \
--phased-proximal-variants-vcf $HOME/VEP_annotated_vcf/phased/5_Msh2KO.phased.sorted.vcf.mm39.vcf.VEP_anno.vcf.gz \
--iedb-install-directory /opt/iedb \
--net-chop-method cterm \
--run-reference-proteome-similarity --blastp-path $HOME/ncbi-blast-2.14.1+/bin --blastp-db refseq_select_prot

susannasiebert commented 1 year ago

It looks like either NetChop was down or your internet connection was unreliable. Unfortunately, the run doesn't error out correctly in this situation, but this would explain the missing files.

Can you try again and see whether you keep running into this problem? Our status checker isn't reporting any issues, so it should work. You can simply restart your command, and it should pick up where it left off as long as all the tmp files are still there.

PierreLaplante commented 1 year ago

Yes, I have tried several times, and it doesn't get past this point. I also tried removing all the options that need an internet connection, running:

pvacseq run \
$HOME/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz \
msh2ko_5 \
H-2-Kd,H-2-Dd,H-2-Ld \
NetMHCpanEL \
$HOME/output_pvactools/ \
--phased-proximal-variants-vcf $HOME/VEP_annotated_vcf/phased/5_Msh2KO.phased.sorted.vcf.mm39.vcf.VEP_anno.vcf.gz \
--iedb-install-directory /opt/iedb \
--run-reference-proteome-similarity --blastp-path $HOME/ncbi-blast-2.14.1+/bin --blastp-db refseq_select_prot

The run still finished with a message saying the pipeline completed, but the files are still missing.

susannasiebert commented 1 year ago

Ok, I would like to try to replicate this issue on my end. Would you be able to attach your VCF files (5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz and 5_Msh2KO.phased.sorted.vcf.mm39.vcf.VEP_anno.vcf.gz), including the .tbi index files?

PierreLaplante commented 1 year ago

Here are the files: https://we.tl/t-ei7H4aTmeJ (I had to use a WeTransfer link because the phased VCF is too big and GitHub doesn't allow uploading .tbi files.)

susannasiebert commented 1 year ago

I apologize for not noticing this sooner. Because you are running with only an elution algorithm, we don't generate the aggregated report or the other files needed for pVACview, since elution-only data doesn't provide enough information for them. So this behavior is expected. To generate these files, you need to add a binding affinity algorithm to your run, such as NetMHCpan.

I will add some information about this to the documentation.
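Given the rule described above, a pre-flight check like the hypothetical Python helper below can flag an elution-only run before it starts. It is not part of pVACtools. NetMHCpanEL is the only elution algorithm named in this thread, so the set contains just that name. Any other name is treated as a binding affinity predictor, which is a simplification. pvacseq accepts several algorithms as separate positional arguments, for example `NetMHCpan NetMHCpanEL`.

```python
# Hypothetical pre-flight helper, not part of pVACtools. NetMHCpanEL is the only
# elution algorithm named in this thread; add any other elution-only
# algorithms you use. Every other name is treated as a binding affinity
# predictor, which is a simplification.
ELUTION_ONLY = frozenset({"NetMHCpanEL"})


def has_binding_affinity_algorithm(algorithms, elution_only=ELUTION_ONLY):
    """Return True if at least one requested algorithm is not elution-only."""
    return any(name not in elution_only for name in algorithms)


if __name__ == "__main__":
    for requested in (["NetMHCpanEL"], ["NetMHCpan", "NetMHCpanEL"]):
        ok = has_binding_affinity_algorithm(requested)
        print(requested, "-> aggregated report expected:", ok)
```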

PierreLaplante commented 1 year ago

I see, thank you. It seems to be working now that I have added other algorithms. One last question: how do you process multiple samples? The pipeline doesn't start if the inputs.yml was created for sample 1 and you then try to run sample 2. Is there still a way to write a loop, or should I merge the VCFs?

susannasiebert commented 1 year ago

Each pVACseq run is sample-specific, so the output directory needs to be sample-specific as well. You can achieve that by including the sample name in the output directory path.
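Here is a minimal Python sketch of such a loop. The sample names, VCF paths, and output root are hypothetical placeholders. The argument order (input VCF, sample name, alleles, algorithms, output directory) and the `--iedb-install-directory /opt/iedb` option follow the commands used earlier in this thread. Each sample gets its own output directory named after it. By default the script only prints the commands.

```python
import os
import subprocess
import sys

# Hypothetical placeholders: replace with your own sample names and VCF paths.
SAMPLES = {
    "msh2ko_2": "/path/to/sample_2.VEP_anno.vcf.gz",
    "msh2ko_5": "/path/to/sample_5.VEP_anno.vcf.gz",
}
ALLELES = "H-2-Kd,H-2-Dd,H-2-Ld"
ALGORITHMS = ["NetMHCpan", "NetMHCpanEL"]  # binding affinity + elution
OUTPUT_ROOT = "/path/to/output_pvactools"
IEDB_DIR = "/opt/iedb"


def build_command(sample, vcf, output_root=OUTPUT_ROOT):
    """Return the pvacseq argv for one sample, with a sample-specific output dir."""
    output_dir = os.path.join(output_root, sample)
    return ["pvacseq", "run", vcf, sample, ALLELES, *ALGORITHMS, output_dir,
            "--iedb-install-directory", IEDB_DIR]


def run_all(samples, execute=False):
    """Print each command; only run it when execute=True."""
    for sample, vcf in samples.items():
        command = build_command(sample, vcf)
        print(" ".join(command))
        if execute:
            # check=True stops the loop at the first sample that fails
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # "--execute" is a flag of this hypothetical wrapper script, not of pvacseq
    run_all(SAMPLES, execute="--execute" in sys.argv)
```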

PierreLaplante commented 1 year ago

Ok, makes sense.

I got this error while running the reference proteome similarity step:

Calculating Reference Proteome Similarity
Traceback (most recent call last):
  File "/usr/local/bin/pvacseq", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pvactools/tools/pvacseq/main.py", line 123, in main
    args[0].func.main(args[1])
  File "/usr/local/lib/python3.11/site-packages/pvactools/tools/pvacseq/run.py", line 138, in main
    pipeline.execute()
  File "/usr/local/lib/python3.11/site-packages/pvactools/lib/pipeline.py", line 484, in execute
    PostProcessor(**post_processing_params).execute()
  File "/usr/local/lib/python3.11/site-packages/pvactools/lib/post_processor.py", line 65, in execute
    self.calculate_reference_proteome_similarity()
  File "/usr/local/lib/python3.11/site-packages/pvactools/lib/post_processor.py", line 248, in calculate_reference_proteome_similarity
    ).execute()
      ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 573, in execute
    unique_peptides = pymp.shared.list(self._get_unique_peptides(mt_records_dict, wt_records_dict))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 554, in _get_unique_peptides
    peptide, full_peptide = self._get_peptide(line, mt_records_dict, wt_records_dict)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 295, in _get_peptide
    (full_peptide, wt_peptide, variant_type, mt_amino_acids, wt_amino_acids) = self._get_full_peptide(line, mt_records_dict, wt_records_dict)
                                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pvactools/lib/calculate_reference_proteome_similarity.py", line 278, in _get_full_peptide
    raise Exception("Unexpected record_id format: {}".format(record_id))
Exception: Unexpected record_id format: 1.Rp1.ENSMUST00000027032.missense.1453N/S

What does it mean?

susannasiebert commented 1 year ago

I think this is an edge case in our parsing logic that we didn't account for, probably related to this being mouse data instead of human. I will make a bugfix for it.

PierreLaplante commented 1 year ago

Is this bug the reason why every row in the aggregated.tsv has "Pending" in the "Evaluation" column?

Also, would you be so kind as to let me know when the Docker image is updated?

Thank you very much for your dedicated help.

susannasiebert commented 1 year ago

No, "Pending" is a placeholder for your own final evaluation status, since usually not all neoantigen candidates can be included in a therapy. You can do that step by loading your results into pVACview. There you can update the Evaluation/Eval column with your final decisions and then export the resulting TSV. If your final goal isn't neoantigen therapy selection, you can just ignore this column. The Tier column gives you an indication of our overall verdict on each neoantigen candidate.
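If you are not using pVACview, a quick way to see how candidates are distributed across tiers is to count the values of the `Tier` column. Below is a minimal Python sketch using the standard `csv` module. The column names `Tier` and `Evaluation` are the ones mentioned in this thread, and the tier values counted are whatever appears in your file. The script name and file path in the comment are placeholders.

```python
import csv
import sys
from collections import Counter


def summarize_aggregated(lines):
    """Count rows per Tier value and per Evaluation value in an aggregated TSV."""
    reader = csv.DictReader(lines, delimiter="\t")
    tiers, evaluations = Counter(), Counter()
    for row in reader:
        tiers[row["Tier"]] += 1
        evaluations[row.get("Evaluation", "")] += 1
    return tiers, evaluations


if __name__ == "__main__" and len(sys.argv) > 1:
    # hypothetical usage: python summarize_tiers.py <sample>.all_epitopes.aggregated.tsv
    with open(sys.argv[1], newline="") as handle:
        tiers, evaluations = summarize_aggregated(handle)
    print("Tier:", dict(tiers))
    print("Evaluation:", dict(evaluations))
```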

Unfortunately, I'm not sure when a new Docker image with this fix will be ready. I will be on vacation starting Thursday, but I will try to get it out before then.

susannasiebert commented 1 year ago

This issue should be fixed in pVACtools version 4.0.3. Please give it a try and let me know if you’re running into any other errors.
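An older conda environment or container image may still be the one that actually runs. It can therefore help to confirm the installed version before re-testing. Below is a minimal sketch using the standard-library `importlib.metadata`, assuming the distribution is installed under the name `pvactools` (as on PyPI). Run it with the same Python interpreter, or inside the same container, that runs pvacseq. The helper names are hypothetical, and the comparison is deliberately simple: it only looks at the leading numeric components.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '4.0.3' into (4, 0, 3), keeping only the leading numeric components."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # e.g. '3rc1' stops parsing, so pre-releases compare as older
        parts.append(int(piece))
    return tuple(parts)


def has_fix(installed, required="4.0.3"):
    """True if installed is at least required (simplified numeric comparison)."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    try:
        installed = version("pvactools")
    except PackageNotFoundError:
        print("pvactools is not installed in this Python environment")
    else:
        print(f"pvactools {installed}; at least 4.0.3: {has_fix(installed)}")
```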

PierreLaplante commented 1 year ago

After downloading and installing pVACtools 4.0.3, I tried running the tools, and sadly the error still pops up.

I then ran the tool without the --run-reference-proteome-similarity option to look at the results, and ran into a problem with pVACview.

I tried it through R using the following:

install.packages(c("shiny", "ggplot2", "DT", "reshape2", "jsonlite", "tibble", "tidyr",
                   "plyr", "dplyr", "shinydashboard", "shinydashboardPlus", "fresh",
                   "shinycssloaders", "RCurl", "curl", "stringr"),
                 dependencies=TRUE)

shiny::runApp('C:/Pierre/Projets/MMR_meta/Neo/2/MHC_Class_I', port=3333)

The page loads, I upload the aggregated.tsv, and then when I upload the json, the page closes and I get disconnected.

This is what pops up in the console:

shiny::runApp('C:\Pierre\Projets\MMR_meta\Neo\2\MHC_Class_I', port=3333)
Listening on http://127.0.0.1:3333
[1] FALSE
[1] FALSE
Warning: Error in [.data.frame: colonnes non définies sélectionnées   #this pops up when I upload the .json
  1: shiny::runApp
[1] TRUE

In English, the warning reads "Warning: Error in [.data.frame: undefined columns selected". I also tried through the web server, and I get disconnected there as well.

What do you think is the problem?

Here is one of the MHC_Class_I folders containing the necessary files: https://we.tl/t-kvfBO6yVfj

susannasiebert commented 1 year ago

Thank you for these error reports. I have bugfixes for both issues in the works and will make a new release next week.

susannasiebert commented 1 year ago

I just released version 4.0.4, which should fix these two issues. Please let me know if you still run into problems.

susannasiebert commented 1 year ago

I'm closing this issue due to inactivity. I assume that the newest version fixed these errors.