Closed PierreLaplante closed 1 year ago
This error usually indicates that you ran out of memory, which is pretty common with NetMHCpan. I would suggest having a look at the msh2ko_2.fasta file that should be present in the /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I output directory. Are there any sequences in that file that are very long? If so, you might benefit from reducing the --downstream-sequence-length parameter to a smaller number like 100. Another option to save memory would be to reduce the --fasta-size parameter. If these don't work, is there a different machine you could use with more memory available?
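To illustrate what to look for, scanning that FASTA for unusually long entries only takes a few lines of Python. This is a standalone sketch with a deliberately minimal parser; the real file would be the msh2ko_2.fasta in the MHC_Class_I output directory mentioned above, and the demo input below is made up:

```python
def sequence_lengths(fasta_text):
    """Map each FASTA header to the length of its sequence."""
    lengths, header, chunks = {}, None, []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if header is not None:
                lengths[header] = len("".join(chunks))
            header, chunks = line[1:].strip(), []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        lengths[header] = len("".join(chunks))
    return lengths

# In practice: text = open(".../MHC_Class_I/msh2ko_2.fasta").read()
demo = ">MT.1.short\n" + "M" * 21 + "\n>MT.2.long\n" + "M" * 114 + "\n"
print(max(sequence_lengths(demo).values()))  # → 114
```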
I have tried running the tool with the maximum memory available, which is 180G, without success. Regarding the fasta file, my sequences are between 21 and 114 amino acids long, with the vast majority at 21. The fasta file itself is 1.55 MB.
What would you recommend? I'm not sure reducing --downstream-sequence-length will do anything (unless I didn't understand it fully). What exactly do you mean by reducing --fasta-size? To what number should I reduce it, and what will that do?
Hm, that seems like plenty of memory. We batch the peptide fasta into smaller chunks (200 peptides by default). Reducing the --fasta-size will reduce the number of peptides in a chunk, so the prediction call will need to make fewer predictions at once, which reduces the memory needed.
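In rough terms, the batching behaves like this (a simplified sketch of the idea, not pVACtools' actual implementation), with --fasta-size controlling how many peptides go into each prediction call:

```python
def chunks(peptides, fasta_size=200):
    """Yield successive batches of at most fasta_size peptides."""
    for i in range(0, len(peptides), fasta_size):
        yield peptides[i:i + fasta_size]

peptides = [f"pep{i}" for i in range(450)]
print(len(list(chunks(peptides))))       # default 200 → 3 batches
print(len(list(chunks(peptides, 100))))  # as with --fasta-size 100 → 5 batches
```

Smaller batches mean more prediction calls overall, but each one holds less in memory at once.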
Does it fail with the first prediction call, or do any of them succeed before it errors out? I actually wonder if the "@" in your path is causing problems.
Are you able to execute this command directly:
/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/python3.9 /home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py netmhcpan_el H-2-Dd 8 /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200
Assuming that the /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200 tmp file still exists?
(Everything thereafter is run with 180G of RAM.) Indeed, I get the same error message after running
/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/python3.9 /home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py netmhcpan_el H-2-Dd 8 /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200
Traceback (most recent call last):
File "/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py", line 520, in <module>
I actually wonder if the "@" in your path is causing problems.
A similar problem has already happened with a Perl script I tried to run. I moved the files from the home directory to a working directory (without an "@"); the command now looks like this:
pvacseq run \
    /mnt/beegfs/scratch/p_laplante/VEP_annotated_vcf/2_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz \
    msh2ko_2 \
    H-2-Kd,H-2-Dd,H-2-Ld \
    NetMHCpanEL \
    /mnt/beegfs/scratch/p_laplante/output_pvactools/ \
    --iedb-install-directory /mnt/beegfs/scratch/p_laplante/ \
    --net-chop-method cterm \
    --netmhc-stab \
    --run-reference-proteome-similarity --blastp-path /mnt/beegfs/scratch/p_laplante/ncbi-blast-2.14.1+/bin --blastp-db refseq_select_prot
And I get this error message:
Generating Variant Peptide FASTA and Key Files - Epitope Length 8 - Entries 23401-23452
Generating Variant Peptide FASTA and Key Files - Epitope Length 9 - Entries 23401-23452
Generating Variant Peptide FASTA and Key Files - Epitope Length 10 - Entries 23401-23452
Generating Variant Peptide FASTA and Key Files - Epitope Length 11 - Entries 23401-23452
Completed
Making binding predictions on Allele H-2-Dd and Epitope Length 8 with Method NetMHCpanEL - File /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.netmhcpan_el.H-2-Dd.8.tsv_1-200
Traceback (most recent call last):
File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/predict_binding.py", line 520, in <module>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/pvacseq", line 8, in <module>
What do you make of all this?
By the way, I was wondering why the script is looking for "netMHCpan" in a bit of a convoluted way: "/mnt/beegfs/scratch/p_laplante/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan", going into src/ and then back up and into method/. Do you think it could somehow play a role?
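For what it's worth, that src/../method hop is just the path being built relative to the script's own location; the OS collapses the ".." before opening the file, so on its own it shouldn't matter (assuming src is not a symlink, since the collapse is purely lexical). A quick way to convince yourself, using only the standard library:

```python
import os.path

p = ("/mnt/beegfs/scratch/p_laplante/mhc_i/src/../method/"
     "netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan")
# normpath collapses "src/.." lexically, mirroring what the kernel does
# during path resolution when no symlinks are involved.
print(os.path.normpath(p))
# → /mnt/beegfs/scratch/p_laplante/mhc_i/method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan
```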
I now tried reinstalling miniconda in beegfs/scratch to remove any mention of "@", and running with --fasta-size 100, to no avail:
Making binding predictions on Allele H-2-Dd and Epitope Length 8 with Method NetMHCpanEL - File /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.netmhcpan_el.H-2-Dd.8.tsv_1-100
Traceback (most recent call last):
File "/mnt/beegfs/scratch/p_laplante/mhc_i/src/predict_binding.py", line 520, in <module>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/beegfs/scratch/p_laplante/miniconda3/envs/pvac/bin/pvacseq", line 8, in <module>
Since this seems to be an installation issue with IEDB and not pVACtools itself, I would recommend opening a ticket with the IEDB help desk directly (help.iedb.org) and mentioning the error you get when running standalone IEDB directly (the /home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/python3.9 /home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py netmhcpan_el H-2-Dd 8 /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200 command).
You might also want to consider switching to our Docker container to see if that fixes the error.
I have ended up using the Docker container (through Singularity), which resolved my problem. I tried a run with one sample which apparently succeeded.
The problem I face now is that it seems that not all the input files were generated.
I am missing the
Would you happen to know why this happened?
Here is a link for a screenshot of my output directory:
Of note, I tried to run NetChop for stability data, and this is what happened:
[W::hts_idx_load3] The index file is older than the data file: /home/p_laplante@intra.igr.fr/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz.tbi
[W::hts_idx_load3] The index file is older than the data file: /home/p_laplante@intra.igr.fr/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz.tbi
[W::hts_idx_load3] The index file is older than the data file: /home/p_laplante@intra.igr.fr/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz.tbi
[W::hts_idx_load3] The index file is older than the data file: /home/p_laplante@intra.igr.fr/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz.tbi
[W::hts_idx_load3] The index file is older than the data file: /home/p_laplante@intra.igr.fr/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz.tbi
Warning: Proximal variant is not a missense mutation and will be skipped: X 113452690
Warning: Proximal variant is not a missense mutation and will be skipped: X 113452690
Warning: Proximal variant is not a missense mutation and will be skipped: X 113452690
Warning: Proximal variant is not a missense mutation and will be skipped: X 113452690
Warning: Proximal variant is not a missense mutation and will be skipped: X 113452690
Warning: Proximal variant is not a missense mutation and will be skipped: X 113452690
Warning: Proximal variant is not a missense mutation and will be skipped: X 126629127
Warning: Proximal variant is not a missense mutation and will be skipped: X 126629127
Completed
Generating Variant Peptide FASTA and Key File
Completed
Parsing the Variant Peptide FASTA and Key File
Completed
Calculating Manufacturability Metrics
Completed
Calculating Manufacturability Metrics
Completed
Running Coverage Filters
Completed
Running Transcript Support Level Filter
Completed
Submitting remaining epitopes to NetChop
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /cgi-bin/webface2.cgi?jobid=64C41D010000179CE299EC52&wait=20
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='services.healthtech.dtu.dk', port=443): Read timed out. (read timeout=10)")': /cgi-bin/webface2.cgi?jobid=64C43A6E000054B3C2AFEE55&wait=20
Completed
Done: Pipeline finished successfully. File /home/p_laplante@intra.igr.fr/output_pvactools/MHC_Class_I/msh2ko_2.filtered.tsv contains list of filtered putative neoantigens.
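As an aside, the [W::hts_idx_load3] warnings above just mean the .tbi was written before the VCF was last modified; re-running tabix on the VCF regenerates the index and silences them. The condition htslib checks is, roughly, a modification-time comparison, sketched here with throwaway files (the file names are illustrative):

```python
import os
import tempfile
from pathlib import Path

def index_is_stale(data_file, index_file):
    """True when the index predates the data file (roughly htslib's warning condition)."""
    return os.path.getmtime(index_file) < os.path.getmtime(data_file)

d = tempfile.mkdtemp()
vcf = Path(d, "sample.vcf.gz"); vcf.touch()
tbi = Path(d, "sample.vcf.gz.tbi"); tbi.touch()
os.utime(tbi, (1_000, 1_000))  # index written "earlier"
os.utime(vcf, (2_000, 2_000))  # data modified afterwards
print(index_is_stale(vcf, tbi))  # → True; regenerating the index would flip this
```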
Do you think that the connection error to NetChop ended up terminating the pipeline too early?
I am attaching my inputs.yml (as .txt) for further information.
I forgot to add the command I used for this run, which is the following:

pvacseq run \
    $HOME/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz \
    msh2ko_5 \
    H-2-Kd,H-2-Dd,H-2-Ld \
    NetMHCpanEL \
    $HOME/output_pvactools/ \
    --phased-proximal-variants-vcf $HOME/VEP_annotated_vcf/phased/5_Msh2KO.phased.sorted.vcf.mm39.vcf.VEP_anno.vcf.gz \
    --iedb-install-directory /opt/iedb \
    --net-chop-method cterm \
    --run-reference-proteome-similarity --blastp-path $HOME/ncbi-blast-2.14.1+/bin --blastp-db refseq_select_prot
It looks like maybe NetChop was down or maybe you had a bad internet connection. Unfortunately, it looks like in this situation the run doesn't error out correctly. But this would be the cause for the missing files.
Can you try again and see if you continue running into this problem? Our status checker isn't reporting any issues so it should work. You can just restart your command and it should pick up where it left off as long as all the tmp files are still there.
Indeed, I have tried several times and it doesn't get past this. I have tried stripping all the options that need an internet connection, running:

pvacseq run \
    $HOME/VEP_annotated_vcf/5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz \
    msh2ko_5 \
    H-2-Kd,H-2-Dd,H-2-Ld \
    NetMHCpanEL \
    $HOME/output_pvactools/ \
    --phased-proximal-variants-vcf $HOME/VEP_annotated_vcf/phased/5_Msh2KO.phased.sorted.vcf.mm39.vcf.VEP_anno.vcf.gz \
    --iedb-install-directory /opt/iedb \
    --run-reference-proteome-similarity --blastp-path $HOME/ncbi-blast-2.14.1+/bin --blastp-db refseq_select_prot

And the run still finished saying the pipeline completed, but the files are still missing.
Ok, I would like to try to replicate this issue on my end. Would you be able to attach your VCF files (5_tumor_only_twicefiltered_T_PASS_stFILT.vcf.gz.nogerm.vcf.gz.VEP_anno.gx.vcf.gz and 5_Msh2KO.phased.sorted.vcf.mm39.vcf.VEP_anno.vcf.gz), including the .tbi index files?
Here are the files: https://we.tl/t-ei7H4aTmeJ I had to use a WeTransfer link, because the phased VCF is too big and GitHub doesn't allow uploading .tbi files.
I apologize for not noticing this sooner, but because you are running with only an elution algorithm, we don't generate the aggregate report and all the other files needed for pVACview, since we don't have sufficient information available with elution-only data. So this behavior is expected. If you want to generate this information, you would need to add an additional binding affinity algorithm to your run, such as NetMHCpan.
I will add some information about this to the documentation.
I see, thank you, it seems to be working now that I added different algorithms. One last question I had: how do you process multiple samples? The pipeline doesn't start if the inputs.yml has been created with sample 1 and you try to run sample 2. Is there still a way to write a loop? Or should I merge the VCFs?
Each pVACseq run is sample-specific. The output directory needs to be specific for the sample as well. You can achieve that by including the sample name in the output directory.
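One way to script that per-sample layout is sketched below. The sample names, VCF naming scheme, and algorithm list are placeholders for your own; the point is only that the output directory ends with the sample name:

```python
import os

def pvacseq_cmd(sample, vcf_dir="VEP_annotated_vcf", out_root="output_pvactools"):
    """Build a per-sample pvacseq invocation with a sample-specific output dir."""
    return [
        "pvacseq", "run",
        os.path.join(vcf_dir, f"{sample}.VEP_anno.gx.vcf.gz"),  # hypothetical naming
        sample,
        "H-2-Kd,H-2-Dd,H-2-Ld",
        "NetMHCpan,NetMHCpanEL",
        os.path.join(out_root, sample),  # sample name embedded in the output path
    ]

for sample in ("msh2ko_2", "msh2ko_5"):
    print(" ".join(pvacseq_cmd(sample)))
    # subprocess.run(pvacseq_cmd(sample), check=True)  # uncomment to actually run
```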
Ok, makes sense.
I got this error while running the proteome similarity:
Calculating Reference Proteome Similarity
Traceback (most recent call last):
File "/usr/local/bin/pvacseq", line 8, in <module>
What does it mean?
I think this is an edge case in our parsing logic that we didn't account for, probably related to this being mouse data instead of human. I will make a bugfix for it.
Is this bug the reason why in the aggregated.tsv I have "Pending" for the "Evaluation" column on every row?
Also, would you be so kind as letting me know when the Docker image will be updated?
Thank you very much for your dedicated help.
No, the Pending is a placeholder to put in your own final evaluation status since usually not all neoantigen candidates can be included in a therapy. You can do that step by loading your results into pVACview where you have the ability to update the Evaluation/Eval column with your final decisions and then export the resulting TSV. If your final goal isn't neoantigen therapy selection you can just ignore this column. The Tier column would give you an indication on our verdict of the neoantigen candidate overall.
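If you prefer to fill the Evaluation column programmatically rather than through pVACview, something like the following works (standard library only; the TSV here is a minimal stand-in with just the two relevant columns, and the Accept/Reject labels are your own verdicts, not values pVACtools requires):

```python
import csv
import io

# Minimal stand-in for an aggregated.tsv with the columns discussed above.
tsv = "ID\tTier\tEvaluation\nchr1.100\tPass\tPending\nchr2.200\tPoor\tPending\n"
rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))

# Replace the Pending placeholder with a final decision per candidate.
for row in rows:
    row["Evaluation"] = "Accept" if row["Tier"] == "Pass" else "Reject"

print([r["Evaluation"] for r in rows])  # → ['Accept', 'Reject']
```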
Unfortunately, I'm not sure when a new Docker image with this fix will be ready. I will be on vacation starting Thursday. I will try to get it out before then.
This issue should be fixed in pVACtools version 4.0.3. Please give it a try and let me know if you’re running into any other errors.
After downloading and installing pVACtools 4.0.3, I tried running the tools, and sadly the error still pops up.
I tried running the tool without the option so I could look at the results, and I ran into a problem with pVACview.
I tried through R using the following:
install.packages("shiny", dependencies=TRUE)
install.packages("ggplot2", dependencies=TRUE)
install.packages("DT", dependencies=TRUE)
install.packages("reshape2", dependencies=TRUE)
install.packages("jsonlite", dependencies=TRUE)
install.packages("tibble", dependencies=TRUE)
install.packages("tidyr", dependencies=TRUE)
install.packages("plyr", dependencies=TRUE)
install.packages("dplyr", dependencies=TRUE)
install.packages("shinydashboard", dependencies=TRUE)
install.packages("shinydashboardPlus", dependencies=TRUE)
install.packages("fresh", dependencies=TRUE)
install.packages("shinycssloaders", dependencies=TRUE)
install.packages("RCurl", dependencies=TRUE)
install.packages("curl", dependencies=TRUE)
install.packages("stringr", dependencies=TRUE)
install.packages("shinycssloaders", dependencies=TRUE)
shiny::runApp('C:\Pierre\Projets\MMR_meta\Neo\2\MHC_Class_I', port=3333)
The page loads, I upload the aggregated.tsv, and then when I upload the json, the page closes and I get disconnected.
This is what pops up in the console:
shiny::runApp('C:\Pierre\Projets\MMR_meta\Neo\2\MHC_Class_I', port=3333)
Listening on http://127.0.0.1:3333
[1] FALSE
[1] FALSE
Warning: Error in [.data.frame: colonnes non définies sélectionnées  # this pops up when I upload the .json
1: shiny::runApp
[1] TRUE
The warning translates to "Warning: Error in [.data.frame: undefined columns selected". I tried through the webserver, and I get disconnected as well.
What do you think is the problem?
Here is one of the MHC_Class_I folder containing the necessary files https://we.tl/t-kvfBO6yVfj
Thank you for these error reports. I have bugfixes for both issues in the works and will make a new release next week.
I just released version 4.0.4, which should fix these two issues. Please let me know if you still run into problems.
I'm closing this issue due to inactivity. I assume that the newest version fixed these errors.
Installation Type
Standalone
pVACtools Version / Docker Image
4.0.0
Python Version
3.9.1
Operating System
CentOS 7
Describe the bug
When I launch the following command, I obtain this error message at the prediction step:
Making binding predictions on Allele H-2-Dd and Epitope Length 8 with Method NetMHCpanEL - File /mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.netmhcpan_el.H-2-Dd.8.tsv_1-200
Traceback (most recent call last):
File "/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py", line 520, in <module>
Prediction().main()
File "/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py", line 511, in main
self.commandline_input(args)
File "/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py", line 135, in commandline_input
mhc_scores = mhc_predictor.predict(input.input_protein.as_amino_acid_text())
File "/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/seqpredictor.py", line 903, in predict
scores.append(predictor.predict_sequence(sequence,pred))
File "/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/seqpredictor.py", line 363, in predict_sequence
results = predict_netmhcpan(input_sequence_list, [(allele_name_or_sequence, self.length)], el=True)
File "/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/__init__.py", line 79, in predict_many
process = Popen(cmd, stdout=PIPE)
File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/subprocess.py", line 947, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/subprocess.py", line 1819, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan'
CRITICAL:pymp:An exception occured in thread 0: (<class 'subprocess.CalledProcessError'>, Command '['/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/python3.9', '/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py', 'netmhcpan_el', 'H-2-Dd', '8', '/mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200']' returned non-zero exit status 1.).
Traceback (most recent call last):
File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/pipeline.py", line 356, in call_iedb
pvactools.lib.call_iedb.main(arguments)
File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/call_iedb.py", line 44, in main
raise err
File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/call_iedb.py", line 39, in main
(response_text, output_mode) = prediction_class_object.predict(args.input_file, args.allele, args.epitope_length, args.iedb_executable_path, args.iedb_retries, tmp_dir=args.tmp_dir)
File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/prediction_class.py", line 58, in predict
response = run(arguments, stdout=response_fh, check=True)
File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/python3.9', '/home/p_laplante@intra.igr.fr/mhc_i/src/predict_binding.py', 'netmhcpan_el', 'H-2-Dd', '8', '/mnt/beegfs/scratch/p_laplante/output_pvactools/MHC_Class_I/tmp/msh2ko_2.8.fa.split_1-200']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/bin/pvacseq", line 8, in <module>
sys.exit(main())
File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/tools/pvacseq/main.py", line 123, in main
args[0].func.main(args[1])
File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/tools/pvacseq/run.py", line 138, in main
pipeline.execute()
File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/pipeline.py", line 451, in execute
self.call_iedb(chunks)
File "/home/p_laplante@intra.igr.fr/miniconda3/envs/pvac/lib/python3.9/site-packages/pvactools/lib/pipeline.py", line 357, in call_iedb
p.print("Making binding predictions on Allele %s and Epitope Length %s with Method %s - File %s - Completed" % (a, epl, method, filename))
File "/home/p_laplante@intra.igr.fr/.local/lib/python3.9/site-packages/pymp/__init__.py", line 148, in __exit__
raise exc_t(exc_val)
TypeError: __init__() missing 1 required positional argument: 'cmd'
I noticed that in issue #772 someone reported the same error message, but changed subject after that.
Indeed, the "missing" file /mnt/nfs01/home/p_laplante@intra.igr.fr/mhc_i/src/../method/netmhcpan-4.1-executable/netmhcpan_4_1_executable/netMHCpan is actually present when I go look for it.
How to reproduce this bug
Input files
No response
Log output
inputs_log.txt
Output files
No response