Closed: ktroule closed this issue 6 years ago.
This is an error thrown by IEDB's NetMHCcons algorithm: Exception: len(peptide_list) != len(scores) -- 14 != 0. Do you get this error every time with this input file, or is it intermittent? If it's the former, I suspect NetMHCcons doesn't like one of the fasta entries that gets fed to it. If you can narrow it down from your list of variants, we may be able to pinpoint what the algorithm doesn't like about it and prevent pVACseq from including that fasta entry in the first place. Once you find out which fasta entry is problematic, though, I suspect you will need to contact IEDB to determine why NetMHCcons throws this particular error.
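To narrow down which fasta entry NetMHCcons rejects, one option is to run each entry through IEDB on its own and note which ones fail. The sketch below assumes Python 3.7+. `parse_fasta`, `run_iedb` and `find_failing_entries` are hypothetical helpers, and the `predict_binding.py` path, allele and epitope length are placeholders modeled on the command quoted later in this thread. It also assumes that a failing run exits with a non-zero status, which may not hold for every IEDB error.

```python
import subprocess
import tempfile
from pathlib import Path


def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs."""
    entries = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        entries.append((header, "".join(seq_lines)))
    return entries


def run_iedb(fasta_path, allele="HLA-A*30:02", length="8",
             script="/path/to/mhc_i/src/predict_binding.py"):
    """Hypothetical wrapper: True if predict_binding.py exits with status 0."""
    cmd = ["python2.7", script, "netmhccons", allele, length, str(fasta_path)]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0


def find_failing_entries(entries, runner=run_iedb):
    """Write each entry to its own FASTA file, run it, and collect failures."""
    failing = []
    with tempfile.TemporaryDirectory() as tmp:
        for i, (header, seq) in enumerate(entries):
            path = Path(tmp) / f"entry_{i}.fa"
            path.write_text(f">{header}\n{seq}\n")
            if not runner(path):
                failing.append(header)
    return failing
```

Passing a different `runner` (for example a stub during a dry run) lets you check the splitting logic without calling IEDB at all.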
For the second error, are you able to ls the file that it is complaining about? If yes, this might be an intermittent filesystem error on your system. If no, this would be a question for the IEDB help desk.
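A small check along those lines, assuming Python 3: `check_file` is a hypothetical helper that looks at the reported path a few times in a row, so that a file which flickers in and out of view (suggesting an intermittent filesystem problem) can be told apart from one that is simply not there (a question for IEDB). The three labels are a rough heuristic, not a definitive diagnosis.

```python
import os
import time


def check_file(path, attempts=3, delay=1.0):
    """Classify a reportedly missing file as 'present', 'missing' or 'intermittent'."""
    seen = []
    for i in range(attempts):
        # Count the file as seen only if it exists and is readable.
        seen.append(os.path.isfile(path) and os.access(path, os.R_OK))
        if i < attempts - 1:
            time.sleep(delay)  # pause before re-checking to catch flaky mounts
    if all(seen):
        return "present"
    if not any(seen):
        return "missing"
    return "intermittent"
```

For example, `check_file("/home/dorjee/Downloads/standalone/mhci-standalone/mhc_i/data/MHCI_mhcibinding20130222/smm/HLA-A-3002-8.cpickle")` would test the file named in the IOError reported further down.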
Yes, indeed. It seems to be a problem related to NetMHCcons; once I remove it, everything runs smoothly. I'll try to solve this, but in case I can't: since NetMHCcons = NetMHC + NetMHCpan + PickPocket, would it be OK to use those three instead of NetMHCcons?
Anyway, I'll try to solve it.
Thanks for your help.
Yes, that would work. However, I'm not sure what method NetMHCcons uses to aggregate the binding affinities of those three prediction algorithms. pVACseq will output the median and mean binding affinity across all prediction algorithms used, as well as the individual binding affinities predicted by each algorithm.
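As an illustration of that kind of summary (not pVACseq's actual code, and not how NetMHCcons builds its own consensus), the per-algorithm IC50 values for a single peptide can be combined as below. `summarize_ic50` is a hypothetical helper, it assumes Python 3, and the numbers used to exercise it are made up.

```python
from statistics import mean, median


def summarize_ic50(scores):
    """Combine per-algorithm IC50 predictions (nM) for one peptide.

    `scores` maps an algorithm name to its predicted IC50; the result keeps
    the individual values alongside their median and mean.
    """
    values = list(scores.values())
    return {"median": median(values), "mean": mean(values), **scores}
```

For example, IC50 values of 100, 200 and 600 nM give a median of 200 nM and a mean of 300 nM; the median is less affected when one algorithm disagrees strongly with the others.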
Is it possible to see the command that pVACseq sends to the IEDB predictors?
Thanks, once more.
If IEDB causes an error, the error message will include the problematic IEDB command (e.g. Command '['python2.7', '/local/pvacseqfiles/mhc_i/src/predict_binding.py', 'netmhccons', 'HLA-A*30:02', '8', '/local/NeoPan/pvacout/MHC_Class_I/tmp/pan1033_21.fa.split_1-134']'). You can execute that on the command line by joining all the elements inside the list [] with spaces.
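Joining the parts with plain spaces works in most cases, but an argument such as `HLA-A*30:02` contains a `*` that a shell may try to expand as a glob, so quoting each part is safer. The sketch below (a hypothetical helper `command_from_error`, assuming Python 3) pulls the list out of such a message with `ast.literal_eval` and quotes each element with `shlex.quote`. It assumes the message contains the `Command '[...]'` form shown above.

```python
import ast
import shlex


def command_from_error(message):
    """Turn "Command '[...]'" in an error message into a shell-ready string."""
    start = message.index("Command '") + len("Command '")
    end = message.index("]'", start) + 1  # keep the closing bracket
    parts = ast.literal_eval(message[start:end])  # safely parse the list literal
    # shlex.quote leaves plain words alone and single-quotes anything with
    # shell metacharacters, e.g. the '*' in HLA-A*30:02.
    return " ".join(shlex.quote(p) for p in parts)
```

For the command above this produces `python2.7 /local/pvacseqfiles/mhc_i/src/predict_binding.py netmhccons 'HLA-A*30:02' 8 /local/NeoPan/pvacout/MHC_Class_I/tmp/pan1033_21.fa.split_1-134`.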
For runs that don't cause an error, pVACseq doesn't currently output the IEDB commands it runs, but you should be able to infer them from the status messages. pVACseq makes a separate call to IEDB for each allele/prediction algorithm/epitope length combination, and it also splits the protein fasta files into smaller chunks.
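As a sketch of that inference, assuming the status lines look like the `Running IEDB on Allele ... - Entries 1-200` message quoted later in this thread, a regular expression can recover the allele, epitope length, method and entry range. The helper name, the `fasta_template` argument and the mapping of `Entries 1-200` to a file ending in `.split_1-200` are all assumptions (the naming is modeled on the `...fa.split_1-134` path above); only the `NetMHCcons` to `netmhccons` method name is confirmed by the commands in this thread.

```python
import re

STATUS_RE = re.compile(
    r"Running IEDB on Allele (?P<allele>\S+) and Epitope Length (?P<length>\d+) "
    r"with Method (?P<method>\S+) - Entries (?P<start>\d+)-(?P<end>\d+)"
)

# Only this mapping is confirmed by the commands in the thread; extend as needed.
IEDB_METHOD_NAMES = {"NetMHCcons": "netmhccons"}


def command_from_status(line, script, fasta_template):
    """Rebuild a (hypothetical) IEDB command list from a pVACseq status line."""
    m = STATUS_RE.search(line)
    if m is None:
        raise ValueError(f"not a status line: {line!r}")
    # Assumed naming: the entry range appears in the chunk file name.
    fasta = fasta_template.format(start=m["start"], end=m["end"])
    return ["python2.7", script, IEDB_METHOD_NAMES[m["method"]],
            m["allele"], m["length"], fasta]
```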
I've run this command (from another run), and it seems to work fine when I run it as:
python2.7 /local/pvacseqfiles/mhc_i_V2/mhc_i/src/predict_binding.py netmhccons HLA-A*30:02 8 /local/NeoPan/pvacout/pan103/MHC_Class_I/tmp/pan103_21.fa.split_1-134
Output (I'm not showing the full output)
allele seq_num start end length peptide ic50 rank
HLA-A*30:02 6 50 57 8 KTYSHKSY 23.43 0.1
HLA-A*30:02 6 193 200 8 VRNHTLLY 280.65 1.2
HLA-A*30:02 6 8 15 8 WTHGEKPY 674.19 1.7
HLA-A*30:02 6 22 29 8 KTFRCKSF 1485.27 2.4
HLA-A*30:02 11 14 21 8 WCEKLFSY 1691.19 2.4
HLA-A*30:02 12 14 21 8 WCEKLFSY 1691.19 2.4
HLA-A*30:02 6 78 85 8 KSFHCKSF 1894.66 2.5
HLA-A*30:02 6 162 169 8 KTFHRKSF 1978.46 2.6
HLA-A*30:02 2 8 15 8 KSKPFVHH 2043.73 2.6
HLA-A*30:02 1 8 15 8 KSKLFVHH 2936.57 3
HLA-A*30:02 3 11 18 8 KSKLFVHH 2936.57 3
HLA-A*30:02 6 334 341 8 KSKLFVHH 2936.57 3
HLA-A*30:02 6 55 62 8 KSYLTVHH 3033.45 3
HLA-A*30:02 6 405 412 8 SDVAEAGY 3219.45 3.2
HLA-A*30:02 6 223 230 8 RQIFRSIK 3416.85 3.3
HLA-A*30:02 6 213 220 8 VMNVENPF 3890.57 3.8
HLA-A*30:02 6 418 425 8 HSFFPWGK 4288.48 4.1
HLA-A*30:02 6 385 392 8 KTFCQKSH 4311.75 4.2
HLA-A*30:02 19 8 15 8 ASSISTGH 4502.45 4.3
HLA-A*30:02 20 8 15 8 ASSVSTGH 4576.12 4.5
HLA-A*30:02 8 9 16 8 THIGEKPY 4651.0 4.5
HLA-A*30:02 2 6 13 8 SQKSKPFV 4676.23 4.6
HLA-A*30:02 21 5 12 8 KMWRQEKM 4752.74 4.7
HLA-A*30:02 23 5 12 8 KMWRQEKM 4752.74 4.7
HLA-A*30:02 25 5 12 8 KMWRQEKM 4752.74 4.7
HLA-A*30:02 6 36 43 8 THAGEKPY 4804.44 4.7
HLA-A*30:02 1 6 13 8 SQKSKLFV 4830.5 4.8
HLA-A*30:02 3 9 16 8 SQKSKLFV 4830.5 4.8
HLA-A*30:02 6 332 339 8 SQKSKLFV 4830.5 4.8
HLA-A*30:02 19 9 16 8 SSISTGHA 4883.05 4.8
HLA-A*30:02 6 120 127 8 THTGERPY 4962.95 4.9
HLA-A*30:02 6 45 52 8 CNECGKTY 5099.03 5
HLA-A*30:02 6 185 192 8 VMNVEKLF 5210.57 5.2
HLA-A*30:02 5 3 10 8 LTIHQWTH 5267.25 5.2
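Because the original error reported zero scores (`14 != 0`), it can help to confirm that a standalone run actually returned rows. Below is a minimal parser for output like the table above, assuming whitespace-separated columns with the header on the first line. `parse_iedb_output` and `binders` are hypothetical helpers, and the 500 nM cutoff is a common convention used here only for illustration.

```python
def parse_iedb_output(text):
    """Parse whitespace-separated predict_binding.py output into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []  # no header and no rows: the run produced nothing
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        row["ic50"] = float(row["ic50"])
        rows.append(row)
    return rows


def binders(rows, cutoff=500.0):
    """Peptides whose predicted IC50 (nM) is below `cutoff`."""
    return [row["peptide"] for row in rows if row["ic50"] < cutoff]
```

Applied to the first four rows above, this yields four parsed rows, of which KTYSHKSY and VRNHTLLY fall below 500 nM.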
So, to me it looks like IEDB is working ok as standalone.
Not sure why it is working standalone but failing inside of pVACseq. Have you tried rerunning the original command with a fresh output directory? I'm wondering if it is reproducibly failing or if this was intermittent.
I'll try it and let you know.
I'm not completely sure, but I think I've found the problem.
I run pvacseq inside a script (I have multiple samples).
I've found that if I activate the Python 3.5 environment using conda and then run the script, the program crashes, but only when NetMHCcons is enabled as a binding prediction method.
If I run the script with the default Python 2.7, without activating the conda environment, pvacseq runs OK (or so it seems, as I'm currently running the first sample).
Does pvacseq automatically check whether conda is installed and use the right version of Python?
We've had a few problems with mismatches of python versions in the past. I'm surprised that you are able to run pVACseq inside of your script using python2.7. I don't have an answer for that. Since it is a package that gets installed into a specific python environment, you should only be able to use it in that environment.
However, what you are describing may explain what is going on here. The problem is usually that IEDB calls out to other scripts, and even though pVACseq invokes the main IEDB prediction script with python2.7, the subsequent calls made by IEDB itself use whatever Python version is the default on the user's system (which is 3.5 when you are in your 3.5 conda environment). We've been able to solve some of these errors by modifying the IEDB scripts' shebang lines to explicitly use Python 2.7 (#!/usr/bin/env python2.7). I tried going down that rabbit hole with NetMHCcons, but since it calls out to other scripts, I haven't been able to identify which particular scripts to modify.
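For anyone trying that route, here is a sketch of a hypothetical helper `pin_shebang` (Python 3) that rewrites a script's Python shebang to `#!/usr/bin/env python2.7` and keeps a `.bak` copy of the original. Which IEDB scripts need it is exactly the open question above, so treat it as an experiment. Note that a shebang only matters when a script is executed directly; a script launched as `python other_script.py` uses whatever `python` resolves to.

```python
from pathlib import Path

TARGET = "#!/usr/bin/env python2.7"


def pin_shebang(path, target=TARGET):
    """Rewrite a Python shebang to `target`; return True if the file changed."""
    path = Path(path)
    text = path.read_text()
    first, sep, rest = text.partition("\n")
    # Leave files alone that have no shebang, a non-Python one, or the target already.
    if not first.startswith("#!") or "python" not in first or first == target:
        return False
    path.with_name(path.name + ".bak").write_text(text)  # backup of the original
    path.write_text(target + sep + rest)
    return True
```

One could apply it to every `.py` file under the standalone directory with `for p in Path(root).rglob("*.py"): pin_shebang(p)`, where `root` is your IEDB install path.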
Thanks for your time.
And one last question, I hope.
There is at least one sample on which pvacseq gets stuck (it has been like that for more than 4 hours so far). The last message printed on the screen is:
Running IEDB on Allele HLA-A*03:01 and Epitope Length 8 with Method NetMHCcons - Entries 1-200
I've checked the process in htop: it's using 0% CPU. It doesn't crash; it just seems to be halted. Have you seen this behavior before? I don't see any file I can check to find out what might be happening.
Once more, thanks.
Hm, that is strange. Is that with a local IEDB install? I would try restarting that one. You could also try reducing the fasta size (the --fasta-size parameter), which will result in more calls to IEDB but fewer sequences in each fasta file.
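To see how `--fasta-size` changes the number of IEDB calls, here is the arithmetic as a sketch. It assumes the entries are split into consecutive 1-based ranges, which matches the `Entries 1-200` status message but is not taken from pVACseq's source; `chunk_ranges` and `total_calls` are hypothetical helpers.

```python
def chunk_ranges(n_entries, fasta_size):
    """1-based inclusive (start, end) entry ranges, one per fasta chunk."""
    return [(start, min(start + fasta_size - 1, n_entries))
            for start in range(1, n_entries + 1, fasta_size)]


def total_calls(n_entries, fasta_size, n_alleles, n_methods, n_lengths):
    """One IEDB call per chunk for every allele/method/epitope-length combination."""
    return len(chunk_ranges(n_entries, fasta_size)) * n_alleles * n_methods * n_lengths
```

With 450 entries, a fasta size of 200 gives three chunks per allele/method/length combination, while a size of 6 gives 75.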
Yes, local IEDB install.
Reducing --fasta-size does not seem to help. About 3 of the 23 patients I'm analyzing have this problem: at some point pvacseq stops, and it always seems to happen while NetMHCcons is running.
What fasta size parameter were you using? Are any of the calls to IEDB successful with a smaller set, or does it always get stuck on the very first "chunk"? If you choose a very small fasta size (like <5), do any of the chunks succeed?
Does the fasta that gets fed to IEDB contain frameshift sequences? What downstream sequence length did you choose? Is there maybe a sequence in there that is extremely long (thousands of amino acids)? If so, you can try decreasing the --downstream-sequence-length and see if that fixes things.
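To check for that, the length of every sequence in the fasta can be listed and the longest ones inspected. A minimal sketch, assuming Python 3. `sequence_lengths` and `overly_long` are hypothetical helpers, `max_len` is whatever threshold you consider suspicious, and entries that share a header would overwrite each other's count.

```python
def sequence_lengths(fasta_text):
    """Map each FASTA header to the total length of its (possibly wrapped) sequence."""
    lengths = {}
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
            lengths[header] = 0
        elif line and header is not None:
            lengths[header] += len(line)
    return lengths


def overly_long(fasta_text, max_len):
    """(header, length) pairs longer than max_len, longest first."""
    long_ones = [(h, n) for h, n in sequence_lengths(fasta_text).items() if n > max_len]
    return sorted(long_ones, key=lambda item: item[1], reverse=True)
```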
pVACseq creates intermediate fasta files for each chunk. You could try running the one that gets stuck through IEDB standalone and see if there is an error message that pVACseq isn't catching. However, if after trying everything above you still encounter this problem and IEDB simply stops processing, there isn't much that pVACseq can do. You might need to put in a help desk ticket with IEDB to further assess this problem. I can imagine that these consensus algorithms might need a machine with more resources, but this is purely speculation on my part. I don't know enough about how these algorithms are run, but the folks at IEDB might be able to help more.
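When running a stuck chunk standalone, a timeout keeps a hang like the one described above from blocking the terminal indefinitely and makes it easy to tell a hang from an error. Below is a sketch using `subprocess.run` (Python 3.7+). `run_chunk` is a hypothetical helper, and the one-hour default is arbitrary. `subprocess.run` kills only the process it started, so helper programs that NetMHCcons itself launched may keep running after a timeout.

```python
import subprocess


def run_chunk(cmd, timeout_s=3600):
    """Run one IEDB command and return (status, output) instead of hanging."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The direct child has been killed; grandchildren are not tracked here.
        return "timeout", ""
    if result.returncode != 0:
        return "error", result.stderr
    return "ok", result.stdout
```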
Thanks.
I've been trying with chunks of 6.
I'll try running IEDB on the last chunk to see what happens, and I'll also try reducing --downstream-sequence-length from the original 1000 to 100. Thanks.
I still have to test it on all samples, but it looks like setting --downstream-sequence-length 100 works without problems.
Thanks
Glad that fixed it. Is it ok to resolve this issue?
Sure. Thanks for your help, everything is working fine.
Hi.
I ran pvacseq previously as a test and it went OK. Now I'm trying to run it for real, but there is something I can't figure out.
My default Python is 2.7, so I use conda to run pvacseq with Python 3.5.
This is the error that I get.
Sometimes I also get a message like this:
IOError: [Errno 2] No such file or directory: '/home/dorjee/Downloads/standalone/mhci-standalone/mhc_i/data/MHCI_mhcibinding20130222/smm/HLA-A-3002-8.cpickle'
To me it looks like the problem might be related to Python, IEDB, or both, but after a few days I haven't been able to find an answer.
Once more, thanks for your time, and sorry for not being able to give more information about this.
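When a Python 2/3 mismatch is suspected, it can help to see which interpreters a given shell (with or without the conda environment activated) will actually pick up. A sketch assuming Python 3; `interpreter_report` is a hypothetical helper. Python 2 prints its `--version` output to stderr and Python 3 prints it to stdout, so both streams are checked.

```python
import shutil
import subprocess


def interpreter_report(names=("python", "python2.7", "python3")):
    """Map each interpreter name to (resolved path, version string), or None if absent."""
    report = {}
    for name in names:
        path = shutil.which(name)  # resolves against the current PATH
        if path is None:
            report[name] = None
            continue
        out = subprocess.run([path, "--version"], capture_output=True, text=True)
        report[name] = (path, (out.stdout or out.stderr).strip())
    return report
```

Running it once inside the conda environment and once outside shows whether `python` switches between 2.7 and 3.5, which is the situation described in the maintainer's explanation of IEDB's sub-calls.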