griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

Command ''python2.7' returned non-zero exit status 1 #92

Closed: ktroule closed this issue 6 years ago

ktroule commented 6 years ago

Hi.

I ran pvacseq previously as a test and it went OK. Now I'm trying to run it for real, but there is something I can't figure out.

My default Python is 2.7, so I use conda to run pvacseq under Python 3.5:

source activate py35
pvacseq run /local/NeoPan/VCF/VCFout/onlyTumor_cleanNOTPASS_pan103_VEP.vcf  pan1033 HLA-A*30:02,HLA-B*18:01,HLA-C*05:00  NNalign NetMHCcons SMM SMMPMBEC SMMalign /local/NeoPan/pvacout -e 8,9,10,11 -i /local/NeoPan/BamDepth/YamlFiles/Yaml_pan103 --iedb-install-directory /local/pvacseqfiles -c 1

This is the error that I get.

Allele HLA-C*05:00 not valid. Skipping.
Executing MHC Class I predictions
Converting .vcf to TSV
TSV file already exists. Skipping.
Splitting TSV into smaller chunks
Splitting TSV into smaller chunks - Entries 1-67
Split TSV file for Entries 1-67 already exists. Skipping.
Completed
Generating Variant Peptide FASTA and Key Files
Split FASTA file for Entries 1-134 already exists. Skipping.
Completed
Processing entries for Allele HLA-A*30:02 and Epitope Length 8 - Entries 1-134
Running IEDB on Allele HLA-A*30:02 and Epitope Length 8 with Method NetMHCcons - Entries 1-134
ERROR:netmhccons_1_1_executable.netmhccons_1_1_python_interface:len(peptide_list) != len(scores) -- 14 != 0
The two methods NetMHCpan and NetMHC produced different outputs, number of peptides not the same

Traceback (most recent call last):
  File "/local/pvacseqfiles/mhc_i/src/predict_binding.py", line 415, in <module>
    Prediction().main()
  File "/local/pvacseqfiles/mhc_i/src/predict_binding.py", line 407, in main
    self.commandline_input(args)
  File "/local/pvacseqfiles/mhc_i/src/predict_binding.py", line 95, in commandline_input
    mhc_scores = mhc_predictor.predict(input.input_protein.as_amino_acid_text())
  File "/local/pvacseqfiles/mhc_i/src/seqpredictor.py", line 778, in predict
    scores.append(predictor.predict_sequence(sequence,pred))
  File "/local/pvacseqfiles/mhc_i/src/seqpredictor.py", line 365, in predict_sequence
    scores = predict_netmhccons(sequence, (allele_name_or_sequence, self.length))
  File "/local/pvacseqfiles/mhc_i/src/../method/netmhccons-1.1-executable/netmhccons_1_1_executable/netmhccons_1_1_python_interface.py", line 19, in predict_sequence
    scores = predict_peptide_list(peptide_list, allele_length_pair)
  File "/local/pvacseqfiles/mhc_i/src/../method/netmhccons-1.1-executable/netmhccons_1_1_executable/netmhccons_1_1_python_interface.py", line 81, in predict_peptide_list
    raise Exception(msg)
Exception: len(peptide_list) != len(scores) -- 14 != 0
Traceback (most recent call last):
  File "/usr/local/bin/pvacseq", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.5/site-packages/tools/pvacseq/main.py", line 78, in main
    args[0].func.main(args[1])
  File "/usr/local/lib/python3.5/site-packages/tools/pvacseq/run.py", line 126, in main
    pipeline.execute()
  File "/usr/local/lib/python3.5/site-packages/lib/pipeline.py", line 340, in execute
    split_parsed_output_files = self.call_iedb_and_parse_outputs(chunks)
  File "/usr/local/lib/python3.5/site-packages/lib/pipeline.py", line 473, in call_iedb_and_parse_outputs
    '-e', self.iedb_executable,
  File "/usr/local/lib/python3.5/site-packages/lib/call_iedb.py", line 56, in main
    response = run(prediction_class_object.iedb_executable_params(args), stdout=PIPE, check=True)
  File "/usr/local/lib/python3.5/subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['python2.7', '/local/pvacseqfiles/mhc_i/src/predict_binding.py', 'netmhccons', 'HLA-A*30:02', '8', '/local/NeoPan/pvacout/MHC_Class_I/tmp/pan1033_21.fa.split_1-134']' returned non-zero exit status 1

Sometimes I also get a message such as: IOError: [Errno 2] No such file or directory: '/home/dorjee/Downloads/standalone/mhci-standalone/mhc_i/data/MHCI_mhcibinding20130222/smm/HLA-A-3002-8.cpickle'

To me it looks like the problem might be related to Python, IEDB, or both, but after a few days I still haven't found an answer.

Thanks again for your time, and sorry I can't give more information about this.

susannasiebert commented 6 years ago

This error is thrown by IEDB's NetMHCcons algorithm: Exception: len(peptide_list) != len(scores) -- 14 != 0. Do you get this error every time with this input file, or is it intermittent? If it's the former, I suspect NetMHCcons doesn't like one of the FASTA entries that gets fed to it. If you can narrow it down to a specific variant in your list, we may be able to pinpoint what the algorithm doesn't like about it and prevent pVACseq from including that FASTA entry in the first place. Once you find out which FASTA entry is problematic, though, I suspect you will need to contact IEDB to determine why NetMHCcons throws this particular error.
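The numbers in that message can be sanity-checked with simple arithmetic: a sequence of n residues contains n - k + 1 overlapping peptides of length k. Assuming the FASTA entry is a 21-residue peptide (the _21 in pan1033_21.fa hints at this, but it is an assumption), 8-mers give 14 peptides, and the 0 means NetMHCcons returned no scores for any of them. A minimal sketch:

```python
def count_kmers(seq_len, k):
    """Number of overlapping peptides of length k in a sequence of seq_len residues."""
    return max(seq_len - k + 1, 0)


expected = count_kmers(21, 8)  # assumed 21-residue FASTA entry, 8-mers
returned_scores = 0            # what NetMHCcons handed back in the error above
print("len(peptide_list) != len(scores) -- {} != {}".format(expected, returned_scores))
```

This prints len(peptide_list) != len(scores) -- 14 != 0, i.e. the peptides were generated but the scoring step produced nothing.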

For the second error, are you able to ls the file that it is complaining about? If yes, this might be an intermittent filesystem error on your system. If no, this would be a question for the IEDB help desk.

ktroule commented 6 years ago

Yes, indeed. It seems to be a problem related to NetMHCcons: once I remove it, everything runs smoothly. I'll try to solve this, but in case I can't: since NetMHCcons = NetMHC + NetMHCpan + PickPocket, would it be OK to use those three instead of NetMHCcons?

Anyway, I'll try to solve it.

Thanks for your help.

susannasiebert commented 6 years ago

Yes, that would work. However, I'm not sure what method NetMHCcons uses to aggregate the binding affinities of those three prediction algorithms. pVACseq outputs the median and mean binding affinity across all prediction algorithms used, as well as the individual binding affinity predicted by each algorithm.
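To illustrate the two summary statistics, here is a sketch with made-up IC50 values (not pVACseq's own code). Python's statistics module gives both; note how the median is less affected by a single outlying method than the mean:

```python
from statistics import mean, median

# Made-up IC50 predictions (nM) for a single peptide from three methods.
ic50_by_method = {"NetMHC": 120.0, "NetMHCpan": 95.0, "PickPocket": 410.0}

values = list(ic50_by_method.values())
print("median IC50: {:.1f} nM".format(median(values)))  # 120.0
print("mean IC50:   {:.1f} nM".format(mean(values)))    # 208.3
```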

ktroule commented 6 years ago

Is it possible to see the command that pVACseq sends to the IEDB predictors?

Thanks again.

susannasiebert commented 6 years ago

If IEDB causes an error, the error message will show the problematic IEDB command (e.g. Command '['python2.7', '/local/pvacseqfiles/mhc_i/src/predict_binding.py', 'netmhccons', 'HLA-A*30:02', '8', '/local/NeoPan/pvacout/MHC_Class_I/tmp/pan1033_21.fa.split_1-134']'). You can execute it on the command line by joining the elements of the list inside [] with spaces.
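A small Python sketch of that conversion, assuming you have the argument list from a subprocess.CalledProcessError (its cmd attribute holds the list). shlex.quote wraps arguments such as HLA-A*30:02 in single quotes so the shell does not treat * as a glob; pasting the allele unquoted usually still works in bash, because by default an unmatched glob is passed through literally.

```python
import shlex
import subprocess


def to_shell_line(cmd):
    """Turn a subprocess argument list into a single copy-pasteable shell line."""
    if isinstance(cmd, str):
        return cmd  # already a shell string (e.g. from a shell=True call)
    return " ".join(shlex.quote(str(arg)) for arg in cmd)


# Rebuild the error from this issue; in real code you would catch it instead.
err = subprocess.CalledProcessError(1, [
    "python2.7",
    "/local/pvacseqfiles/mhc_i/src/predict_binding.py",
    "netmhccons",
    "HLA-A*30:02",
    "8",
    "/local/NeoPan/pvacout/MHC_Class_I/tmp/pan1033_21.fa.split_1-134",
])
print(to_shell_line(err.cmd))
```

This prints the same command as a single line, with only 'HLA-A*30:02' quoted.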

For runs that don't cause an error, pVACseq doesn't currently output the IEDB commands it runs, but you should be able to infer them from the status messages. pVACseq makes a separate call to IEDB for each allele/prediction algorithm/epitope length combination, and it also splits the protein FASTA file into smaller chunks.
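Based on that description and on the argument order in the error message, the set of calls for a run can be sketched as a Cartesian product. The loop order, the lowercase IEDB method names, and the chunk path are assumptions for illustration, not pVACseq's actual code:

```python
from itertools import product

IEDB_DIR = "/local/pvacseqfiles"  # --iedb-install-directory from the original command
ALLELES = ["HLA-A*30:02", "HLA-B*18:01"]  # HLA-C*05:00 was skipped as invalid
METHODS = ["netmhccons", "smm", "smmpmbec"]  # class I methods, IEDB-style names (assumed)
LENGTHS = [8, 9, 10, 11]
CHUNKS = ["/local/NeoPan/pvacout/MHC_Class_I/tmp/pan1033_21.fa.split_1-134"]


def iedb_commands(chunks, alleles, lengths, methods, iedb_dir):
    """Yield one argument list per chunk/allele/length/method combination,
    in the same argument order as the command in the error message."""
    script = iedb_dir + "/mhc_i/src/predict_binding.py"
    for chunk, allele, length, method in product(chunks, alleles, lengths, methods):
        yield ["python2.7", script, method, allele, str(length), chunk]


commands = list(iedb_commands(CHUNKS, ALLELES, LENGTHS, METHODS, IEDB_DIR))
print(len(commands))  # 1 chunk x 2 alleles x 4 lengths x 3 methods = 24
print(" ".join(commands[0]))
```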

ktroule commented 6 years ago

I ran this command (from another run), and it seems to work OK when I run it as:

python2.7 /local/pvacseqfiles/mhc_i_V2/mhc_i/src/predict_binding.py netmhccons HLA-A*30:02 8 /local/NeoPan/pvacout/pan103/MHC_Class_I/tmp/pan103_21.fa.split_1-134

Output (truncated):

allele  seq_num start   end length  peptide ic50    rank
HLA-A*30:02 6   50  57  8   KTYSHKSY    23.43   0.1
HLA-A*30:02 6   193 200 8   VRNHTLLY    280.65  1.2
HLA-A*30:02 6   8   15  8   WTHGEKPY    674.19  1.7
HLA-A*30:02 6   22  29  8   KTFRCKSF    1485.27 2.4
HLA-A*30:02 11  14  21  8   WCEKLFSY    1691.19 2.4
HLA-A*30:02 12  14  21  8   WCEKLFSY    1691.19 2.4
HLA-A*30:02 6   78  85  8   KSFHCKSF    1894.66 2.5
HLA-A*30:02 6   162 169 8   KTFHRKSF    1978.46 2.6
HLA-A*30:02 2   8   15  8   KSKPFVHH    2043.73 2.6
HLA-A*30:02 1   8   15  8   KSKLFVHH    2936.57 3
HLA-A*30:02 3   11  18  8   KSKLFVHH    2936.57 3
HLA-A*30:02 6   334 341 8   KSKLFVHH    2936.57 3
HLA-A*30:02 6   55  62  8   KSYLTVHH    3033.45 3
HLA-A*30:02 6   405 412 8   SDVAEAGY    3219.45 3.2
HLA-A*30:02 6   223 230 8   RQIFRSIK    3416.85 3.3
HLA-A*30:02 6   213 220 8   VMNVENPF    3890.57 3.8
HLA-A*30:02 6   418 425 8   HSFFPWGK    4288.48 4.1
HLA-A*30:02 6   385 392 8   KTFCQKSH    4311.75 4.2
HLA-A*30:02 19  8   15  8   ASSISTGH    4502.45 4.3
HLA-A*30:02 20  8   15  8   ASSVSTGH    4576.12 4.5
HLA-A*30:02 8   9   16  8   THIGEKPY    4651.0  4.5
HLA-A*30:02 2   6   13  8   SQKSKPFV    4676.23 4.6
HLA-A*30:02 21  5   12  8   KMWRQEKM    4752.74 4.7
HLA-A*30:02 23  5   12  8   KMWRQEKM    4752.74 4.7
HLA-A*30:02 25  5   12  8   KMWRQEKM    4752.74 4.7
HLA-A*30:02 6   36  43  8   THAGEKPY    4804.44 4.7
HLA-A*30:02 1   6   13  8   SQKSKLFV    4830.5  4.8
HLA-A*30:02 3   9   16  8   SQKSKLFV    4830.5  4.8
HLA-A*30:02 6   332 339 8   SQKSKLFV    4830.5  4.8
HLA-A*30:02 19  9   16  8   SSISTGHA    4883.05 4.8
HLA-A*30:02 6   120 127 8   THTGERPY    4962.95 4.9
HLA-A*30:02 6   45  52  8   CNECGKTY    5099.03 5
HLA-A*30:02 6   185 192 8   VMNVEKLF    5210.57 5.2
HLA-A*30:02 5   3   10  8   LTIHQWTH    5267.25 5.2

So, to me it looks like IEDB works OK standalone.

susannasiebert commented 6 years ago

Not sure why it is working standalone but failing inside of pVACseq. Have you tried rerunning the original command with a fresh output directory? I'm wondering if it is reproducibly failing or if this was intermittent.

ktroule commented 6 years ago

I'll try and I'll let you know.

ktroule commented 6 years ago

I'm not completely sure, but I think I've found the problem.

I run pvacseq inside a script (I have multiple samples).

I've found that if I activate the Python 3.5 environment using conda and then run the script, the program crashes, but only if NetMHCcons is enabled as a binding prediction method.

If I run the script with the default Python 2.7, without activating the conda environment, pvacseq runs OK (or so it seems, as I'm currently running the first sample).

Does pvacseq automatically check whether conda is installed and use the right version of Python?

susannasiebert commented 6 years ago

We've had a few problems with mismatched Python versions in the past. I'm surprised that you are able to run pVACseq inside your script using Python 2.7, and I don't have an explanation for that. Since it is a package that gets installed into a specific Python environment, you should only be able to use it in that environment.
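One quick way to see which interpreter a bare python resolves to, inside and outside the conda environment, is a small diagnostic like the one below (a hypothetical helper, not part of pVACtools). It uses str.format rather than f-strings so it also runs under Python 3.5:

```python
import shutil
import subprocess
import sys


def describe_interpreters(names=("python", "python2.7", "python3", "pvacseq")):
    """Map each command name to the executable it resolves to on the current PATH."""
    return {name: shutil.which(name) for name in names}


def reported_version(executable):
    """Run '<executable> --version'; older Pythons print this to stderr, so merge streams."""
    result = subprocess.run([executable, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print("this script runs under:", sys.executable)
    for name, path in describe_interpreters().items():
        if path is None:
            print("{:10} -> not found on PATH".format(name))
        elif name.startswith("python"):
            print("{:10} -> {} ({})".format(name, path, reported_version(path)))
        else:
            print("{:10} -> {}".format(name, path))
```

Running it once with the py35 environment active and once without shows whether plain python switches between 3.5 and 2.7.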

However, what you are describing may explain what is going on here. The problem usually is that IEDB calls out to other scripts, and even though pVACseq invokes the main IEDB prediction script with python2.7, the subsequent calls made by IEDB itself use whatever version is the default on the user's system (which is 3.5 when you are in your 3.5 conda environment). We've been able to solve some of these errors by modifying the IEDB scripts' shebang lines to explicitly use Python 2.7 (#!/usr/bin/env python2.7). I tried going down that rabbit hole with NetMHCcons, but since it calls out to other scripts, I haven't been able to identify which particular scripts to modify.
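A sketch of that shebang workaround, assuming the IEDB install lives under a directory you can write to. It lists files whose first line is an unversioned Python shebang and, only when asked (the --apply flag of this hypothetical script), rewrites them to #!/usr/bin/env python2.7 after saving a .orig backup. It only affects scripts executed directly via their shebang: a script launched as python script.py ignores the shebang, and non-Python wrappers are left untouched.

```python
import re
import sys
from pathlib import Path

# Matches unversioned lines such as "#!/usr/bin/env python" or "#!/usr/bin/python";
# already pinned ones such as "#!/usr/bin/env python2.7" do not match.
UNPINNED = re.compile(r"^#!\s*(?:/usr/bin/env\s+python|/\S*/python)\s*$")
PINNED = b"#!/usr/bin/env python2.7\n"
BACKUP_SUFFIX = ".orig"


def find_unpinned_scripts(root):
    """Return files under root whose first line is an unversioned python shebang."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.name.endswith(BACKUP_SUFFIX):
            continue
        with path.open("rb") as fh:
            first = fh.readline().decode("utf-8", errors="replace").rstrip("\r\n")
        if UNPINNED.match(first):
            hits.append(path)
    return hits


def pin_shebang(path):
    """Replace the first line of path with a python2.7 shebang, keeping a backup."""
    data = path.read_bytes()
    path.with_name(path.name + BACKUP_SUFFIX).write_bytes(data)
    _, _, rest = data.partition(b"\n")
    path.write_bytes(PINNED + rest)  # existing file keeps its permissions


if __name__ == "__main__" and len(sys.argv) > 1:
    apply = "--apply" in sys.argv[2:]  # dry run unless --apply is given
    for script in find_unpinned_scripts(sys.argv[1]):
        print(("pinning " if apply else "would pin ") + str(script))
        if apply:
            pin_shebang(script)
```

For example, run it as python pin_shebangs.py /local/pvacseqfiles/mhc_i for a dry run, review the list, then repeat with --apply.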

ktroule commented 6 years ago

Thanks for your time.

And one last question, I hope. There is at least one sample on which pvacseq gets stuck (so far it has been like that for more than 4 hours). The last line printed on the screen is: Running IEDB on Allele HLA-A*03:01 and Epitope Length 8 with Method NetMHCcons - Entries 1-200

I've checked the process in htop: it's using 0% CPU, and it doesn't crash; it just seems to be halted. Have you seen this behavior before? I can't find any file I could check to see what might be happening.

Once more, thanks.

susannasiebert commented 6 years ago

Hm, that is strange. Is that with a local IEDB install? I would try restarting that one. You could also try reducing the fasta size (--fasta-size parameter), which will result in more calls to IEDB but with fewer sequences in each FASTA file.
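To see how --fasta-size trades fewer sequences per call for more calls, here is a sketch that reproduces the "Entries 1-134" style ranges from the log. It assumes --fasta-size counts FASTA entries per chunk, and it uses the class I settings from the command at the top of this issue (2 valid alleles, 3 class I methods, 4 epitope lengths); the size of 200 is an assumed value:

```python
def chunk_ranges(n_entries, fasta_size):
    """1-based, inclusive (start, end) ranges, like 'Entries 1-134' in the log."""
    return [(start, min(start + fasta_size - 1, n_entries))
            for start in range(1, n_entries + 1, fasta_size)]


def total_iedb_calls(n_entries, fasta_size, n_alleles, n_methods, n_lengths):
    """One IEDB call per chunk and allele/method/length combination."""
    return len(chunk_ranges(n_entries, fasta_size)) * n_alleles * n_methods * n_lengths


print(chunk_ranges(134, 200))               # [(1, 134)]
print(chunk_ranges(134, 50))                # [(1, 50), (51, 100), (101, 134)]
print(total_iedb_calls(134, 200, 2, 3, 4))  # 24
print(total_iedb_calls(134, 6, 2, 3, 4))    # 552 (23 chunks x 24)
```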

ktroule commented 6 years ago

Yes, local IEDB install. Reducing --fasta-size doesn't seem to help. About 3 of the 23 patients I'm analyzing have this problem: at some point pvacseq stops, and it always seems to happen while NetMHCcons is running.

susannasiebert commented 6 years ago

What fasta size parameter were you using? Are any of the calls to IEDB successful with a smaller set, or does it always get stuck on the very first chunk? If you choose a very small fasta size (like <5), do any of the chunks succeed?

Does the FASTA that gets fed to IEDB contain frameshift sequences? What downstream sequence length did you choose? Is there maybe an extremely long sequence in there (thousands of amino acids)? If so, you can try decreasing --downstream-sequence-length and see if that fixes things.
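A quick way to check for such sequences is to scan the chunk FASTA for unusually long entries. A minimal sketch; the script name and the 500-residue default threshold are arbitrary choices:

```python
import sys


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, parts = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line[1:], []
            else:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def long_entries(path, max_len=500):
    """Return (header, length) for every sequence longer than max_len residues."""
    return [(h, len(s)) for h, s in read_fasta(path) if len(s) > max_len]


if __name__ == "__main__" and len(sys.argv) > 1:
    limit = int(sys.argv[2]) if len(sys.argv) > 2 else 500
    for header, length in long_entries(sys.argv[1], limit):
        print("{}\t{}".format(length, header))
```

For example: python long_fasta_entries.py /local/NeoPan/pvacout/MHC_Class_I/tmp/pan1033_21.fa.split_1-134 500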

pVACseq creates intermediate FASTA files for each chunk. You could try running the one that gets stuck through IEDB standalone and see whether there is an error message that pVACseq isn't catching. However, if you still encounter this problem after trying everything above and IEDB simply stops processing, there isn't much pVACseq can do. You might need to open a help desk ticket with IEDB to investigate further. I can imagine that these consensus algorithms might need a machine with more resources, but this is purely speculation on my part. I don't know enough about how these algorithms are run, but the folks at IEDB might be able to help more.
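A sketch of that idea in Python: split the stuck chunk into single-entry FASTA files and run the IEDB command on each one with a timeout, so a hanging entry shows up as "timed out" instead of blocking forever. The base command follows the pattern of the command in the error message earlier in this thread; the 600-second timeout and the function names are assumptions. subprocess.run kills the direct child on timeout, but processes that child started may need to be cleaned up by hand, and output is discarded here, so rerun a flagged entry manually to see its error message.

```python
import os
import subprocess
import sys
import tempfile


def split_fasta(text):
    """Split FASTA text into a list of single-entry FASTA strings."""
    entries, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            entries.append("\n".join(current) + "\n")
            current = []
        if line.strip():
            current.append(line)
    if current:
        entries.append("\n".join(current) + "\n")
    return entries


def check_entries(fasta_path, base_cmd, timeout=600):
    """Run base_cmd + [single-entry FASTA] per entry; return (header, problem) pairs."""
    with open(fasta_path) as fh:
        entries = split_fasta(fh.read())
    problems = []
    with tempfile.TemporaryDirectory() as tmp:
        for i, entry in enumerate(entries, start=1):
            path = os.path.join(tmp, "entry_{}.fa".format(i))
            with open(path, "w") as out:
                out.write(entry)
            header = entry.splitlines()[0]
            try:
                # No pipes: output goes to DEVNULL so a leftover grandchild
                # process cannot keep run() waiting on an open pipe.
                subprocess.run(base_cmd + [path], stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL, timeout=timeout, check=True)
            except subprocess.TimeoutExpired:
                problems.append((header, "timed out after {} s".format(timeout)))
            except subprocess.CalledProcessError as err:
                problems.append((header, "exit status {}".format(err.returncode)))
    return problems


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage:
    #   check_entries.py CHUNK.fa python2.7 .../predict_binding.py netmhccons 'HLA-A*03:01' 8
    for header, problem in check_entries(sys.argv[1], sys.argv[2:]):
        print("{}\t{}".format(header, problem))
```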

ktroule commented 6 years ago

Thanks.

I've been trying with chunks of 6. I'll try running IEDB standalone on the last chunk to see what happens, and I'll also try reducing --downstream-sequence-length from the original 1000 to 100. Thanks

ktroule commented 6 years ago

I still have to test it on all samples, but it looks like setting --downstream-sequence-length to 100 works without problems.

Thanks

susannasiebert commented 6 years ago

Glad that fixed it. Is it ok to resolve this issue?

ktroule commented 6 years ago

Sure. Thanks for your help, everything is working fine.