griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

pVACseq error during Class II prediction [__init__() missing 1 required positional argument: 'cmd'] #411

Closed ahwanpandey closed 5 years ago

ahwanpandey commented 5 years ago

Hello,

I just updated to the newest version of pVACseq and am seeing this error, which wasn't present in an earlier version:

An exception occured in thread 7: (<class 'subprocess.CalledProcessError'>, Command '['/bin/bash', '-l', '-c', 'conda activate pvactools_py27; python /opt/iedb/mhc_ii/mhc_II_binding.py smm_align DRB1*15:01 /researchers/username/Projects/WholeGenome/Project_DG/project/analysis_16_02_2018_NEO/a_older/res_top2_hla/MHC_Class_II/tmp/sample_name_31.fa.split_1-250']' returned non-zero exit status 1.).
Traceback (most recent call last):
  File "/opt/conda/bin/pvacseq", line 11, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.6/site-packages/tools/pvacseq/main.py", line 92, in main
    args[0].func.main(args[1])
  File "/opt/conda/lib/python3.6/site-packages/tools/pvacseq/run.py", line 191, in main
    pipeline.execute()
  File "/opt/conda/lib/python3.6/site-packages/lib/pipeline.py", line 426, in execute
    self.call_iedb(chunks)
  File "/opt/conda/lib/python3.6/site-packages/lib/pipeline.py", line 336, in call_iedb
    p.print("Making binding predictions on Allele %s and Epitope Length %s with Method %s - File %s - Completed" % (a, epl, method, filename))
  File "/opt/conda/lib/python3.6/site-packages/pymp/__init__.py", line 148, in __exit__
    raise exc_t(exc_val)
TypeError: __init__() missing 1 required positional argument: 'cmd'

Thanks.

susannasiebert commented 5 years ago

This type of error is usually intermittent, but it might indicate that your machine doesn't have enough memory for the amount of data being analyzed. I suggest using a smaller --fasta-size or, if you're working on a cluster, requesting a machine with more memory, especially if you're using multiple threads. If you post the pVACseq command, I can give additional recommendations.

Can you attach the full error stack trace?

ahwanpandey commented 5 years ago

Hi @susannasiebert

These exact settings worked with an older version [1.3.5] until I updated to [1.4.2].

The command is:

pvactools.sif pvacseq run \
            $INPUT_VCF \
            $TUMOR_SAMPLE_NAME \
            $HLA_ALLELES \
            MHCflurry MHCnuggetsI MHCnuggetsII NNalign NetMHC NetMHCIIpan NetMHCcons NetMHCpan PickPocket SMM SMMPMBEC SMMalign \
            $OUTPUT_DIR \
            -e 8,9,10,11 \
            --normal-sample-name $NORMAL_SAMPLE_NAME \
            --iedb-install-directory /opt/iedb \
            --n-threads 8 \
            --binding-threshold 500 \
            --top-score-metric median \
            --peptide-sequence-length 21 \
            --additional-report-columns sample_name \
            --fasta-size 1000 \
            --downstream-sequence-length 1000 \
            --minimum-fold-change 0.0 \
            --normal-cov 0 \
            --tdna-cov 0 \
            --trna-cov 0 \
            --normal-vaf 0 \
            --tdna-vaf  0 \
            --trna-vaf 0 \
            --expn-val 0 \
            --maximum-transcript-support-level 1

The resources are:

cpuspertask = "8"
mem = "16G"

Can you send me an e-mail address so I can send you the stderr/out files? There might be cluster-related info that the sysadmins might not want me to share publicly.

Is this what you mean by the error stack trace? If not, can you give me instructions on how to generate it?

Thanks!

ahwanpandey commented 5 years ago

I ran the same command again on the same data with 32GB memory and it stopped at the same place with the same error. The last line of the stdout was:

Making binding predictions on Allele DQB1*06:02 and Epitope Length 15 with Method MHCnuggetsII - File /researchers/username/Projects/WholeGenome/Project_DG/MOCOG/analysis_16_02_2018_NEO/samplename/NEO_pVACSeq/res_top2_hla/MHC_Class_II/tmp/samplename.MHCnuggetsII.DQB1*06:02.15.tsv_1-250 - Completed

and the error (with some more lines that I hadn't included in my initial post):

Unable to open(r) file
list index out of range
2019-06-20 11:16:42.639631: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-20 11:16:42.646656: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2594010000 Hz
2019-06-20 11:16:42.648619: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5614ef108990 executing computations on platform Host. Devices:
2019-06-20 11:16:42.648721: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-06-20 11:16:43.757960: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-20 11:16:43.765332: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2594010000 Hz
2019-06-20 11:16:43.767291: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x56039e17ec60 executing computations on platform Host. Devices:
2019-06-20 11:16:43.767343: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-06-20 11:16:45.533397: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-20 11:16:45.540766: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2594010000 Hz
2019-06-20 11:16:45.542936: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55b5b5ad7940 executing computations on platform Host. Devices:
2019-06-20 11:16:45.543010: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-06-20 11:19:18.218609: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-20 11:19:18.226049: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2594010000 Hz
2019-06-20 11:19:18.228224: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x56360dbd42f0 executing computations on platform Host. Devices:
2019-06-20 11:19:18.228285: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
An exception occured in thread 7: (<class 'subprocess.CalledProcessError'>, Command '['/bin/bash', '-l', '-c', 'conda activate pvactools_py27; python /opt/iedb/mhc_ii/mhc_II_binding.py smm_align DRB1*15:01 /researchers/user_name/Projects/WholeGenome/Project_DG/MOCOG/analysis_16_02_2018_NEO/sample_name/NEO_pVACSeq/res_top2_hla/MHC_Class_II/tmp/sample_name_31.fa.split_1-250']' returned non-zero exit status 1.).
Traceback (most recent call last):
  File "/opt/conda/bin/pvacseq", line 11, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.6/site-packages/tools/pvacseq/main.py", line 92, in main
    args[0].func.main(args[1])
  File "/opt/conda/lib/python3.6/site-packages/tools/pvacseq/run.py", line 191, in main
    pipeline.execute()
  File "/opt/conda/lib/python3.6/site-packages/lib/pipeline.py", line 426, in execute
    self.call_iedb(chunks)
  File "/opt/conda/lib/python3.6/site-packages/lib/pipeline.py", line 336, in call_iedb
    p.print("Making binding predictions on Allele %s and Epitope Length %s with Method %s - File %s - Completed" % (a, epl, method, filename))
  File "/opt/conda/lib/python3.6/site-packages/pymp/__init__.py", line 148, in __exit__
    raise exc_t(exc_val)
TypeError: __init__() missing 1 required positional argument: 'cmd'

The reason I didn't post the TensorFlow warnings is that I had seen them in the previous versions that worked, so I thought they weren't contributing to the problem.

ahwanpandey commented 5 years ago

I just double-checked: running version [1.3.5] on the same data, with the same parameters, on the same cluster with 16GB of memory and 8 threads works without issues. The stderr contains only these warnings:

2019-06-21 14:04:58.294256: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
/opt/conda/lib/python3.6/site-packages/lib/output_parser.py:485: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  protein_identifiers_from_label = yaml.load(key_file_reader)
/opt/conda/lib/python3.6/site-packages/lib/output_parser.py:485: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  protein_identifiers_from_label = yaml.load(key_file_reader)

Are there any major differences between [1.3.5] and [1.4.2]? If so, I might just stick with the older version for now.

susannasiebert commented 5 years ago

I believe the underlying failure reason is "Unable to open(r) file" / "list index out of range". You're correct that the TensorFlow warnings are just noise. I'm not sure what is causing the index-out-of-range error. Versions >=1.3.6 reintroduced parallelization of MHCflurry and MHCnuggets, so that may well be causing this error, although we haven't had any issues running the latest version. Now that MHCflurry and MHCnuggets also run in parallel, 16G might not be enough. Have you tried monitoring the RAM usage of your job? Running out of disk or temp space could also cause this error. Again, I would try reducing the --fasta-size parameter.
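As a rough way to answer the RAM question, here is a minimal Python sketch (not part of pVACtools) that runs a command, such as your full `pvacseq run ...` line, and reports peak resident memory using the standard-library `resource` module. Two caveats: for `RUSAGE_CHILDREN`, `ru_maxrss` is the peak of the single largest waited-for descendant, not the sum over workers running side by side, so with `--n-threads 8` the real total can be considerably higher; and the unit is kilobytes on Linux but bytes on macOS. The function name `run_and_report_peak_rss` is hypothetical.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run cmd to completion; return (exit code, peak RSS in MiB).

    The peak is the largest resident set size of any single terminated,
    waited-for descendant (RUSAGE_CHILDREN), not the sum over processes
    that ran concurrently.
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, usage.ru_maxrss / divisor


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python peak_rss.py pvacseq run <args...>
    code, peak_mib = run_and_report_peak_rss(sys.argv[1:])
    print("exit code {}, peak RSS of largest child: {:.1f} MiB".format(code, peak_mib))
```

Because the figure is a per-process maximum, treat it as a lower bound on what a multi-threaded run needs.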

Do you get this error with all of your input files or just this particular one?

Can you possibly send your input VCF and the full stdout and stderr to help@pvactools.org? Without being able to reproduce this error on my end I have no way to debug this further.

ahwanpandey commented 5 years ago

Let me try with --fasta-size=400, 8 threads and 32GB of RAM first.

Also, these messages:

Forcing tensorflow backend.
Wrote: /tmp/tmpws2cg5z5
Forcing tensorflow backend.
Wrote: /tmp/tmpl5lg3ju8

Is there a way to redirect the temporary data somewhere else?

If this doesn't work I'll package out the data and the stderr/out and send it to you.

Thanks for the help.

ahwanpandey commented 5 years ago

I tried --fasta-size=200, 8 threads and 32GB and it still stopped with the same error.

I have sent an e-mail with the relevant data and the stderr/out.

Please let me know if you need anything else.

Thanks for the help.

susannasiebert commented 5 years ago

I was able to run with 8 threads, 24G of memory and --fasta-size 1000 on our cluster, so I'm not sure why it's failing for you. Are you using the Docker container or a standalone installation?

ahwanpandey commented 5 years ago

I am using the Docker container.

ahwanpandey commented 5 years ago

Is there a way to redirect /tmp?

susannasiebert commented 5 years ago

You should be able to set the TMPDIR environment variable to control where temporary files get written. pVACseq uses the Python tempfile module to decide where temporary output goes, and according to the documentation (https://docs.python.org/3.5/library/tempfile.html#tempfile.gettempdir), that is how you can influence where temporary files get created. I'm not sure what other side effects this might have, though.
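Per the linked documentation, `tempfile.gettempdir()` tries TMPDIR, then TEMP, then TMP, then platform locations such as /tmp, and uses the first directory in which it can actually create a file. The small Python sketch below asks a fresh interpreter what it resolves to for a given TMPDIR. A new process is used because `gettempdir()` caches its answer after the first call, and TEMP and TMP are removed from the child's environment so that only TMPDIR is being tested. `resolved_tempdir` is a hypothetical helper name.

```python
import os
import subprocess
import sys
import tempfile


def resolved_tempdir(tmpdir_value):
    """Return tempfile.gettempdir() as seen by a new Python process whose
    TMPDIR is tmpdir_value (TEMP and TMP are removed so they can't win)."""
    env = {k: v for k, v in os.environ.items() if k not in ("TEMP", "TMP")}
    env["TMPDIR"] = tmpdir_value
    result = subprocess.run(
        [sys.executable, "-c", "import tempfile; print(tempfile.gettempdir())"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    existing = tempfile.mkdtemp()
    missing = os.path.join(existing, "does_not_exist")
    print(resolved_tempdir(existing))  # the directory that was requested
    print(resolved_tempdir(missing))   # silently falls back, e.g. to /tmp
```

A TMPDIR that points at a missing directory is therefore ignored without any warning.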

ahwanpandey commented 5 years ago

Hello,

That doesn't seem to work.

It is still writing to /tmp:

Forcing tensorflow backend.
Wrote: /tmp/tmp57jz7jvh
Forcing tensorflow backend.
Wrote: /tmp/tmpfq6qz55v
Forcing tensorflow backend.
Wrote: /tmp/tmpchph4q2t

I added this to the shell script before the pvactools command:

export TMPDIR=/some/other/tmp/
export TEMP=/some/other/tmp/
export TMP=/some/other/tmp/

pvactools.sif pvacseq run ....

Also, did you try my data with the Docker container and the exact same command? That is, with the same HLA alleles, algorithms, and so on?

I'm not sure if it is even because of "/tmp", but I don't know what else to try :/

If the new version [1.4.2] doesn't work for me, I might switch back to [1.3.5] for now, which runs without errors with 16GB RAM, 8 threads and --fasta-size 1000. Should there be any major differences between the results of the two?

Thanks!

susannasiebert commented 5 years ago

Here is what I ran on our cluster:

pvacseq run \
    AN_T_MAOC00037-3-8_N_MAOC00037-5-7.pvacseq.VEP.peptides.with_rna_data.vcf \
    MAOC00037-3-8 \
    HLA-A*24:02,HLA-A*68:01,HLA-B*35:03,HLA-B*07:02,HLA-C*07:02,HLA-C*04:01,DPA1*01:03,DPB1*04:01,DPB1*31:01,DPA1*01:03-DPB1*04:01,DPA1*01:03-DPB1*31:01,DQA1*01:02,DQA1*01:03,DQB1*06:02,DQB1*06:03,DQA1*01:02-DQB1*06:02,DQA1*01:02-DQB1*06:03,DQA1*01:03-DQB1*06:02,DQA1*01:03-DQB1*06:03,DRA*01:02,DRA*01:01,DRB1*13:01,DRB1*15:01 \
    MHCflurry MHCnuggetsI MHCnuggetsII NNalign NetMHC NetMHCIIpan NetMHCcons NetMHCpan PickPocket SMM SMMPMBEC SMMalign \
    AN_T_MAOC00037.test3 \
    -e 8,9,10,11 \
    --normal-sample-name MAOC00037-5-7 \
    --n-threads 8 \
    --binding-threshold 500 \
    --top-score-metric median \
    --peptide-sequence-length 21 \
    --additional-report-columns sample_name \
    --fasta-size 1000 \
    --downstream-sequence-length 1000 \
    --minimum-fold-change 0.0 \
    --normal-cov 0 \
    --tdna-cov 0 \
    --trna-cov 0 \
    --normal-vaf 0 \
    --tdna-vaf 0 \
    --trna-vaf 0 \
    --expn-val 0 \
    --maximum-transcript-support-level 1

/some/other/tmp/ needs to exist before you run the pVACseq command. I tried with a non-existent directory and had the same problem, but once I created the directory, it worked:

Set TMPDIR


(base) root@350cbd4f8fe6:/opt/iedb# echo $TMPDIR

(base) root@350cbd4f8fe6:/opt/iedb# export TMPDIR=/new_dir
(base) root@350cbd4f8fe6:/opt/iedb# echo $TMPDIR
/new_dir

Test TMPDIR without creating `new_dir` -> doesn't work

(base) root@350cbd4f8fe6:/opt/iedb# python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['TMPDIR']
'/new_dir'
>>> import tempfile
>>> tempfile.gettempdir()
'/tmp'
>>> exit()

Create `/new_dir` and test -> works

(base) root@350cbd4f8fe6:/opt/iedb# mkdir /new_dir
(base) root@350cbd4f8fe6:/opt/iedb# python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tempfile
>>> tempfile.gettempdir()
'/new_dir'
>>> exit()

Run pvacseq with MHCflurry

(base) root@350cbd4f8fe6:/opt/iedb# pvacseq run pvacseq_example_data/input.vcf Test HLA-A*02:01 MHCflurry /pvacseq_tmpdir_test -e 9
...
Wrote: /new_dir/tmp6simevam
...
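Because the directory has to exist beforehand, a launcher can create it before starting the job. Below is a minimal Python sketch, assuming you drive the run from Python rather than from the shell script shown earlier. `run_with_tmpdir` is a hypothetical helper, and the command you pass in would be your own `pvactools.sif pvacseq run ...` line.

```python
import os
import subprocess


def run_with_tmpdir(cmd, tmpdir, **run_kwargs):
    """Create tmpdir if needed, then run cmd with TMPDIR pointing at it.

    tempfile.gettempdir() ignores a TMPDIR that does not exist and falls
    back to TEMP, TMP and then standard locations such as /tmp, so the
    directory is created up front.
    """
    tmpdir = os.path.abspath(tmpdir)
    os.makedirs(tmpdir, exist_ok=True)
    env = dict(os.environ, TMPDIR=tmpdir)
    return subprocess.run(cmd, env=env, **run_kwargs)
```

Extra keyword arguments are passed straight to `subprocess.run`, for example `stdout=subprocess.PIPE` to capture output.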

You will probably want to mount a host directory into your Docker container and use that for TMPDIR, so that you know where the files end up outside of the container. I'm not sure where, outside of the container, Docker writes the contents of arbitrary unmounted directories. It might be the same location as the container's /tmp, in which case writing to /tmp versus /some/unmounted/directory won't make a difference.

There are no differences for pVACseq between version 1.3.7 and 1.4.x. However, 1.5.0 will have new pVACseq features, so it would be good to figure out how to resolve your problems if you want to use upcoming features.

ahwanpandey commented 5 years ago

Hello,

First off, thanks for your patience and for helping me solve this issue. I really appreciate it!

I was able to redirect "tmp" as per your instructions, but the issue still persists.

An exception occured in thread 7: (<class 'subprocess.CalledProcessError'>, Command '['/bin/bash', '-l', '-c', 'conda activate pvactools_py27; python /opt/iedb/mhc_ii/mhc_II_binding.py smm_align DRB1*15:01 /researchers/user.name/Projects/WholeGenome/Project_DG/MOCOG/analysis_16_02_2018_NEO/sample_name/NEO_pVACSeq/res_top2_hla/MHC_Class_II/tmp/sample_name_31.fa.split_1-250']' returned non-zero exit status 1.).
Traceback (most recent call last):
  File "/opt/conda/bin/pvacseq", line 11, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.6/site-packages/tools/pvacseq/main.py", line 92, in main
    args[0].func.main(args[1])
  File "/opt/conda/lib/python3.6/site-packages/tools/pvacseq/run.py", line 191, in main
    pipeline.execute()
  File "/opt/conda/lib/python3.6/site-packages/lib/pipeline.py", line 426, in execute
    self.call_iedb(chunks)
  File "/opt/conda/lib/python3.6/site-packages/lib/pipeline.py", line 336, in call_iedb
    p.print("Making binding predictions on Allele %s and Epitope Length %s with Method %s - File %s - Completed" % (a, epl, method, filename))
  File "/opt/conda/lib/python3.6/site-packages/pymp/__init__.py", line 148, in __exit__
    raise exc_t(exc_val)
TypeError: __init__() missing 1 required positional argument: 'cmd'

I figured I would try running the exact command that is causing the error manually and see what happens:

[uname@cluster-name /home/uname/test]$ /bin/bash -l -c conda activate pvactools_py27 ; python /opt/iedb/mhc_ii/mhc_II_binding.py smm_align DRB1*15:01 /researchers/user.name/Projects/WholeGenome/Project_DG/MOCOG/analysis_16_02_2018_NEO/sample_name/NEO_pVACSeq/res_top2_hla/MHC_Class_II/tmp/sample_name_31.fa.split_1-250

usage: conda [-h] [-V] command ...

conda is a tool for managing and deploying applications, environments and packages.

Options:

positional arguments:
  command
    clean        Remove unused packages and caches.
    config       Modify configuration values in .condarc. This is modeled
                 after the git config command. Writes to the user .condarc
                 file (/home/uname/.condarc) by default.
    create       Create a new conda environment from a list of specified
                 packages.
    help         Displays a list of available conda commands and their help
                 strings.
    info         Display information about current conda install.
    install      Installs a list of packages into a specified conda
                 environment.
    list         List linked packages in a conda environment.
    package      Low-level conda package utility. (EXPERIMENTAL)
    remove       Remove a list of packages from a specified conda environment.
    uninstall    Alias for conda remove. See conda remove --help.
    search       Search for packages and display associated information. The
                 input is a MatchSpec, a query language for conda packages.
                 See examples below.
    update       Updates conda packages to the latest compatible version. This
                 command accepts a list of package names and updates them to
                 the latest versions that are compatible with all other
                 packages in the environment. Conda attempts to install the
                 newest versions of the requested packages. To accomplish
                 this, it may update some packages that are already installed,
                 or install additional packages. To prevent existing packages
                 from updating, use the --no-update-deps option. This may
                 force conda to install older versions of the requested
                 packages, and it does not prevent additional dependency
                 packages from being installed. If you wish to skip dependency
                 checking altogether, use the '--force' option. This may
                 result in an environment with incompatible packages, so this
                 option must be used with great caution.
    upgrade      Alias for conda update. See conda update --help.

optional arguments:
  -h, --help     Show this help message and exit.
  -V, --version  Show the conda version number and exit.

conda commands available from other packages:
  env
  File "/opt/iedb/mhc_ii/mhc_II_binding.py", line 197
    print "Content-Type: text/html"
                                  ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Content-Type: text/html")?

[uname@cluster-name /home/uname/test]$ which python
/opt/conda/bin/python

[uname@cluster-name /home/uname/test]$ /opt/conda/bin/python --version
Python 3.6.5 :: Anaconda, Inc.

Is the required Python version not being activated, hence the error?

Thanks.

susannasiebert commented 5 years ago

The "missing 1 required positional argument: 'cmd'" message is just the error thrown by the parent process when the child process it launched dies. The reason the child process itself failed can usually be found further up, mixed in with the other output. I think the real error is "Unable to open(r) file" / "list index out of range", but that could be a red herring. There are a few instances of this error in the .err output, but I didn't find anything else that looked out of place. I'm also concerned about all of the "module: command not found" and "mypython: command not found" messages, but they are probably unrelated.
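The last frames of the traceback (`raise exc_t(exc_val)` in pymp) are consistent with this. Re-creating an exception of the child's type from a single argument fails for `subprocess.CalledProcessError`, whose constructor requires both `returncode` and `cmd`, so the parent ends up raising a `TypeError` instead. A minimal Python reproduction of that pattern, independent of pymp; `try_rebuild` is a hypothetical name:

```python
import subprocess


def try_rebuild(exc):
    """Rebuild exc from a single argument, mirroring `exc_t(exc_val)`.

    Returns the new exception, or the TypeError raised while constructing it.
    """
    exc_t = type(exc)
    try:
        return exc_t(exc)
    except TypeError as construction_error:
        return construction_error


child_failure = subprocess.CalledProcessError(
    1, ["python", "/opt/iedb/mhc_ii/mhc_II_binding.py", "smm_align"]
)
# A TypeError about the missing 'cmd' argument replaces the CalledProcessError.
print(repr(try_rebuild(child_failure)))
# Exceptions whose constructors accept one argument rebuild without trouble.
print(repr(try_rebuild(ValueError("list index out of range"))))
```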

In order to run an individual command, you need to put double quotes around the "conda activate ..." part. For example, this works for me and finishes successfully: /bin/bash -l -c "conda activate pvactools_py27; python /opt/iedb/mhc_ii/mhc_II_binding.py smm_align DQA1*01:02/DQB1*06:02 //AN_T_MAOC00037.test6/MHC_Class_II/tmp/MAOC00037-3-8_31.fa.split_201-250"
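The quotes matter because `bash -c` treats only the first argument after `-c` as the script; any further words become `$0`, `$1`, and so on. Without quotes, the interactive shell splits the line at `;`, so `/bin/bash -l -c conda activate pvactools_py27` runs bare `conda` (hence the usage text), and `python ...` then runs outside the environment under Python 3.6 (hence the `print` SyntaxError). A Python sketch of both points; `bash_c_output` and `bash_login_command` are hypothetical helpers, the latter assembling the same argv shape seen in the traceback with each word quoted by `shlex.quote` so that patterns such as `DRB1*15:01` are not glob-expanded. `input.fa` is a placeholder file name.

```python
import shlex
import subprocess


def bash_c_output(args):
    """Run `bash -c ARGS...` and return its standard output."""
    result = subprocess.run(
        ["bash", "-c"] + list(args), stdout=subprocess.PIPE, universal_newlines=True
    )
    return result.stdout


def bash_login_command(env_name, argv):
    """Build ['/bin/bash', '-l', '-c', 'conda activate ENV; CMD ...'] with
    every word shell-quoted, so the whole script is a single argument."""
    script = "conda activate {}; {}".format(
        shlex.quote(env_name), " ".join(shlex.quote(word) for word in argv)
    )
    return ["/bin/bash", "-l", "-c", script]


# Unquoted: the script is just "echo"; "hello" and "world" become $0 and $1.
print(repr(bash_c_output(["echo", "hello", "world"])))  # '\n'
# Quoted: the whole string is the script.
print(repr(bash_c_output(["echo hello world"])))  # 'hello world\n'
print(bash_login_command(
    "pvactools_py27",
    ["python", "/opt/iedb/mhc_ii/mhc_II_binding.py", "smm_align", "DRB1*15:01", "input.fa"],
))
```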

I still think this is a memory or out-of-space issue. When you run the individual child-process commands on their own, they will probably succeed, and each time your run fails, it's for a different child process. The same command that fails for you succeeds on my machine, so the error is non-deterministic.

When you redirected tmp, did that location have plenty of space available? Can you try running with just 2 threads and 16G or even 32G of RAM?
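To answer the space question from inside the same environment the job runs in, Python's `shutil.disk_usage` reports free space on whichever filesystem holds a given path. A minimal sketch; `report_free_space` is a hypothetical helper, and the paths you would actually check (your TMPDIR and pVACseq output directory) are placeholders to substitute.

```python
import shutil
import tempfile


def report_free_space(paths=None):
    """Return {path: free GiB} for each path; defaults to the directory
    tempfile.gettempdir() resolves to in this process."""
    if paths is None:
        paths = [tempfile.gettempdir()]
    return {path: shutil.disk_usage(path).free / 1024 ** 3 for path in paths}


for path, free_gib in report_free_space().items():
    print("{}: {:.1f} GiB free".format(path, free_gib))
```

Running it inside the container shows the filesystem the container sees, which may differ from what the host reports for the same path.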

ahwanpandey commented 5 years ago

Hi @susannasiebert

I think I am getting closer!

initial commands to load the Docker container

(virtualenv_python_2.7.5) [apandey@cluster /data/directory]$ module load pvactools/1.4.2
(virtualenv_python_2.7.5) [apandey@cluster /data/directory]$ pvactools.sif

command not working

[apandey@cluster /data/directory]$ /bin/bash -l -c "conda activate pvactools_py27; python /opt/iedb/mhc_ii/mhc_II_binding.py smm_align DRB1*15:01 /data/directory/NEO_pVACSeq/res_top2_hla/MHC_Class_II/tmp/MAOC00037-3-8_31.fa.split_1-250"
Unable to open(r) file
list index out of range

command working after specifying PATH to data inside command

[apandey@cluster /data/directory]$ /bin/bash -l -c "export PATH=/data/directory:$PATH; conda activate pvactools_py27; python /opt/iedb/mhc_ii/mhc_II_binding.py smm_align DRB1*15:01 /data/directory/NEO_pVACSeq/res_top2_hla/MHC_Class_II/tmp/MAOC00037-3-8_31.fa.split_1-250" | head
allele  seq_num start   end     core_peptide    peptide ic50    percentile_rank
HLA-DRB1*15:01  72      9       23      LMIQLLFVL       GELMIQLLFVLYGIL 58.0    0.56
HLA-DRB1*15:01  25      6       20      IRLFEPLVI       HNHIRLFEPLVIKAL 60.0    0.59
HLA-DRB1*15:01  72      8       22      LMIQLLFVL       GGELMIQLLFVLYGI 61.0    0.6
HLA-DRB1*15:01  25      7       21      IRLFEPLVI       NHIRLFEPLVIKALK 63.0    0.64
HLA-DRB1*15:01  72      11      25      LFVLYGILA       LMIQLLFVLYGILAL 63.0    0.64
HLA-DRB1*15:01  25      5       19      IRLFEPLVI       IHNHIRLFEPLVIKA 65.0    0.67
HLA-DRB1*15:01  25      4       18      IRLFEPLVI       AIHNHIRLFEPLVIK 66.0    0.69
HLA-DRB1*15:01  25      3       17      HIRLFEPLV       NAIHNHIRLFEPLVI 67.0    0.7

Is there a hint here on how to make it work?

Thanks!

susannasiebert commented 5 years ago

Have you tried setting PATH before your pvactools command, e.g.: PATH=/data/directory:$PATH pvacseq run ...
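The `PATH=... pvacseq run ...` prefix sets PATH only for that one command and its children. One thing worth checking: the traceback shows pVACseq launching IEDB tools through `/bin/bash -l -c`, and a login shell reads profile scripts that may rewrite PATH, which would discard a value set earlier. That is a hypothesis to test, not a diagnosis. The Python sketch below compares the PATH a plain `bash -c` and a login `bash -l -c` report after a directory is prepended; `path_seen_by_bash` is a hypothetical helper, and `/data/directory` is the example directory from this thread.

```python
import os
import subprocess


def path_seen_by_bash(extra_dir, login):
    """Prepend extra_dir to PATH, start bash (optionally as a login shell)
    and return the PATH that bash reports."""
    env = dict(os.environ)
    env["PATH"] = extra_dir + os.pathsep + env.get("PATH", "")
    flags = ["-l", "-c"] if login else ["-c"]
    result = subprocess.run(
        ["bash"] + flags + ['printf "%s" "$PATH"'],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    for login in (False, True):
        first = path_seen_by_bash("/data/directory", login).split(os.pathsep)[0]
        print("login shell" if login else "plain shell", "-> first PATH entry:", first)
```

If the login shell no longer lists the prepended directory first, the value would need to be set inside the quoted script instead.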

ahwanpandey commented 5 years ago

Here are all the scenarios I have tried. Interestingly, in scenario 4, even though the file exists after the pvactools.sif command, it is still not found by the mhc_II_binding.py command:

1) works
module load pvactools/1.4.2
pvactools.sif
/bin/bash -l -c "export PATH=/data/directory:$PATH ; conda activate pvactools_py27; python /opt/iedb/mhc_ii/mhc_II_binding.py smm_align DRB1*15:01 /data/directory/NEO_pVACSeq/res_top2_hla/MHC_Class_II/tmp/MAOC00037-3-8_31.fa.split_1-250" | head
allele  seq_num start   end     core_peptide    peptide ic50    percentile_rank
HLA-DRB1*15:01  72      9       23      LMIQLLFVL       GELMIQLLFVLYGIL 58.0    0.56
HLA-DRB1*15:01  25      6       20      IRLFEPLVI       HNHIRLFEPLVIKAL 60.0    0.59
HLA-DRB1*15:01  72      8       22      LMIQLLFVL       GGELMIQLLFVLYGI 61.0    0.6
HLA-DRB1*15:01  25      7       21      IRLFEPLVI       NHIRLFEPLVIKALK 63.0    0.64
HLA-DRB1*15:01  72      11      25      LFVLYGILA       LMIQLLFVLYGILAL 63.0    0.64
HLA-DRB1*15:01  25      5       19      IRLFEPLVI       IHNHIRLFEPLVIKA 65.0    0.67
HLA-DRB1*15:01  25      4       18      IRLFEPLVI       AIHNHIRLFEPLVIK 66.0    0.69
HLA-DRB1*15:01  25      3       17      HIRLFEPLV       NAIHNHIRLFEPLVI 67.0    0.7
HLA-DRB1*15:01  72      7       21      LMIQLLFVL       PGGELMIQLLFVLYG 67.0    0.7

2) doesn't work
module load pvactools/1.4.2
export PATH=/data/directory:$PATH
pvactools.sif
/bin/bash -l -c "conda activate pvactools_py27; python /opt/iedb/mhc_ii/mhc_II_binding.py smm_align DRB1*15:01 /data/directory/NEO_pVACSeq/res_top2_hla/MHC_Class_II/tmp/MAOC00037-3-8_31.fa.split_1-250"
Unable to open(r) file
list index out of range

3) doesn't work
export PATH=/data/directory:$PATH
module load pvactools/1.4.2
pvactools.sif
/bin/bash -l -c "conda activate pvactools_py27; python /opt/iedb/mhc_ii/mhc_II_binding.py smm_align DRB1*15:01 /data/directory/NEO_pVACSeq/res_top2_hla/MHC_Class_II/tmp/MAOC00037-3-8_31.fa.split_1-250"
Unable to open(r) file
list index out of range

4) doesn't work
module load pvactools/1.4.2
pvactools.sif
export PATH=/data/directory:$PATH

ls -lah /data/directory/NEO_pVACSeq/res_top2_hla/MHC_Class_II/tmp/MAOC00037-3-8_31.fa.split_1-250
-rw-rw-r-- 1 user user 3.7K Jun 28 10:50 /data/directory/NEO_pVACSeq/res_top2_hla/MHC_Class_II/tmp/MAOC00037-3-8_31.fa.split_1-250

/bin/bash -l -c "conda activate pvactools_py27; python /opt/iedb/mhc_ii/mhc_II_binding.py smm_align DRB1*15:01 /data/directory/NEO_pVACSeq/res_top2_hla/MHC_Class_II/tmp/MAOC00037-3-8_31.fa.split_1-250"
Unable to open(r) file
list index out of range

susannasiebert commented 5 years ago

I'm not familiar with what pvactools.sif does; that seems to be specific to your system. I would like you to try running the full pvacseq run command (not just the individual prediction call) while setting PATH beforehand: PATH=/data/directory:$PATH pvacseq run ...

ahwanpandey commented 5 years ago

"pvactools.sif" is just a Singularity image created from the Docker image. Presumably the same method was applied for the two versions of pvactools on the cluster here: the old version [1.3.5], which is working at the moment, and the new version [1.4.2], which breaks.

I am running with 64GB RAM and 2 threads:

PATH=/data/directory:$PATH pvactools.sif pvacseq run ...

Will report back shortly if it works.

ahwanpandey commented 5 years ago

I got the exact same error. Hmm, not sure what else I can try.

susannasiebert commented 5 years ago

I believe you need to set PATH after you instantiate the Singularity image:

pvactools.sif
PATH=/data/directory:$PATH pvacseq run ..

The other thing that changed between the two versions you are testing is that we upgraded to the latest version of IEDB in Docker image 1.4.0+. Since the problem you are having is with calling one of the IEDB prediction methods, I suspect that this might be the culprit. Have you tried running your four tests on the 1.3.5 image? If they work on that version, I would suggest making your own Docker/Singularity image that uses the older version of IEDB and the newest version of pVACtools.

ahwanpandey commented 5 years ago

I tried setting the PATH after instantiating the singularity image in scenario 4 with no luck.

1.3.5 works simply by running:

cd /data/directory
pvactools.sif pvacseq run ...

susannasiebert commented 5 years ago

Ok, can you try instantiating the 1.3.5 image and then upgrading pvactools to the latest version before running the pvacseq command. You can do so by running pip install pvactools --upgrade. This should give you version 1.4.2 with the old version of IEDB. This will help narrow down whether pvactools or IEDB is the problem.

susannasiebert commented 5 years ago

@ahwanpandey any updates on this issue?

ahwanpandey commented 5 years ago

Hey @susannasiebert,

Our cluster administrator is on vacation so I won't be able to test this until he is back :/

Thanks.

susannasiebert commented 5 years ago

We're now on version 1.5.2. I'm closing this issue but please feel free to reopen or make a new issue if you're still running into errors when running with the latest version.