icbi-lab / nextNEOpi

nextNEOpi: a comprehensive pipeline for computational neoantigen prediction

Installation error with mamba #43

Closed fredsamhaak closed 2 months ago

fredsamhaak commented 10 months ago

Hi @riederd, Thanks for your great end-to-end tool.

I ran into an installation problem with the command below:

mamba env create -f assets/nextNEOpi.yml

And the error message is like:

[image: screenshot of the error message]

Could you please give me some advice on this? I look forward to hearing from you, and thanks in advance.

All the best, He

fredsamhaak commented 10 months ago

I removed the explicitly specified 'default' channel for these two packages and that worked, but a new problem has arisen:

[image: screenshot of the new error message]

Seems like 'gatkPythonPackageArchive.zip' doesn't exist and there's no 'condaenv.1vzdcdw2.requirements.txt' in 'assets' directory.

riederd commented 10 months ago

Hi,

Are you trying to create the envs manually? There is no need for that; they will be created by nextflow automatically. However, we strongly recommend using the singularity profile instead of conda.

fredsamhaak commented 10 months ago

Thanks @riederd, I am now using singularity instead of conda/mamba.

To do a quick test, I used the provided test data ('nextNEOpi_testdata.tar.gz') with the following command:

nextflow run nextNEOpi-master/nextNEOpi.nf \
    --batchFile batchfile.csv \
    -config nextNEOpi-master/conf/params.config \
    --outputDir result \
    --trim_adapters true \
    --trim_adapters_RNAseq true \
    --use_NetChop false \
    --tmpDir tmpdir \
    -profile singularity,cluster \
    --accept_license \
    --TCR false \
    -resume

But the pipeline ran into an error:

`WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.

Error executing process > 'pVACseq (sample1)'

Caused by: Process pVACseq (sample1) terminated with an error exit status (1)

Command executed:

pvacseq run \
    --iedb-install-directory /opt/iedb \
    -t 10 \
    -p sample1_vep_phased.vcf.gz \
    -e1 8,9,10,11 \
    -e2 15,16,17,18,19,20,21,22,23,24,25 \
    --normal-sample-name sample1_normal \
    --tumor-purity 0.53 \
    \
    --netmhc-stab \
    --binding-threshold 500 --top-score-metric median --minimum-fold-change 0.0 --normal-cov 5 --tdna-cov 10 --trna-cov 10 --normal-vaf 0.02 --tdna-vaf 0.25 --trna-vaf 0.25 --expn-val 1 --maximum-transcript-support-level 1 \
    sample1_vep_somatic_gx.vcf.gz sample1_tumor HLA-A*29:02 NetMHCpan NetMHCpanEL MHCflurry MHCflurryEL NetMHCIIpan NetMHCIIpanEL ./

if [ -e ./MHC_Class_I/sample1_tumor.filtered.tsv ]; then
    mv ./MHC_Class_I/sample1_tumor.filtered.tsv ./MHC_Class_I/sample1_tumor_HLA-A29:02.filtered.tsv
fi
if [ -e ./MHC_Class_I/sample1_tumor.all_epitopes.tsv ]; then
    mv ./MHC_Class_I/sample1_tumor.all_epitopes.tsv ./MHC_Class_I/sample1_tumor_HLA-A29:02.all_epitopes.tsv
fi
if [ -e ./MHC_Class_II/sample1_tumor.filtered.tsv ]; then
    mv ./MHC_Class_II/sample1_tumor.filtered.tsv ./MHC_Class_II/sample1_tumor_HLA-A29:02.filtered.tsv
fi
if [ -e ./MHC_Class_II/sample1_tumor.all_epitopes.tsv ]; then
    mv ./MHC_Class_II/sample1_tumor.all_epitopes.tsv ./MHC_Class_II/sample1_tumor_HLA-A29:02.all_epitopes.tsv
fi

Command exit status: 1

Command output: Warning: Proximal variant is not a missense mutation and will be skipped: chr6 32222629 Warning: Proximal variant is not a missense mutation and will be skipped: chr6 32642036 Completed Generating Variant Peptide FASTA and Key File Completed Parsing the Variant Peptide FASTA and Key File Completed Calculating Manufacturability Metrics Completed Splitting TSV into smaller chunks Splitting TSV into smaller chunks - Entries 1-54 Completed Generating Variant Peptide FASTA and Key Files Generating Variant Peptide FASTA and Key Files - Epitope Length 8 - Entries 1-108 Generating Variant Peptide FASTA and Key Files - Epitope Length 9 - Entries 1-108 Generating Variant Peptide FASTA and Key Files - Epitope Length 10 - Entries 1-108 Generating Variant Peptide FASTA and Key Files - Epitope Length 11 - Entries 1-108 Completed Making binding predictions on Allele HLA-A29:02 and Epitope Length 8 with Method NetMHCpanEL - File MHC_Class_I/tmp/sample1_tumor.netmhcpan_el.HLA-A29:02.8.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 9 with Method NetMHCpan - File MHC_Class_I/tmp/sample1_tumor.netmhcpan.HLA-A29:02.9.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 9 with Method NetMHCpanEL - File MHC_Class_I/tmp/sample1_tumor.netmhcpan_el.HLA-A29:02.9.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 10 with Method MHCflurry - File MHC_Class_I/tmp/sample1_tumor.MHCflurry.HLA-A29:02.10.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 10 with Method NetMHCpan - File MHC_Class_I/tmp/sample1_tumor.netmhcpan.HLA-A29:02.10.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 10 with Method NetMHCpanEL - File MHC_Class_I/tmp/sample1_tumor.netmhcpan_el.HLA-A29:02.10.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 11 with Method MHCflurry - File MHC_Class_I/tmp/sample1_tumor.MHCflurry.HLA-A29:02.11.tsv_1-108 Making 
binding predictions on Allele HLA-A29:02 and Epitope Length 11 with Method NetMHCpan - File MHC_Class_I/tmp/sample1_tumor.netmhcpan.HLA-A29:02.11.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 8 with Method MHCflurry - File MHC_Class_I/tmp/sample1_tumor.MHCflurry.HLA-A29:02.8.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 11 with Method NetMHCpanEL - File MHC_Class_I/tmp/sample1_tumor.netmhcpan_el.HLA-A29:02.11.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 9 with Method NetMHCpan - File MHC_Class_I/tmp/sample1_tumor.netmhcpan.HLA-A29:02.9.tsv_1-108 - Completed Making binding predictions on Allele HLA-A29:02 and Epitope Length 9 with Method NetMHCpanEL - File MHC_Class_I/tmp/sample1_tumor.netmhcpan_el - File MHC_Class_I/tmp/sample1_tumor.netmhcpan_el.HLA-A29:02.10.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 11 with Method MHCflurry - File MHC_Class_I/tmp/sample1_tumor.MHCflurry.HLA-A29:02.11.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 11 with Method NetMHCpan - File MHC_Class_I/tmp/sample1_tumor.netmhcpan.HLA-A29:02.11.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 8 with Method MHCflurry - File MHC_Class_I/tmp/sample1_tumor.MHCflurry.HLA-A29:02.8.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 11 with Method NetMHCpanEL - File MHC_Class_I/tmp/sample1_tumor.netmhcpan_el.HLA-A29:02.11.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 9 with Method NetMHCpan - File MHC_Class_I/tmp/sample1_tumor.netmhcpan.HLA-A29:02.9.tsv_1-108 - Completed Making binding predictions on Allele HLA-A29:02 and Epitope Length 9 with Method NetMHCpanEL - File MHC_Class_I/tmp/sample1_tumor.netmhcpan_el.HLA-A29:02.9.tsv_1-108 - Completed Making binding predictions on Allele HLA-A29:02 and Epitope Length 8 with Method NetMHCpanEL - File 
MHC_Class_I/tmp/sample1_tumor.netmhcpan_el.HLA-A29:02.8.tsv_1-108 - Completed Making binding predictions on Allele HLA-A29:02 and Epitope Length 9 with Method MHCflurry - File MHC_Class_I/tmp/sample1_tumor.MHCflurry.HLA-A29:02.9.tsv_1-108 Making binding predictions on Allele HLA-A29:02 and Epitope Length 10 with Method NetMHCpanEL - File MHC_Class_I/tmp/sample1_tumor.netmhcpan_el.HLA-A29:02.10.tsv_1-108 - Completed Making binding predictions on Allele HLA-A29:02 and Epitope Length 10 with Method NetMHCpan - File MHC_Class_I/tmp/sample1_tumor.netmhcpan.HLA-A29:02.10.tsv_1-108 - Completed Forcing tensorflow backend. Making binding predictions on Allele HLA-A29:02 and Epitope Length 8 with Method MHCflurry - File MHC_Class_I/tmp/sample1_tumor.MHCflurry.HLA-A29:02.8.tsv_1-108 - Completed Making binding predictions on Allele HLA-A29:02 and Epitope Length 8 with Method NetMHCpan - File MHC_Class_I/tmp/sample1_tumor.netmhcpan.HLA-A29:02.8.tsv_1-108 Forcing tensorflow backend. Making binding predictions on Allele HLA-A29:02 and Epitope Length 10 with Method MHCflurry - File MHC_Class_I/tmp/sample1_tumor.MHCflurry.HLA-A29:02.10.tsv_1-108 - Completed Making binding predictions on Allele HLA-A29:02 and Epitope Length 11 with Method NetMHCpanEL - File MHC_Class_I/tmp/sample1_tumor.netmhcpan_el.HLA-A29:02.11.tsv_1-108 - Completed Making binding predictions on Allele HLA-A29:02 and Epitope Length 11 with Method NetMHCpan - File MHC_Class_I/tmp/sample1_tumor.netmhcpan.HLA-A29:02.11.tsv_1-108 - Completed Forcing tensorflow backend. Making binding predictions on Allele HLA-A29:02 and Epitope Length 9 with Method MHCflurry - File MHC_Class_I/tmp/sample1_tumor.MHCflurry.HLA-A29:02.9.tsv_1-108 - Completed Making binding predictions on Allele HLA-A29:02 and Epitope Length 8 with Method NetMHCpan - File MHC_Class_I/tmp/sample1_tumor.netmhcpan.HLA-A29:02.8.tsv_1-108 - Completed

Command error: Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor WARNING:tensorflow:From /opt/conda/lib/python3.8/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version. Instructions for updating: non-resource variables are not supported in the long term WARNING:tensorflow:From /opt/conda/lib/python3.8/site-packages/keras/src/initializers/initializers_v1.py:297: calling RandomUniform.init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor /opt/conda/lib/python3.8/site-packages/keras/src/engine/training_v1.py:2359: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically. updates=self.state_updates, /opt/conda/lib/python3.8/site-packages/keras/src/engine/training_v1.py:2359: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically. updates=self.state_updates, /opt/conda/lib/python3.8/site-packages/keras/src/engine/training_v1.py:2359: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically. updates=self.state_updates, /opt/conda/lib/python3.8/site-packages/keras/src/engine/training_v1.py:2359: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically. 
updates=self.state_updates, /opt/conda/lib/python3.8/site-packages/keras/src/engine/training_v1.py:2359: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically. updates=self.state_updates, /opt/conda/lib/python3.8/site-packages/keras/src/engine/training_v1.py:2359: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically. updates=self.state_updates, /opt/conda/lib/python3.8/site-packages/keras/src/engine/training_v1.py:2359: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically. updates=self.state_updates, /opt/conda/lib/python3.8/site-packages/keras/src/engine/training_v1.py:2359: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically. updates=self.state_updates, WARNING:tensorflow:From /opt/conda/lib/python3.8/site-packages/tensorflow/python/ops/init_ops.py:94: calling VarianceScaling.init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor WARNING:tensorflow:From /opt/conda/lib/python3.8/site-packages/tensorflow/python/ops/init_ops.py:94: calling Zeros.init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor /opt/conda/lib/python3.8/site-packages/keras/src/engine/training_v1.py:2359: UserWarning: Model.state_updates will be removed in a future version. 
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
  updates=self.state_updates,
CRITICAL:pymp:An exception occured in thread 7: (<class 'ValueError'>, Input X contains NaN. LogisticRegression does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values).
Traceback (most recent call last):
  File "/opt/conda/bin/pvacseq", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.8/site-packages/pvactools/tools/pvacseq/main.py", line 123, in main
    args[0].func.main(args[1])
  File "/opt/conda/lib/python3.8/site-packages/pvactools/tools/pvacseq/run.py", line 138, in main
    pipeline.execute()
  File "/opt/conda/lib/python3.8/site-packages/pvactools/lib/pipeline.py", line 451, in execute
    self.call_iedb(chunks)
  File "/opt/conda/lib/python3.8/site-packages/pvactools/lib/pipeline.py", line 357, in call_iedb
    p.print("Making binding predictions on Allele %s and Epitope Length %s with Method %s - File %s - Completed" % (a, epl, method, filename))
  File "/opt/conda/lib/python3.8/site-packages/pymp/__init__.py", line 148, in __exit__
    raise exc_t(exc_val)
ValueError: Input X contains NaN. LogisticRegression does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values
INFO: Cleaning up image...

Work dir: /home/heshen/01.project/01.neoantigen/01.project/nextneopi/test/testdata/work/49/020fb154a5b67d3d4cd282f388c509

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh `

The TensorFlow warnings seem harmless, but the ValueError: Input X contains NaN. raised by the LogisticRegression model looks like the real problem. Could you please give me some suggestions?

Thank you very much! He

riederd commented 10 months ago

Hi, I'd need to see the contents of /home/heshen/01.project/01.neoantigen/01.project/nextneopi/test/testdata/work/49/020fb154a5b67d3d4cd282f388c509

Can you make a tar.gz of it and post it?

fredsamhaak commented 10 months ago

Hi, Here is the compressed file:

020fb154a5b67d3d4cd282f388c509.tar.gz

riederd commented 10 months ago

Thanks, but I'd also need the input files from the work dir, which are symlinks. Can you use the -h option in your tar command to dereference the symlinks and post the archive again? Sorry for that.
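For reference, what tar's -h option does (archive the files that symlinks point to rather than the links themselves) can be sketched in Python with the standard-library tarfile module, whose dereference flag behaves the same way. This is only an illustration; the function name archive_work_dir and any paths passed to it are assumptions and not part of nextNEOpi.

```python
import tarfile
from pathlib import Path


def archive_work_dir(work_dir, archive_path):
    """Pack work_dir into a .tar.gz, storing symlink targets as regular files.

    Returns the list of archived member names so the result can be inspected.
    """
    work = Path(work_dir)
    # dereference=True mirrors `tar -h`: the content a symlink points to is
    # written into the archive instead of the link itself.
    with tarfile.open(archive_path, "w:gz", dereference=True) as tar:
        tar.add(str(work), arcname=work.name)
    # Re-open the archive read-only to report what was stored.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Calling archive_work_dir on a Nextflow work directory should give an archive that contains the staged input files themselves, which is what is needed to reproduce a process elsewhere.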

I'm not sure what happens here, since I can not reproduce the problem locally. But let's see...

fredsamhaak commented 10 months ago

Hi, please check the compressed file below and thanks in advance:

020fb154a5b67d3d4cd282f388c509.tar.gz

riederd commented 10 months ago

Thanks, I just tried to reproduce the error with the contents of your work dir, however it did not fail and completed successfully.

Can you try to do the following and post the output:

cd /home/heshen/01.project/01.neoantigen/01.project/nextneopi/test/testdata/work/49/020fb154a5b67d3d4cd282f388c509/

singularity exec \
    -B /home/heshen/01.project/01.neoantigen/01.project/nextneopi/test/testdata \
    -B "$PWD" \
    --no-home \
    -H /home/heshen/01.project/01.neoantigen/01.project/nextneopi/test/testdata/tmpdir \
    -B /home/heshen/01.project/01.neoantigen/01.project/nextneopi/test/testdata/nextNEOpi-master/assets \
    -B /home/heshen/01.project/01.neoantigen/01.project/nextneopi/test/testdata/tmpdir \
    -B /home/heshen/01.project/01.neoantigen/01.project/nextneopi/test/testdata/nextNEOpi-master/resources \
    -B /home/heshen/01.project/01.neoantigen/01.project/nextneopi/test/testdata/nextNEOpi-master/resources/databases/iedb:/opt/iedb \
    -B /home/heshen/01.project/01.neoantigen/01.project/nextneopi/test/testdata/nextNEOpi-master/resources/databases/mhcflurry_data:/opt/mhcflurry_data \
    /home/heshen/01.project/01.neoantigen/01.project/nextneopi/test/testdata/work/singularity/apps-01.i-med.ac.at-images-singularity-pVACtools_4.0.1_icbi_4ae2625d.sif \
    /bin/bash

This should start up the pVACseq container. When in, launch python, e.g.

Singularity> python

Then type the following and post the output:

import sys
import pprint
pprint.pprint(sys.path)

fredsamhaak commented 10 months ago

Hi, Here is the output:

Screenshot 2023-09-20 16 59 35

['', '/home/heshen/01.project/01.neoantigen/00.biosoft/hisatgenotype/hisatgenotype_modules', '/home/heshen/01.project/01.neoantigen/01.project/nextneopi/test/testdata/work/49/020fb154a5b67d3d4cd282f388c509', '/opt/conda/lib/python38.zip', '/opt/conda/lib/python3.8', '/opt/conda/lib/python3.8/lib-dynload', '/opt/conda/lib/python3.8/site-packages']

riederd commented 10 months ago

There is a PYTHONPATH entry set which should not be there: /home/heshen/01.project/01.neoantigen/00.biosoft/hisatgenotype/hisatgenotype_modules. This could interfere with the Python packages in the container.

I'm not sure if it is also mounted into the container, so can you please enter the container as before and try to do an ls -la on the path above.

If you get a file listing, try to use unset PYTHONPATH before you start the pipeline.

fredsamhaak commented 10 months ago

Hi, I've checked, and the path you mentioned exists neither inside nor outside the container:

Screenshot 2023-09-21 10 55 21

But it is a little strange that sys.path still lists it.
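One likely explanation: Python copies every PYTHONPATH entry into sys.path at startup without checking that the directory exists, and Singularity forwards host environment variables into the container unless told otherwise (for example with --cleanenv). A stale entry exported in the host shell can therefore show up even though the path is absent. Below is a minimal sketch for listing such entries; the helper name stale_pythonpath_entries is hypothetical.

```python
import os


def stale_pythonpath_entries(pythonpath=None):
    """Return the PYTHONPATH entries that do not exist as directories.

    If no value is passed, the current PYTHONPATH environment variable is used.
    """
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    # Entries are separated by os.pathsep (":" on Linux); skip empty ones.
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return [p for p in entries if not os.path.isdir(p)]


if __name__ == "__main__":
    for entry in stale_pythonpath_entries():
        print("missing PYTHONPATH entry:", entry)
```

If this reports the hisatgenotype_modules path, running unset PYTHONPATH in the shell before launching the pipeline should remove it from sys.path.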

riederd commented 10 months ago

May I ask you to manually rerun the process as follows, just to make sure it wasn't a one-time error:

cd /home/heshen/01.project/01.neoantigen/01.project/nextneopi/test/testdata/work/49/020fb154a5b67d3d4cd282f388c509
rm -rf MHC_Class_I
bash .command.run

fredsamhaak commented 10 months ago

Hi, I reran bash .command.run and it hit the same error as before: ValueError: Input X contains NaN.

I checked the files in the MHC_Class_I/tmp folder and found that *MHCflurry*.9.* and *MHCflurry*.11.* don't exist, while netmhcpan_el and netmhcpan each generated all four files (*8*, *9*, *10*, *11*):

Screenshot 2023-09-22 09 41 45

This seems to be why LogisticRegression reports that Input X contains NaN, but I don't know why MHCflurry didn't generate all of them (*8*, *9*, *10*, *11*).

-- I reran bash .command.run several times and found that MHCflurry generates a different set of files nearly every time (e.g. *8*, *10*, *11* on one run and *8*, *9*, *10* on the next). This also happens with other alleles (e.g. HLA-B*45:01):

First run: Screenshot 2023-09-22 11 23 25

And the second run: Screenshot 2023-09-22 11 24 12
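The manual check above can be sketched as a small script that reports which method and epitope-length combinations have no prediction file. The file-name pattern is inferred from the log output earlier in this thread (for example sample1_tumor.MHCflurry.HLA-A29:02.8.tsv_1-108); the function name and the default method and length lists are assumptions.

```python
import re
from itertools import product

# Pattern inferred from the pVACseq log lines in this thread:
#   <sample>.<method>.<allele>.<length>.tsv_<chunk>
PREDICTION_FILE = re.compile(
    r"^(?P<sample>.+?)\.(?P<method>MHCflurry|netmhcpan_el|netmhcpan)"
    r"\.(?P<allele>.+)\.(?P<length>\d+)\.tsv_(?P<chunk>\d+-\d+)$"
)


def missing_predictions(
    filenames,
    methods=("MHCflurry", "netmhcpan", "netmhcpan_el"),
    lengths=(8, 9, 10, 11),
):
    """Return the (method, length) pairs for which no file name was found."""
    found = set()
    for name in filenames:
        match = PREDICTION_FILE.match(name)
        if match:  # names that do not fit the pattern are ignored
            found.add((match["method"], int(match["length"])))
    return [combo for combo in product(methods, lengths) if combo not in found]
```

Passing os.listdir("MHC_Class_I/tmp") after each rerun would show at a glance whether the set of missing MHCflurry files changes from run to run. Note that this sketch does not distinguish between alleles; it only checks method and length.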

riederd commented 8 months ago

We are still investigating this; we now have access to a system on which we can reproduce the issue. Meanwhile, you can disable the MHCflurry runs by setting the epitope_prediction_tools parameter accordingly in conf/params.config (line 209 or so)

i.e.

epitope_prediction_tools = "NetMHCpan NetMHCpanEL NetMHCIIpan NetMHCIIpanEL"
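The value above matches the tool list from the failing pvacseq command with the two MHCflurry entries removed. A minimal sketch of that filtering follows, assuming the list in that command is the one configured; the helper name drop_mhcflurry is hypothetical.

```python
def drop_mhcflurry(tools):
    """Remove MHCflurry-based predictors from a space-separated tool list."""
    kept = [t for t in tools.split() if not t.lower().startswith("mhcflurry")]
    return " ".join(kept)


# Tool list as passed to pvacseq in the command earlier in this thread.
default_tools = "NetMHCpan NetMHCpanEL MHCflurry MHCflurryEL NetMHCIIpan NetMHCIIpanEL"
print(drop_mhcflurry(default_tools))
```

The printed string can be pasted as the value of epitope_prediction_tools in conf/params.config.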