icbi-lab / nextNEOpi

nextNEOpi: a comprehensive pipeline for computational neoantigen prediction

error message when run a test case #52

Closed ryao-mdanderson closed 1 year ago

ryao-mdanderson commented 1 year ago

Dear NextNEOpi authors:

I followed the README, have Nextflow version 22.10.8 installed, set up the reference data in a custom location, and modified resourcesBaseDir in conf/params.config.

I tried the following command to run a test case:

$ nextflow run nextNEOpi.nf --batchFile testdata_batchFile_FASTQ.csv --CNVkit false -profile singularity -config conf/params.config --accept_license --TCR false

At the end of the output, I see:


Execution cancelled -- Finishing pending tasks before exit
[icbi/nextNEOpi] Pipeline Complete! You can find your results in /rsrch3/home/itops/ryao/nextNEOpi/results
Error executing process > 'install_IEDB (Install IEDB)'

Caused by:
  Process install_IEDB (Install IEDB) terminated with an error exit status (3)

Command executed:

  export TMPDIR=/tmp/ryao/nextNEOpi/

  CWD=`pwd`
  cd /opt/iedb/
  rm -f IEDB_MHC_I-3.1.4.tar.gz
  wget https://downloads.iedb.org/tools/mhci/3.1.4/IEDB_MHC_I-3.1.4.tar.gz
  tar -xzvf IEDB_MHC_I-3.1.4.tar.gz
  cd mhc_i
  bash -c "./configure"
  cd /opt/iedb/
  rm -f IEDB_MHC_I-3.1.4.tar.gz

  rm -f IEDB_MHC_II-3.1.8.tar.gz
  wget https://downloads.iedb.org/tools/mhcii/3.1.8/IEDB_MHC_II-3.1.8.tar.gz
  tar -xzvf IEDB_MHC_II-3.1.8.tar.gz

  # ATTENTION: IEDB_MHC_II-3.1.8.tar.gz "python configure.py"
  # returns an assertion error in the unittest needs
  # to be fixed, skip unittests for now

  cd mhc_ii
  bash -c "python ./configure.py"

  cd /opt/iedb/
  rm IEDB_MHC_II-3.1.8.tar.gz

  export MHCFLURRY_DATA_DIR=/opt/mhcflurry_data
  mhcflurry-downloads fetch

  cd $CWD
  echo "OK" > .iedb_install_ok.chck

Command exit status:
  3

Command output:
  (empty)

Command error:
  --2023-09-28 17:01:10-- https://downloads.iedb.org/tools/mhci/3.1.4/IEDB_MHC_I-3.1.4.tar.gz
  Resolving downloads.iedb.org (downloads.iedb.org)... 8.37.117.143
  Connecting to downloads.iedb.org (downloads.iedb.org)|8.37.117.143|:443... connected.
  HTTP request sent, awaiting response... 200 OK
  Length: 345348484 (329M) [application/x-gzip]
  IEDB_MHC_I-3.1.4.tar.gz: Permission denied

  Cannot write to ‘IEDB_MHC_I-3.1.4.tar.gz’ (Permission denied).

Work dir: /rsrch3/home/itops/ryao/nextNEOpi/work/6b/e0e98264330d8bc14d89ee7c13443f

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

--------- end of the command output

Could you please help me understand this?

Thank you very much, Rong Yao

riederd commented 1 year ago

Hi, it seems that you do not have permission to write to the resourcesBaseDir you specified. See:

Cannot write to ‘IEDB_MHC_I-3.1.4.tar.gz’ (Permission denied).

Can you show the output of the following commands:

id
ls -la your_resourcesBaseDir

(Please substitute your_resourcesBaseDir with the real path of the directory you configured.)
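The same check can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the helper name check_writable and its messages are hypothetical and not part of nextNEOpi. It only reports whether the current user could create files in the given directory.

```shell
# Hypothetical helper: report whether the current user can create
# files in the directory given as $1.
check_writable() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 2
    fi
    # Creating a file needs both write (-w) and search (-x) permission
    # on the directory. Note that root passes these tests regardless of mode bits.
    if [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "writable: $dir"
    else
        echo "not writable: $dir"
        return 1
    fi
}

# Usage (substitute the real path, as above):
# check_writable your_resourcesBaseDir
```

Run it as the same user that launches the pipeline, since the result depends on who is asking.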

ryao-mdanderson commented 1 year ago

Hello @riederd

You are right, my user ID (ryao) doesn't have write permission. The resourcesBaseDir configured in conf/params.config is:

resourcesBaseDir = "/rsrch3/scratch/reflib/REFLIB_data/nextneopi-1.4.0/"

ls -la /rsrch3/scratch/reflib/REFLIB_data/nextneopi-1.4.0/
total 42316629
drwxr-xr-x  6 root  root         4096 Sep 28 17:18 .
drwxr-xr-x 70 root  root         8192 Sep 29 09:35 ..
drwxr-xr-x  7 40005 50000        4096 Sep 29 09:16 databases
drwxr-xr-x  4 40005 50000        4096 May  3  2021 ExomeCaptureKits
-rw-r--r--  1 root  root    345348484 Jan  2  2023 IEDB_MHC_I-3.1.4.tar.gz
-rw-r--r--  1 ryao  rists 42662861027 Mar 22  2022 nextNEOpi_1.4_resources.tar.gz
-rw-r--r--  1 ryao  rists   323911161 Jul 27 09:24 nextNEOpi_testdata.tar.gz
drwxr-xr-x  7 40005 50000        4096 Mar  1  2022 references
drwxr-xr-x  2 40005 50000        4096 Jul 26 10:17 testdata

I am the system admin and have root privileges. I would like to store the resources in a shared location so that our cluster users can also use them. If I change the directory ownership to root:root and give group and others read permission, is that okay?

The README does not mention installing IEDB ahead of time; I only found out when I ran the test. I manually downloaded IEDB_MHC_I-3.1.4.tar.gz, untarred it, and configured IEDB in place:

cd mhc_i
bash -c ./configure

Is this the right procedure?

[root@ldragon2 mhc_i]# ls -l /rsrch3/scratch/reflib/REFLIB_data/nextneopi-1.4.0/databases/iedb/mhc_i
total 389
-rwxrwxr-x  1 p_matlab p_matlab     19 Jan  2  2023 configure
-rw-rw-r--  1 p_matlab p_matlab  11015 Jan  2  2023 Copenhagen_license.txt
drwxrwxr-x  3 p_matlab p_matlab   4096 Jan  2  2023 data
drwxrwxr-x  2 p_matlab p_matlab   4096 Jan  2  2023 examples
-rw-rw-r--  1 p_matlab p_matlab  11839 Jan  2  2023 LIAI_license.txt
drwxrwxr-x 16 p_matlab p_matlab   4096 Jan  2  2023 method
-rw-rw-r--  1 p_matlab p_matlab 134656 Jan  2  2023 mhc_list.xls
-rw-rw-r--  1 p_matlab p_matlab   4795 Jan  2  2023 README
drwxrwxr-x  2 p_matlab p_matlab   4096 Sep 29 09:28 src

Now I wonder about the p_matlab ownership of these files.

Thank you for your help.

ryao-mdanderson commented 1 year ago

I forgot to mention: mhc_i/src/configure.py limits the installation path to 57 characters. In order to configure, I changed the limit to 100, and the configuration then completed. Is this okay? Is there any reason for the limit of 57?

ryao-mdanderson commented 1 year ago

Hi @riederd

An update on today's work:

  1. I manually downloaded IEDB_MHC_I-3.1.4.tar.gz, changed configure.py to allow a directory path length of up to 100, then installed databases/mhc_i successfully.

  2. I changed the ownership of /rsrch3/scratch/reflib/REFLIB_data/nextneopi-1.4.0/ to a service account, with group and others read permission for the directory tree.

  3. Because IEDB has been installed manually, I created a non-empty flag file databases/iedb/.iedb_install_ok.chck to keep the IEDB install from being triggered during the test run.

  4. I changed Tmp_dir in conf/params.config to make sure the tmp directory has > 50GB.
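Steps 3 and 4 in the list above can be sketched as two small shell helpers. This is a hedged sketch only: the function names are hypothetical, the flag file content "OK" mirrors what the install_IEDB command shown earlier writes, and the assumption that the pipeline looks for the flag under databases/iedb comes from step 3, not from the nextNEOpi source.

```shell
# Hypothetical helpers; names are illustrative, not part of nextNEOpi.

# Write a non-empty flag file into the IEDB directory given as $1,
# e.g. <resourcesBaseDir>/databases/iedb (location assumed from step 3).
mark_iedb_installed() {
    iedb_dir="$1"
    [ -d "$iedb_dir" ] || return 1
    echo "OK" > "$iedb_dir/.iedb_install_ok.chck"
}

# Succeed only if the file system holding $1 has at least $2 GB available.
has_free_gb() {
    # df -P prints one POSIX-format line per file system; column 4 is available KB.
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Usage (the tmp path is a placeholder):
# mark_iedb_installed /rsrch3/scratch/reflib/REFLIB_data/nextneopi-1.4.0/databases/iedb
# has_free_gb /path/to/tmp 50 || echo "tmp directory has less than 50 GB free"
```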

Then I ran the test case again. Unfortunately, after a 30-minute run, it still failed while pulling a singularity image:

Error executing process > 'installVEPcache (installVEPcache)'

Caused by:
  Failed to pull singularity image
  command: singularity pull --name depot.galaxyproject.org-singularity-ensembl-vep-110.0--pl5321h2a3209d_0.img.pulling.1696016364914 https://depot.galaxyproject.org/singularity/ensembl-vep:110.0--pl5321h2a3209d_0 > /dev/null
  status : 143

At this point, the test run is not successful.

Thank you for your help.

riederd commented 1 year ago

Hi,

Can you post the contents (.tar.gz) of the work dir of the failed process?

Did you retry? What happens when you run the process manually? e.g.:

cd <workdir of the failed process>
bash .command.run

riederd commented 1 year ago

Any feedback on this?

ryao-mdanderson commented 1 year ago

Hi @riederd Dietmar,

I am very sorry for the late reply. One of the reasons is that the singularity pull of ensembl-vep kept failing, not only during the nextflow run but also when I ran the pull command manually. Until this error is resolved, I am stuck on the nextflow test run.

[ryao@myserver ~]$ singularity pull depot.galaxyproject.org-singularity-ensembl-vep-110.0--pl5321h2a3209d_0.img.pulling.1696016364914 https://depot.galaxyproject.org/singularity/ensembl-vep:110.0--pl5321h2a3209d_0

INFO: Downloading network image
629.7MiB / 991.6MiB [=================================================>----------------------------] 64 % 358.3 KiB/s 17m14s
FATAL: net/http: request canceled (Client.Timeout exceeded while reading body)

The above is an example of my singularity pull. By the way, are you able to pull on your server?

As for the IEDB installation: I believe your pipeline handles this automatically, but since I was able to install IEDB manually, I wanted to get the nextflow test working first. Therefore I did not retry it, partly because the reference directory sits in a shared directory controlled by a GPFS file system with access control. Simply changing the write access for a non-root user will not work.

Thank you for your help. Rong Yao

riederd commented 1 year ago

Hi, it seems that you are having some sort of (temporary?) network issues. I just tried to pull the image and it went smoothly; it took 2 minutes.

You might try to set a longer timeout for singularity image pulls, e.g. at line: https://github.com/icbi-lab/nextNEOpi/blob/fe7b21cdc0b97aae38195e5f5ac1b9851674f6b1/conf/profiles.config#L91 add the following:

singularity.pullTimeout = 3600
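
Besides a longer timeout, a slow or flaky link can sometimes be worked around by retrying the pull by hand. The following is a minimal sketch, assuming a POSIX shell; the retry helper is hypothetical and not part of nextNEOpi or Singularity, and the output file name in the usage comment is an assumption.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success, 1 if all attempts fail.
retry() {
    attempts="$1"; delay="$2"; shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying in ${delay}s" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}

# Usage (image URL from this thread; the --name value is an assumption):
# retry 5 60 singularity pull --name ensembl-vep-110.0.img \
#     https://depot.galaxyproject.org/singularity/ensembl-vep:110.0--pl5321h2a3209d_0
```

Separately, Nextflow's documentation describes singularity.pullTimeout as a duration (default 20 min), so a quoted value such as '60 min' may be worth trying if a bare number does not behave as expected.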

ryao-mdanderson commented 1 year ago

Hi @riederd,

I followed your suggestion and added singularity.pullTimeout = 3600 in conf/profiles.config, then reran the nextNEOpi test data. This time it exited quickly while pulling the fastqc image. I have attached a file with the full output to this message.

I also tried to manually singularity pull ensembl-vep-110.0 again; after 17 minutes, it exited with the same error I reported.

I wonder whether you can pull the image in 2 minutes because of physical location? I guess depot.galaxyproject.org is hosted in Europe, and I work in Texas, USA.

It is frustrating trying to get this to work.

nextNEOpi-test.txt

Thank you , Rong Yao

riederd commented 1 year ago

I'm sorry that you are running into these issues, which apparently are caused by something not specific to nextNEOpi, since even pulling the image manually fails. Unfortunately I can not reproduce them, so it gets difficult for me to pinpoint the root cause.

Are there any limits set on the system you are using, e.g. memory or CPU time?

ryao-mdanderson commented 1 year ago

@riederd you have been very helpful already, thank you.

Today I finally managed to pull the ensembl-vep image manually on an HPC cluster node. It took ~30 minutes; compared to your pull in 2 minutes, that is a significant difference.

I am waiting for the firewall to be opened for https://apps-01.i-med.ac.at-images on the cluster node. I will test the nextflow run again when this is ready.

Cheers, Rong

riederd commented 1 year ago

I'm closing this now