Hi, I'm sorry that you are running into these troubles.
In order to sort this out I'd need some more information:
- the .nextflow.log file from the failed run
- the .command.run file from /data/SBCS-BessantLab/Antara/nextNEOpi/work/a9/f5d9ded3bc5bc9dfbb2f33b7f58074
Thanks
Thank you for your prompt response.
I am using version nextNEOpi_v1.3.1, which I believe is the latest? I set this up on the 21st of May using the latest documents and data available on GitHub. Also, I am running only one sample at the moment, from the TESLA consortium data used to benchmark your pipeline. I have WES normal and tumor data and tumor RNA-seq data.
I have attached the files you have asked for. Just changed the extension from .run to .txt so I can attach it here.
Please let me know if you require any other info.
Thank you.
Thanks for the information. We are checking the relevant code in NeoFuse and trying to understand why you get this error. Can you help us meanwhile with the following (one way to do the file checks is sketched below):
- check whether the file /data/SBCS-BessantLab/Antara/nextNEOpi/work/a9/f5d9ded3bc5bc9dfbb2f33b7f58074/sample1/LOGS/sample1_8_MHCFlurry.log exists
- check whether the file /data/SBCS-BessantLab/Antara/nextNEOpi/work/a9/f5d9ded3bc5bc9dfbb2f33b7f58074/sample1/NeoFuse/tmp/MHC_I/sample1_8_NEK11_ALDH1L1_1_8.tsv exists
- resume the pipeline (-resume) with a smaller number of CPUs for NeoFuse (e.g. 8)
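For example, the two file checks could be done like this (paths copied from the failed task's work directory mentioned above):
$ ls -l /data/SBCS-BessantLab/Antara/nextNEOpi/work/a9/f5d9ded3bc5bc9dfbb2f33b7f58074/sample1/LOGS/sample1_8_MHCFlurry.log
$ ls -l /data/SBCS-BessantLab/Antara/nextNEOpi/work/a9/f5d9ded3bc5bc9dfbb2f33b7f58074/sample1/NeoFuse/tmp/MHC_I/sample1_8_NEK11_ALDH1L1_1_8.tsv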
As you can see from the attached image, the sample1_8_NEK11_ALDH1L1_1_8.tsv file does not exist.
Also, I have just changed the CPU parameter to 8, so I will try to resume and let you know what happens.
Thank you.
Thanks! It seems that mhcflurry is not running or is being stopped/killed just shortly after it starts up.
Can you try to run it manually, e.g.:
$ singularity exec --no-home /data/SBCS-BessantLab/Antara/nextNEOpi/work/singularity/apps-01.i-med.ac.at-images-singularity-NeoFuse_dev_0d1d4169.sif /bin/bash
singularity> mhcflurry-predict --affinity-only --alleles A*02:01 --peptides TPDPGAEV --out /tmp/test_1.txt --models /home/neofuse/.local/share/mhcflurry/4/2.0.0/models_class1_pan/models.combined
[...]
singularity> cat /tmp/test_1.txt
Hi, I have run the above and have attached what the output looks like. Please let me know the next step. Thank you again! Really appreciate the help
Ok, it seems that in principle mhcflurry works. Let's see if your run with fewer CPUs completes.
Hi,
So I was having issues with the jobs I had submitted to the cluster, so I decided to kill the previous runs and start fresh with the amended CPU setting of 8 for NeoFuse. I set up the directory as I did before, but noticed that this time the link to the resources file on your GitHub is unreachable, so I used the resources folder I had already created. However, when I run the pipeline again there is "no route to host" when pulling the singularity image (attached screenshot). Added the nextflow log too.
I understand this is a different issue and if you would rather me post on a new thread please let me know! Very keen to get this pipeline working and looking forward to hopefully doing this soon!
Thank you!
Hi, unfortunately we had an electricity issue last night which affected the server on which the resources are located. The bad thing is that there is a holiday and a long weekend now, so it might take until Monday to get this fixed, since not everything involved is in our hands. We are sorry for this.
The resource download should work again.
Ah great, will test the new run shortly, thank you. Will let you know if I encounter the same problems with MHCflurry (hopefully not!)
Hello,
So I tried running again after changing the CPUs to 8 for NeoFuse, but unfortunately I encountered the same error as before. I have added the .nextflow.log, sample1_8_MHCFlurry.log and sample1_MHCI_final.log files. The sample1_8_NEK11_ALDH1L1_1_8.tsv does not exist in the location specified in the error. Also, I tried the singularity command you posted above again and got the same output.
Not sure what is causing the issue with this missing file but please do let me know on ways to get this sorted.
Thank you!
command.run.txt sample1_8_MHCFlurry.log sample1_MHCI_final.log nextflow.log
Hello,
I was just wondering if you have had a chance to take a look at this error? Appreciate you may be busy but please let me know if there is any solution when you get the time! Thank you :)
Hi, we were out of office for the last few days. We will continue to look into this and keep you updated. Meanwhile, can you try to do the following:
$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/30/21453b69c882f412b58dee6149538c
$ bash .command.run.sh
Hi, no worries and thank you.
Also, tried this but there is no ".command.run.sh" file. Below are the files available in the directory you specified.
Sorry, I meant .command.run
Hi, I ran the above and this is the error output...
Hi, it seems you are running into a resource limit. Can you post the output of:
$ ulimit -a
Can you do this on the head node of your cluster and on one of the compute nodes, in case you are running nextNEOpi on a cluster.
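For example (a sketch; the second command assumes a SLURM scheduler, so adjust it to whatever batch system your cluster uses):
$ ulimit -a                # on the head node
$ srun bash -c 'ulimit -a' # on a compute node, via the scheduler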
Hello, I am just running on the head node and not submitting a job to the cluster; this is the output on the head node.
Hmmm, this is strange, I do not see a big difference from our settings here. These two settings differ:
pending signals (-i) 12383285
max locked memory (kbytes, -l) unlimited
But I don't think this is the problem.
What also puzzles me is that the featureCounts step takes so long in your case, more than 4 hrs. I would expect 10-20 min, as we see here in our environment.
Just to make sure the "Resource temporarily unavailable" issue is really temporary, can you please try once more to run:
$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/30/21453b69c882f412b58dee6149538c
$ bash .command.run
Thanks
Hi, I ran again and now get the error below
So you really seem to hit some resource limit on your machine. Do you have many other processes running on that machine? Can you check with:
$ ps -eLf | wc -l
$ ps -eLf | grep hfy006 | wc -l
You can try to raise some limits:
$ ulimit -n 4096
$ ulimit -l unlimited
$ ulimit -u 8192
and then run the .command.run script again.
I am unable to change "ulimit -l" to unlimited as it is locked; however, I have changed the other two parameters and will re-run. The only other processes I had running were the mhcflurry-predict processes, which did not fully terminate after the previous run exited.
Will keep you updated. Thank you!
After doing the above, running the .command.run script completed! I had to kill the previous processes which were still running from the nextflow run that exited with the MHCflurry error. Please let me know how to proceed from the stage at which the pipeline exited. Thank you for all your help so far, glad to be one step closer!
That's good! Now I suggest to do the following:
1. Create a file named set_limits.sh in the nextNEOpi bin/ directory with the following content:
ulimit -n 4096
ulimit -u 8192
2. Edit the conf/process.config file in the nextNEOpi directory and look for:
withName:Neofuse {
container = 'https://apps-01.i-med.ac.at/images/singularity/NeoFuse_dev_0d1d4169.sif'
cpus = 10
}
change it to:
withName:Neofuse {
beforeScript = 'source /data/SBCS-BessantLab/Antara/nextNEOpi/bin/set_limits.sh'
container = 'https://apps-01.i-med.ac.at/images/singularity/NeoFuse_dev_0d1d4169.sif'
cpus = 10
}
3. Rerun the pipeline with the -resume option set.
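For reference, bin/set_limits.sh from step 1 would contain just the two ulimit calls; the comments here are only explanatory:
ulimit -n 4096   # raise the limit on open file descriptors
ulimit -u 8192   # raise the limit on the number of user processes/threads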
Hi,
So I did all the above and the NeoFuse part of the run completed, and I have the output folder for this in my results! Thanks for the help on this part, it's really appreciated!
However, I now have an error during the pVACseq stage which is causing the process to exit. I have attached the files associated with the error.
nextflow.log command.run.txt command.sh.txt
Feels like I'm nearly there so I am very excited for the run to complete and then hopefully once I have a working pipeline I will be able to run my other samples!
Hi, can you also try to reduce the number of CPUs to 10 for pVACseq?
For a manual test you may do this by editing .command.sh in /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a and setting the threads parameter from -t 40 to -t 10.
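One way to make that edit (a sketch; it assumes the thread count appears literally as "-t 40" in the script, so please double-check the file afterwards):
$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a
$ sed -i 's/-t 40/-t 10/' .command.sh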
After this you may run:
$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a
$ bash .command.run
If this works, you may change the cpus setting in conf/process.config.
Hi, I tried doing the above, and the process aborts
can you try to do the following:
$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a
$ rm -rf ./MHC_Class*
$ bash .command.run
The process exits with "Error: No command specified".
Hmmm, I think you hit an issue in pVACseq, which might be solved in the newest version. I'll prepare an updated image this evening. Meanwhile, can you send me a tar archive from that working directory, so that I can test locally. You would need to create it as follows:
$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09
$ tar -chvzf testdata.tar.gz 7305b172968e7a9bc25e0b59f2eb8a
Please send me a private e-mail with a download link for the resulting testdata.tar.gz.
Sent the email, please let me know if you do not receive it. Thank you.
One more thing to try:
$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a
$ rm -rf ./MHC_Class*
$ singularity exec --no-mount hostfs -B /data/SBCS-BessantLab/Antara/nextNEOpi -B "$PWD" --no-home -B /data/SBCS-BessantLab/Antara/nextNEOpi/assets -B /data/SBCS-BessantLab/Antara/nextNEOpi/tmpDir -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/iedb:/opt/iedb -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/mhcflurry_data:/opt/mhcflurry_data /data/SBCS-BessantLab/Antara/nextNEOpi/work/singularity/apps-01.i-med.ac.at-images-singularity-pVACtools_3.0.0_icbi_5dfca363.sif /bin/bash
Singularity> bash .command.sh
...and in case you get a netMHCstab error, please look for the following line in .command.sh:
--netmhc-stab
then remove it and re-run the commands above.
NetMHCstab is run via a web service, which does not always work as expected. It can be disabled in nextNEOpi with the option --use_NetMHCstab false.
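For example (hypothetical invocation; the main script name nextNEOpi.nf is assumed here, and you would reuse your own profile and parameters):
$ nextflow run nextNEOpi.nf <your usual options> --use_NetMHCstab false -resume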
I didn't get the netMHCstab error but the same error as before: "Error: No command specified".
Did you get pandas warnings with this?
Yes I did, the same as before
This is interesting, you should not get those. Can you check the pandas version and path for me:
$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a
$ rm -rf ./MHC_Class*
$ singularity exec --no-mount hostfs -B /data/SBCS-BessantLab/Antara/nextNEOpi -B "$PWD" --no-home -B /data/SBCS-BessantLab/Antara/nextNEOpi/assets -B /data/SBCS-BessantLab/Antara/nextNEOpi/tmpDir -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/iedb:/opt/iedb -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/mhcflurry_data:/opt/mhcflurry_data /data/SBCS-BessantLab/Antara/nextNEOpi/work/singularity/apps-01.i-med.ac.at-images-singularity-pVACtools_3.0.0_icbi_5dfca363.sif /bin/bash
Singularity> pip show pandas
and
Singularity> python
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> print(pd.__version__)
Can you then try the new test image that I prepared:
$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a
$ rm -rf ./MHC_Class*
$ singularity exec --no-mount hostfs -B /data/SBCS-BessantLab/Antara/nextNEOpi -B "$PWD" --no-home -B /data/SBCS-BessantLab/Antara/nextNEOpi/assets -B /data/SBCS-BessantLab/Antara/nextNEOpi/tmpDir -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/iedb:/opt/iedb -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/mhcflurry_data:/opt/mhcflurry_data https://apps-01.i-med.ac.at/images/singularity/pVACtools_3.0.1_icbi_test_20220609.sif /bin/bash
Singularity> pip show pandas
and then:
Singularity> bash .command.sh
I did all the steps to check the python and pandas versions, which are the same as yours.
Tried the new test image and it worked!
When I go and look in the folder with the outputs I don't see the "sample1_tumor.filtered.tsv" file but just the filtered results file for HLA-A02:01 "sample1_tumor_HLA-A02:01.filtered.tsv".
For now, I have not included "HLA-HD" in the pipeline so no MHC-II predictions are generated but once this run finishes completely I will go back to include it.
Cool. Thanks!
The final filtered result for the entire sample is generated by nextNEOpi after collecting the parallelized chunks. So what you see is expected.
I did not disclose my pandas version ;-) so I don't think you can state that it is the same as yours. Would it be possible for you to post the output of:
$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a
$ singularity exec --no-mount hostfs -B /data/SBCS-BessantLab/Antara/nextNEOpi -B "$PWD" --no-home -B /data/SBCS-BessantLab/Antara/nextNEOpi/assets -B /data/SBCS-BessantLab/Antara/nextNEOpi/tmpDir -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/iedb:/opt/iedb -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/mhcflurry_data:/opt/mhcflurry_data /data/SBCS-BessantLab/Antara/nextNEOpi/work/singularity/apps-01.i-med.ac.at-images-singularity-pVACtools_3.0.0_icbi_5dfca363.sif /bin/bash
Singularity> pip show pandas
Haha, you're right, I meant just the python version!
Here's the output of the above:
Thanks!
This is very interesting: python from the singularity image is using the pandas package that is installed in your home directory, which does not work with pVACseq in the image. In principle you should not see anything from your home directory from within the container, since we use the --no-home and --no-mount hostfs options to start up the container. This works fine here; e.g. see what happens if I try to change to my home dir from within the container:
Singularity> cd ~
bash: cd: /home/rieder: No such file or directory
May I ask which version of singularity you use?
Hmm, yes, I understand, I just had a read around this. This is the version:
Thanks, I'll try to reproduce this
I think I have a clue about what happens at your site. Can you please post the output of:
grep "bind path" /etc/singularity/singularity.conf
Here is the output for the above command:
Yes, here we go. The line
bind path = /data
tells singularity to bind mount /data from the host to /data in the container. Now, your user home $HOME is located in /data, i.e. /data/home/hfy006. This way, no matter whether we tell singularity not to mount the user home (--no-home), it will still be present in the container, because it gets mounted by default via the explicit bind path = /data directive in the global config.
When importing a library, Python first looks in the user home under $HOME/.local/lib/... for a matching package, and if it finds one it will use it. Now, if that package has an incompatible version you will get warnings, errors or all sorts of unexpected behavior.
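A quick way to see which pandas the container picks up (a sketch, run inside the container started as above; if the printed path points into $HOME/.local, the user-site package is shadowing the one in the image):
Singularity> python -c "import site, pandas; print(site.getusersitepackages()); print(pandas.__file__)"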
So the quickest fix is to remove the bind path = /data line from /etc/singularity/singularity.conf, since you are likely to hit these package/library conflicts with other singularity containers as well; this can happen not only with Python packages but also, for example, with R libraries. However, I'm not sure whether this is something you or your admin are concerned about, and there may be important reasons why this configuration was set up the way it is.
I need to check if there is any other way to avoid this situation. Since this is not a specific nextNEOpi bug I'll close the issue for now, but feel free to reopen it.
Thanks a lot for all your input!
Ok, I understand the issue now.
For now, I am running nextNEOpi on the cluster, so I will ask the admin team whether we can work around this. I do have my own custom-built PC arriving soon, which is designed to run pipelines like nextNEOpi locally without memory or performance problems, so I may be able to avoid the issue above.
I will get back to you once I am able to work off the cluster and hopefully be able to run the pipeline smoothly! Thanks for all your help thus far :)
One thing that may work would be to set a "fake home" in the params.conf which points to the tmpDir, e.g.:
singularity {
enabled = true
autoMounts = true
runOptions = "--no-home" + " -H " + params.singularityTmpMount + " -B " + params.singularityAssetsMount + " -B " + params.singularityTmpMount + " -B " + params.resourcesBaseDir + params.singularityHLAHDmount + " -B " + params.databases.IEDB_dir + ":/opt/iedb" + " -B " + params.databases.MHCFLURRY_dir + ":/opt/mhcflurry_data"
}
This might work, but it is untested, so I have no idea if other problems pop up with this hack.
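If you try this, a quick sanity check (a sketch, reusing the pVACtools image path from earlier in this thread) is to confirm that $HOME inside the container now points at the tmpDir instead of /data/home/hfy006:
$ singularity exec --no-home -H /data/SBCS-BessantLab/Antara/nextNEOpi/tmpDir /data/SBCS-BessantLab/Antara/nextNEOpi/work/singularity/apps-01.i-med.ac.at-images-singularity-pVACtools_3.0.0_icbi_5dfca363.sif bash -c 'echo $HOME; ls -A $HOME'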
Ok, I will try and hope for the best 😅
Will let you know what happens.
Hi,
I came across your pipeline not long ago and found it would be great for my research so I am really keen to make sure I can get it running.
I set the pipeline up and did not input the HLA-HD file, as I wanted to first make sure I could get the pipeline running to predict MHC-I neoepitopes. The only thing I did change was in the conf/process.config file, where I changed the CPU usage for all processes to 40.
I have attached the HTML report detailing the error which stopped the pipeline at the NeoFuse stage, converted to pdf. I went and had a look at the "sample1_MHCI_final.log" file referenced in the error, and at the ".command.sh" script file in the working directory stated in the attached pdf. In the pdf the first line of the error was omitted during the HTML-to-pdf conversion, but it stated "Error executing process > 'Neofuse (sample1)'".
I was happy that it was running smoothly and the preprocessing steps had completed successfully; however, unfortunately an error arose. I am not sure how to resolve it and would really appreciate any help.
Thank you in advance!
Nextflow Workflow Report.pdf sample1_MHCI_final.log