ShuChen1986 closed this issue 5 years ago.
Quick answer: I'm guessing you didn't rename 0803.01.meryl.WORKING to 0803.01.meryl as the last line of the script does.
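For reference, that last step amounts to roughly this rename (a sketch built from the names above; check the tail of correction/0-mercounts/meryl-count.sh for the exact command it runs):

# Sketch: the rename meryl-count.sh performs once the count finishes.
mv ./0803.01.meryl.WORKING ./0803.01.meryl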
Longer answer:
In general, you need to run the scripts as-is, no cheating and running the commands by hand. The scripts will get much more complicated.
'Killed' usually indicates the job was killed for exceeding a memory limit imposed by the grid. The original command should have used no more than 17GB memory. How much did you request for the job? How much did canu request (in the JobSubmit script)?
The second time you ran it, you told the command itself to use up to 120GB memory (and more threads), but however you ran this command, it wasn't killed for exceeding memory limits.
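If you're not sure what was requested, one way to check is to look at the qsub call canu generated (a sketch; the exact file name under correction/0-mercounts/ may differ in your run):

# Sketch: inspect the generated submit script to see the requested resources.
cat correction/0-mercounts/meryl-count.jobSubmit-*.sh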
Hi Brian, thank you very much for your reply. I learned from the documentation that canu will automatically detect and configure resources, so I did not set the number of threads or the memory in the canu command. When I qsub the job, I did not request a memory size; the maximum memory for each node is 125 GB.
#PBS -N canu0803
#PBS -l nodes=1:ppn=28
#PBS -q high
Should I set the thread and memory requests in the canu command and rerun it from the start?
Yes, it should all be automagic. The JobSubmit scripts will request memory and thread resources via command line options. Do those look appropriate for your grid? If not, you can change them with the gridEngineResourceOption option. The default is:
gridEngineResourceOption="-l nodes=1:ppn=THREADS:mem=MEMORY"
Or set options applied to all grid jobs with gridOptions. You probably need to set:
gridOptions="-q high"
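Put together, a full invocation would look something like this (a sketch; the output prefix, directory, genome size, and read file are placeholders, not taken from your run, and gridEngineResourceOption only needs to be given if you want to override the default above):

# Sketch: set the grid options on the canu command line.
# Prefix, directory, genome size, and read file are placeholders.
canu -p 0803 -d canu0803 genomeSize=1.2g \
  gridOptions="-q high" \
  -nanopore-raw reads.fastq.gz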
Finally, and very important: upgrade to the almost-ready-to-be-released v1.9. It has some extremely important fixes for PBS.
> git clone https://github.com/marbl/canu.git
> cd canu/src
> git checkout v1.9
> make -j 8
Since the supercomputer I am using does not allow internet access, I will have to build v1.9 on my local computer and upload it to the supercomputer. Hopefully it will work. The new canu -version reports "Canu snapshot v1.8 +299 changes (r9509 8e0c3e911f1af984f0153550eb0faea2379ffa36)" instead of v1.9. Did I get the right version?
That's the correct version. The number doesn't change until I actually make the release. :-(
If you have trouble compiling, you can just tar up the canu source code directory (canu/ in my example), upload that, and compile on the remote machine. Once you've done 'git clone', internet access isn't necessary.
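Roughly (a sketch; the host name and destination path are placeholders):

# Sketch: package the cloned source, copy it to the cluster, and build there.
tar -czf canu-src.tar.gz canu/
scp canu-src.tar.gz test3@cluster:/public/home/test3/app/
ssh test3@cluster 'cd /public/home/test3/app && tar -xzf canu-src.tar.gz && cd canu/src && make -j 8'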
I've tried to use v1.9, but it needs glibc 2.14+, and I do not have root access on the server I am using. I tried to install glibc 2.14+ in a non-root directory, but it failed with a core dump. Luckily, v1.8 is now working with useGrid=false at an acceptable speed. Strangely, useGrid=false was rejected when submitting to the grid, but it is now finishing up the mhap step.

I have a different question, though. I am assembling a plant genome of 1.2 Gb, and I have 30 Gb of PacBio RSII reads and 140 Gb of Nanopore reads. Would you recommend correcting all the reads together, or correcting the PacBio and Nanopore reads separately?
There is no specific glibc requirement within Canu; the compiler/OS determines that. If you're getting glibc errors, it likely means the environment where you compile differs from the environment where you run the code. You could try building on your local machine in a virtual environment with an old OS to avoid this (see https://pmelsted.wordpress.com/2015/10/14/building-binaries-for-bioinformatics/), but since you got 1.8 to work, you don't need to worry about this.
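As a rough sketch of the build-on-an-old-OS idea, assuming Docker is available on your local machine (the base image and package names are illustrative, not a tested recipe):

# Sketch only: build inside an older distribution so the binary links against
# an older glibc. Image and package names are illustrative.
docker run -it -v $PWD/canu:/canu centos:7 /bin/bash
# then, inside the container:
yum install -y gcc gcc-c++ make
cd /canu/src && make -j 8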
We typically correct all the reads together; given how much more Nanopore coverage you have than PacBio, you could also just use the Nanopore reads alone.
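For reference, correcting both read sets together is just a matter of giving canu both inputs in one run (a sketch; file names and genome size are placeholders):

# Sketch: a single run that corrects and assembles both read types together.
canu -p 0803 -d canu0803 genomeSize=1.2g \
  -pacbio-raw pacbio_rsii.fastq.gz \
  -nanopore-raw nanopore.fastq.gz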
Thank you for your reply. Indeed, after I compiled v1.9 on the server (with a "make: warning: Clock skew detected. Your build may be incomplete" warning), the glibc error went away. But when I submitted the job without useGrid=false, it still failed with the following error:
CRASH: Canu snapshot v1.8 +299 changes (r9509 8e0c3e911f1af984f0153550eb0faea2379ffa36)
CRASH: Please panic, this is abnormal.
ABORT:
CRASH: Failed to submit compute jobs.
CRASH:
CRASH: Failed at /public/home/test3/app/canu-1.9/Linux-amd64/bin/../lib/site_perl/canu/Execution.pm line 1241.
CRASH: canu::Execution::submitOrRunParallelJob("ecoli", "meryl", "correction/0-mercounts", "meryl-count", 1) called at /public/home/test3/app/canu-1.9/Linux-amd64/bin/../lib/site_perl/canu/Meryl.pm line 828
CRASH: canu::Meryl::merylCountCheck("ecoli", "cor") called at /public/home/test3/app/canu-1.9/Linux-amd64/bin/canu line 859
CRASH:
CRASH: Last 50 lines of the relevant log file (correction/0-mercounts/meryl-count.jobSubmit-01.out):
CRASH:
CRASH: qsub: submit error (Bad UID for job execution MSG=ruserok failed validating test3/test3 from node69)
Also, when I submitted jobs using useGrid=false, the "exec_host" field still shows a node number. Does this mean the job is still executed on a compute node rather than the local machine? Or will useGrid=false work fine as long as the whole job runs on one node?
That error indicates your nodes aren't allowed to submit jobs themselves, which is a requirement for running Canu on the grid (see the FAQ). useGrid=false is the suggested workaround for grids that don't support this. See issue #104 for information on how to configure the Torque server to fix the error, assuming your admin allows this.
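For reference, the server-side change discussed in issue #104 is along these lines (a sketch; only a Torque/PBS admin can run it, and your site's policy may not allow it):

# Sketch: allow compute nodes to submit jobs; run by the Torque admin.
qmgr -c "set server allow_node_submit = true"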
useGrid=false is meant to be run on the head node; I'm not sure it will work correctly on a compute node. It won't do heavy compute, though; it just updates bookkeeping, checks for job success, and so on.
Hi, I was running canu with useGrid=remote as follows.
It stopped at the meryl-count.sh step, so I ran meryl-count.sh manually, and it then failed with the following message:
/public/home/test3/scsio/Nanopore/canu0803/correction/0-mercounts/meryl-count.sh: line 105: 146199 Killed /public/home/test3/app/canu-1.8/Linux-amd64/bin/meryl k=16 threads=7 memory=17 count segment=$jobid/01 ../../0803.seqStore output ./0803.$jobid.meryl.WORKING
Then I modified the above command and ran it as follows:
/public/home/test3/app/canu-1.8/Linux-amd64/bin/meryl k=16 threads=28 memory=120 count segment=01/01 ../../0803.seqStore output ./0803.01.meryl.WORKING
It finished with no error messages, but I did not know what to run next.
So I ran the original canu command line again, and it gave me the following feedback.
Could you please give me some suggestions on this? Thank you so much!