Closed by Malabady 5 years ago
You shouldn't use nodes=2. Each canu process runs on a single node, so nodes=2 will just request more than one of those nodes at the same time. Use nodes=1 instead.
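As a rough sketch of that change, the grid-related options from the command in this thread could be written with one node per job as shown here. Only the `nodes=` value differs from the original command; the queue name, walltime, and memory option are copied from it, and the variable names (`threads_opt`, `memory_opt`, `grid_opts`) are purely illustrative. The snippet only prints the options so they can be checked before use.

```shell
# Grid options with a single node per job (nodes=1 instead of nodes=2).
# THREADS and MEMORY stay literal: canu fills them in for each job it submits.
threads_opt='gridEngineThreadsOption=-l nodes=1:ppn=THREADS'
memory_opt='gridEngineMemoryOption=-l mem=MEMORY'
# Queue name and walltime copied from the original command; adjust for your site.
grid_opts='gridOptions=-q ggbc_q -l walltime=14:00:00:00'

# Print one option per line for inspection.
printf '%s\n' useGrid=true gridEngine=pbs "$threads_opt" "$memory_opt" "$grid_opts"
```

To use them, append `useGrid=true gridEngine=pbs "$threads_opt" "$memory_opt" "$grid_opts"` to the rest of the canu command, keeping the double quotes so that each option stays a single argument.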
The output you posted is from a run with useGrid=true, correct? That is the expected output. The way canu runs (see https://canu.readthedocs.io/en/latest/tutorial.html#execution-configuration) is to submit processes to the grid and then submit itself to wait for those to complete and resume execution. So this is fine; the jobs should be in your queue (1373451, and 1373452 waiting for the previous one to complete). If those jobs aren't being scheduled, that's an issue with your grid, not Canu. You'd have to find out why they aren't being scheduled, since they request a low amount of memory and CPU.
Thank you for the clarification; this is really helpful. The original canu command that I invoked interactively was done (see below) and the two child jobs were Held and Queued, although the resources are available. So I didn't know whether the original command was "done" because the child jobs were not submitted, or the other way around. I think I will rerun it and watch closely.
[1]+ Done nohup canu -p run -d rosea2 genomeSize=3.6g -pacbio-raw ../raw_data/XMAGA.20190628.PACBIO_DATA.PART-//Rosea_1///*.subreads.fastq.gz corOutCoverage=200 correctedErrorRate=0.05 "batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50" useGrid=true gridEngine=pbs gridEngineThreadsOption="-l nodes=2:ppn=THREADS" gridEngineMemoryOption="-l mem=MEMORY" gridOptions="-q ggbc_q -l walltime=14:00:00:00" java=/usr/local/apps/eb/Java/1.8.0_144/bin/java
Held means the job is waiting for the Queued one to complete. Queued means the job is just waiting for resources; it's up to your grid to decide when and how to run it. You should check with your IT whether you need to specify any additional information in your submit command to make the jobs run. If not, you can check why the job is still in the queue and not being run.
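One way to start that check on a PBS/Torque-style grid is to look at the full job record from `qstat -f` and keep only the lines that explain scheduling. This is a minimal sketch, not a definitive recipe: `job_summary` is a hypothetical helper name, and the attribute names it filters for (`job_state`, `Hold_Types`, `depend`, `comment`, `Resource_List.*`) are the ones commonly printed by PBS/Torque, so they may differ on your scheduler.

```shell
# Hypothetical helper: filter `qstat -f` output (read from stdin) down to
# the attributes that usually explain why a job is held or still queued.
job_summary() {
  grep -E '^[[:space:]]*(job_state|Hold_Types|depend|comment|Resource_List\.[A-Za-z_]+) = '
}
```

With the job IDs mentioned above, usage would be `qstat -f 1373451 | job_summary` and `qstat -f 1373452 | job_summary`. A `depend` line on the Held job should point at the Queued one, and the `Resource_List` lines show what was actually requested (for example, whether `nodes=2` is still being passed). Long values may wrap onto continuation lines in `qstat -f` output, in which case only the first line of the value is shown.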
Hello Sergey:
We are having some difficulties getting a canu job to dispatch on more than one node using the grid options. I am assembling a large genome with over 100X coverage of Sequel II data.
When I run canu on one node (28 cores and 500 GB) with "useGrid=False", the run starts but dies before finishing the correction stage, with no error message that is obvious to me. So I assumed it might be a resource issue. Here is the code:
I tried to submit the run to two nodes (28 cores and 512 GB per node) with the grid option enabled, "useGrid=true", but the run was never dispatched even though the target nodes are available.
I added more grid-related options to the command line and submitted it to the queue again, but it never started even though the target nodes are available. Here is the code:
Finally, I started an interactive session on the cluster on two nodes (28 cores and 500 GB each) and ran the above script interactively. It starts working but finishes within a couple of hours, before even completing the correction stage. There is no clear error message. Here is the report:
I would really appreciate it if you could point out what we are doing wrong here.
Many thanks, Magdy