ruiguo-bio / replong

source code of the paper "RepLong - de novo repeat discovery from long reads"

Canu 1.4 not working when using LSF cluster so pipeline fails #11

Open DanJeffries opened 5 years ago

DanJeffries commented 5 years ago

Hi Rui,

I'm hoping you can help me with a problem I'm having with Replong, although maybe the problem is with Canu 1.4.

I am trying to run it (just on the Drosophila test data first) on a cluster that uses the LSF queuing system.

The command I am using is:

$REPLONG -f dro_100k.fa -s 100M -t ./dmel_test/ -c false > Replong.log 2>&1

Here is the Replong.log file.

It seems to fail after the first bsub submission of the canu scripts (lines 80-90 in Replong.log).

I took a look in canu-scripts/canu.01.out and the error (at line 178) is:

unexpected EOF while looking for matching `''

which suggests a problem with quotation marks in the canu.01.out script.

Indeed, if I remove all the quotation marks from this script and run it manually, it works. But the script was generated by Canu, and if this were a real bug I find it hard to believe that no one has come across it before. The strange thing is that, because the job fails, Canu seems to re-submit it to the cluster a second time (as per the canuIteration=2 parameter), and this time it appears to work. By that point, however, the Replong pipeline has already failed.
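A quoting problem like this can be checked without submitting anything to the cluster. The following is a minimal sketch, assuming bash is available on the node: `bash -n` parses a script without executing it and exits non-zero on a syntax error such as an unmatched quote. The helper name `check_syntax` and the throwaway demo files are hypothetical and are not part of Canu or Replong.

```shell
# Hypothetical helper: report whether a shell script parses cleanly.
check_syntax() {
  # bash -n reads the script and checks its syntax without running any command.
  if bash -n "$1" 2>/dev/null; then
    echo "ok"
  else
    echo "syntax error"
  fi
}

# Demo on two throwaway scripts (assumed stand-ins for a generated script).
tmpdir=$(mktemp -d)
printf '%s\n' "echo 'unterminated" > "$tmpdir/bad.sh"
printf '%s\n' "echo 'terminated'" > "$tmpdir/good.sh"

check_syntax "$tmpdir/bad.sh"
check_syntax "$tmpdir/good.sh"
```

Pointing `check_syntax` at the generated script under canu-scripts/ would show whether the file itself is malformed or whether the error only appears when it is run through the scheduler.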

I'm a little confused about what is happening and what to do about it, so any thoughts from you would be very welcome!

Best wishes

Dan

ruiguo-bio commented 5 years ago

Hi Dan. I'm not familiar with LSF, so I suggest first testing whether Canu works on its own:

canu -correct -p "step1" -d dmel_test genomeSize=100M corOutCoverage=400 corMinCoverage=0 minOverlapLength=500 minReadLength=1000 gnuplotTested=true stopAfter=overlap -pacbio-corrected dro_100k.fa

I'd like to know whether that finishes properly.

DanJeffries commented 5 years ago

Hi Rui,

So as you suggested, I ran:

CANU=~/Software/replong/canu/Linux-amd64/bin/canu

$CANU -correct -p "step1" -d dmel_test genomeSize=100M corOutCoverage=400 corMinCoverage=0 minOverlapLength=500 minReadLength=1000 gnuplotTested=true stopAfter=overlap -pacbio-corrected dro_100k.fa canuIteration=1 > Canu.log 2>&1

And it seems to have both worked and failed!

Canu.log says Finished.

But canu.01.out again shows the same error:

canu-scripts/canu.01.sh: line 31: unexpected EOF while looking for matching `''
canu-scripts/canu.01.sh: line 32: syntax error: unexpected end of file
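A rough way to look for the offending line in a script like this is to list the lines that contain an odd number of single quotes. This is only a heuristic sketch: the helper name `odd_quote_lines` and the demo file are hypothetical, and legitimate multi-line strings or apostrophes inside double quotes or comments will also be flagged.

```shell
# Hypothetical helper: print the line numbers that hold an odd number of
# single quotes, as candidates for an unmatched quote.
odd_quote_lines() {
  # gsub returns how many single quotes it matched on the current line.
  awk -v q="'" '{ n = gsub(q, q); if (n % 2 == 1) print NR }' "$1"
}

# Demo on a throwaway three-line script with one unbalanced quote.
demo=$(mktemp)
printf '%s\n' "echo 'balanced'" "echo 'unbalanced" "echo done" > "$demo"

odd_quote_lines "$demo"
```

Running it against canu-scripts/canu.01.sh would give a short list of lines to inspect by hand.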

However, I just noticed that it still submits jobs to the grid:

JOBID   USER      STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME        SUBMIT_TIME
528370  djeffrie  RUN   normal  cpt171     8*cpt171   meryl_step1[1]  Jan 25 15:31
528371  djeffrie  PEND  normal  cpt171     -          canu_step1      Jan 25 15:31

(followed by the cormhap_step1[1-5] jobs which are still running).

FYI - I also ran the exact same command using Canu v1.6 (which is installed on our cluster) and it worked fine (i.e. no EOF error in canu.01.out).

Thanks!