ruiguo-bio / replong

source code of the paper "RepLong - de novo repeat discovery from long reads"

Meaning of -c option backward? #4

Closed: rsharris closed this issue 6 years ago

rsharris commented 6 years ago

I didn't use -c on my replong command line.

The output tells me "correction=false", then later says "Use corrected reads".

Is that the correct behavior? Or should it correct reads when $cor is true, and use raw reads when $cor is false?

rsharris commented 6 years ago

It's also not clear whether the example "drosophila test file of 100k reads" contains raw reads or corrected reads. If I use this file for testing, should I use the -c option or not?

bzvew commented 6 years ago

Hi, I'm sorry for the misunderstanding. If you use corrected reads, you do not need to correct them with Canu again. The default is cor=false. If you set -c true, Canu will correct the reads (whether the input is raw or corrected) using correctReads.sh. So when "correction=false", replong outputs "Use corrected reads", because it assumes the reads have already been corrected. When "correction=true", replong outputs "Use raw reads", and correctReads.sh is used in the process.
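The flag semantics described above can be modelled in a few lines. This is a minimal Python sketch, not replong's actual code (replong.sh is a shell script); the function name `describe_correction` and the use of `argparse` to read `-c` are illustrative assumptions. It captures only what is stated in the reply: `cor` defaults to false, `-c true` turns it on, and the printed message describes the reads replong assumes it was given, not the step it is about to perform.

```python
import argparse


def describe_correction(argv):
    """Hypothetical model of how replong's -c flag maps to its log message."""
    parser = argparse.ArgumentParser()
    # -c takes an explicit value; correction stays off unless "-c true" is given.
    parser.add_argument("-c", dest="cor", default="false", choices=["true", "false"])
    args, _unknown = parser.parse_known_args(argv)
    cor = args.cor == "true"
    # The message names the kind of input replong assumes,
    # which is why "correction=false" pairs with "Use corrected reads".
    message = "Use raw reads" if cor else "Use corrected reads"
    return cor, message
```

For example, `describe_correction([])` yields `(False, "Use corrected reads")`, while `describe_correction(["-c", "true"])` yields `(True, "Use raw reads")`.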

The reads in human_100k.fa are corrected, and those in dro_100k.fa are raw. That said, -c can be set to true anyway to get better read quality.

rsharris commented 6 years ago

I see. I misinterpreted it as indicating I was giving it corrected reads, rather than indicating that the correction step is needed.

It might be clearer if line 219 reported "correcting raw reads".

Regarding correctReads.sh, I don't see any script by that name in replong. Perhaps it is part of Canu? In any case, I don't see it called directly from replong.sh. Whether cor is true or false, canu -correct is run, albeit with a couple of different parameters (for some reason the parameter order has been shuffled between lines 220 and 227, making it look like there are more differences than there really are).
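The point about shuffled parameter order can be checked mechanically: comparing the two argument lists as multisets, rather than position by position, leaves only the real differences. A minimal Python sketch; `arg_differences` is a hypothetical helper, and the two example command lines are made-up placeholders, not the actual contents of lines 220 and 227 of replong.sh.

```python
import shlex
from collections import Counter


def arg_differences(cmd_a, cmd_b):
    """Return tokens found in one command but not the other, ignoring order."""
    a = Counter(shlex.split(cmd_a))
    b = Counter(shlex.split(cmd_b))
    # Counter subtraction keeps only positive counts, i.e. the surplus tokens.
    only_a = sorted((a - b).elements())
    only_b = sorted((b - a).elements())
    return only_a, only_b


# Hypothetical commands: same arguments in a different order, plus one real difference.
cmd_raw = "canu -correct -p sample -d out_dir genomeSize=100m"
cmd_corrected = "canu -correct -d out_dir -p sample genomeSize=100m stopAfter=overlap"
print(arg_differences(cmd_raw, cmd_corrected))  # ([], ['stopAfter=overlap'])
```

Because each token is compared independently, this would miss two flags swapping their values; it is a quick sanity check, not a full command-line parser.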

bzvew commented 6 years ago

Yes, correctReads.sh is part of Canu. "canu -correct" means Canu will stop after correction. However, it can also stop earlier, after computing overlaps, via "stopAfter=overlap", so for corrected reads the actual correction step is never run.
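Read this way, the explanation amounts to a single branch: both settings run `canu -correct`, and only the corrected-reads path adds `stopAfter=overlap`, so Canu halts before the correction step itself. A minimal Python sketch under that reading; `build_canu_command` and the `common_args` placeholder are hypothetical, and the other parameters the real replong.sh passes to Canu are not reproduced here.

```python
def build_canu_command(cor, common_args):
    """Assemble a canu argument list for the two replong modes (hypothetical sketch)."""
    cmd = ["canu", "-correct", *common_args]
    if not cor:
        # Reads are assumed to be corrected already: stop once overlaps are
        # computed, so Canu's read-correction step never runs.
        cmd.append("stopAfter=overlap")
    return cmd
```

For example, `build_canu_command(False, ["-p", "sample"])` returns `["canu", "-correct", "-p", "sample", "stopAfter=overlap"]`; the list could then be passed to `subprocess.run(cmd, check=True)`.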