loneknightpy / idba


Modifying "kMaxShortSequence" or using "--long_read arg" #11

Closed by sklages 6 years ago

sklages commented 8 years ago

Hi Yu Peng,

What is the difference between modifying "kMaxShortSequence" in src/sequence/short_sequence.h and using the parameter "--long_read arg" when working with longer paired-end (PE) reads, e.g. 300 bp? Or, what are "long reads" for?

I assume it is not the same, but I couldn't find info on that.

best, Sven

tallnuttcsiro commented 8 years ago

Apparently --long_read does not treat reads as paired.

If you want paired long reads, you could try this: https://groups.google.com/forum/#!topic/hku-idba/GL-1VZnhLI0

sklages commented 8 years ago

Thanks for the info.

jvollme commented 8 years ago

But what does --long_read actually do then? Does it mean you can add unpaired long reads (Sanger reads, preassembled contigs, merged read pairs, etc.) as references in addition to the required paired input?

sklages commented 8 years ago

hmm .. I still think there is a lack of documentation, see issue #8 ..

tallnuttcsiro commented 8 years ago

I agree, there is a lack of info. Most options are not properly described.

jvollme commented 8 years ago

Specifically, my question concerning long reads is: are they used only for scaffolding and repeat resolution, or also for building the original de Bruijn graph?

In the first case, it would not make sense to use this option to add any short reads that were orphaned in preliminary quality-trimming steps. Also, if the read pairs could be partially merged, you would have to add the longer merged sequences in addition to the read pairs that generated them.

In the second case, any read pairs that could be merged due to overlaps should probably only be added as merged long reads and removed from the paired-end dataset, in order not to screw up the k-mer and read counts. But in this case it would also be interesting to add any orphaned short reads, to complement the paired-end data.

For me this is quite an interesting question...
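To make the second case concrete, here is a minimal Python sketch of the bookkeeping it implies: pairs that were merged leave the paired set, and the merged sequences plus any orphans go into a single-read set, so no read is counted twice. The function name `partition_reads`, the dictionaries, and the toy sequences are all hypothetical; this is not part of IDBA, only an illustration of the idea.

```python
def partition_reads(pairs, merged, orphans):
    """Split reads into a paired set and a single-read set.

    pairs   -- dict: read id -> (forward, reverse) sequences
    merged  -- dict: read id -> merged sequence, for pairs that overlapped
    orphans -- list of single reads whose mate was lost during trimming

    All names are hypothetical and only illustrate the bookkeeping.
    """
    # Keep only the pairs that could NOT be merged, so the k-mers of a
    # merged pair are not counted a second time via the paired input.
    paired = {rid: seqs for rid, seqs in pairs.items() if rid not in merged}
    # Merged sequences and orphans both go into the single-read set.
    single = list(merged.values()) + list(orphans)
    return paired, single


pairs = {
    "r1": ("ACGTAC", "GTACGT"),
    "r2": ("TTGACA", "CCATGG"),
}
merged = {"r1": "ACGTACGT"}   # pair r1 overlapped and was merged
orphans = ["GGATCC"]          # mate removed by quality trimming

paired, single = partition_reads(pairs, merged, orphans)
```

Here `paired` would be written out as the paired-end input and `single` as the file passed via "--long_read", assuming that option really accepts arbitrary unpaired reads.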

loneknightpy commented 8 years ago

@jvollme The long reads are only used for building the de Bruijn graph. IDBA doesn't trust long reads more than short reads, meaning that k-mers from long reads have the same weight as those from short reads during the iteration.
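As a rough illustration of what "same weight" means, the Python sketch below pools k-mer counts from short and long reads, with every k-mer occurrence contributing exactly 1 regardless of which read set it came from. This is not IDBA's actual code: the function names and toy reads are assumptions, and reverse complements and the iteration over multiple k values are deliberately left out.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence in a list of reads."""
    counts = Counter()
    for read in reads:
        # Reads shorter than k produce an empty range and contribute nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def pooled_kmer_counts(short_reads, long_reads, k):
    """Pool counts from both read sets with equal weight.

    A k-mer seen in a long read adds 1 to its count, exactly like a
    k-mer seen in a short read (hypothetical helper for illustration).
    """
    return count_kmers(short_reads, k) + count_kmers(long_reads, k)


short_reads = ["ACGTAC", "CGTACG"]
long_reads = ["ACGTACGTAC"]

counts = pooled_kmer_counts(short_reads, long_reads, k=4)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

The practical consequence is the one raised above: if a read pair is supplied both as a pair and as a merged long read, its k-mers are counted twice.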

jvollme commented 8 years ago

Thanks, that's good to know. So basically I could use the "long-read" function to include any single reads, even additional single-end libraries, provided I have at least one paired-end library to start with? Maybe it would be better to rename the argument to "--single-read"?