robmaz / distmap

Sequence alignment on Hadoop

support BAM input #49

Open robmaz opened 6 years ago

robmaz commented 6 years ago

This would be a nearly trivial change. The only problem is that readtools ReadsToDistmap seems to want the --interleavedInput option to read paired-end BAMs. It would be more elegant if pairing could be autodetected; otherwise we need to add a new and cumbersome command-line option to distmap.

robmaz commented 6 years ago

Or maybe that option is only needed for FASTQ input?

magicDGS commented 6 years ago

@robmaz - the option --interleavedInput should be provided to decide whether you want to map the reads as paired-end or single-end, except for FASTQ input where the two mates are in separate files.

I think that autodetection in this case is a bad idea, because someone might want to process interleaved paired-end files as single-end. In addition, there are no rules or definition for an unmapped BAM file, so there is no clear way to "guess" whether the file is paired: you can look at the tags, at the read names, etc., but these might differ between providers.
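To illustrate why guessing is fragile, here is a minimal sketch in Python. It is not ReadTools code: the two heuristics and the example records are hypothetical, and the only fact relied on is that bit 0x1 of the SAM flag marks a read as part of a multi-segment (paired) template.

```python
import re

PAIRED_FLAG = 0x1  # SAM flag bit: template has multiple segments

_MATE_SUFFIX = re.compile(r"/[12]$")


def guess_by_flag(records):
    """Return True if every record has the SAM 'paired' bit set."""
    return all(flag & PAIRED_FLAG for _, flag in records)


def guess_by_name(records):
    """Return True if every read name ends in /1 or /2."""
    return all(_MATE_SUFFIX.search(name) for name, _ in records)


# Hypothetical unmapped records as (read name, SAM flag) tuples.
# Provider A sets the pairing flags (77 / 141) but uses plain read names.
provider_a = [("read1", 77), ("read1", 141)]
# Provider B marks mates only in the names and leaves the paired bit unset.
provider_b = [("read1/1", 4), ("read1/2", 4)]

for label, recs in (("A", provider_a), ("B", provider_b)):
    print(label, guess_by_flag(recs), guess_by_name(recs))
```

The two heuristics disagree on both inputs, so an autodetector would have to pick one convention and would misclassify files from the other provider.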

Because users should know anyway whether their file is paired-end, I think it would be nice to have a --paired flag in distmap that also indicates how to process paired data. This would also be useful for deciding the flags of mappers like bwa, without the need to add them to the --mapper-args option.
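A rough sketch of how such a flag could fan out, written in Python for illustration only. The `--paired` flag is just the proposal above, and `derive_tool_args` is a hypothetical helper, not part of distmap. It assumes that paired input means passing `--interleavedInput` to ReadsToDistmap and `-p` (interleaved paired-end input) to `bwa mem`; whether `-p` is the right mapper flag depends on how distmap feeds reads to the mapper.

```python
import argparse


def build_parser():
    """Hypothetical command line with the proposed --paired flag."""
    parser = argparse.ArgumentParser(prog="distmap-sketch")
    parser.add_argument("--paired", action="store_true",
                        help="treat the input as paired-end reads")
    parser.add_argument("--mapper-args", default="",
                        help="extra arguments passed to the mapper")
    return parser


def derive_tool_args(paired, mapper_args=""):
    """Return (ReadsToDistmap args, mapper args) implied by --paired."""
    # Assumption: paired input is handed to ReadTools as interleaved.
    readtools_args = ["--interleavedInput"] if paired else []
    mapper = mapper_args.split()
    # Assumption: the mapper is bwa mem, where -p means interleaved pairs.
    if paired and "-p" not in mapper:
        mapper.append("-p")
    return readtools_args, mapper


args = build_parser().parse_args(["--paired"])
print(derive_tool_args(args.paired, args.mapper_args))
```

With this layout the user states pairing once, and the per-tool flags follow from it instead of being repeated in --mapper-args.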

At least for now, there is no plan in ReadTools to guess whether the input is paired. I may consider that in the future, but I have some higher-priority items in ReadTools development before that.