tobiasrausch / alfred

BAM Statistics, Feature Counting and Annotation
https://www.gear-genomics.com/alfred/
BSD 3-Clause "New" or "Revised" License
144 stars 12 forks source link

Alfred consensus: using -p #106

Closed andreas-wilm closed 1 year ago

andreas-wilm commented 1 year ago

Hi Tobias,

Thanks for Alfred and thanks for making it public! I'm trying to use Alfred to compute a consensus from comparatively short ONT (duplex) reads against short references (200nt in one, 400nt in another case), where samtools consensus misses some indels. I would like to compute the consensus for the entire length of the alignment. How am I supposed to use -p in this setting? I noticed that if I send this to 1 or reflen, I often don't get a consensus, because there are no overlapping reads/bases at the extreme ends. If I use something like -p sq:200 (for the 400nt reference) I do get an excellent consensus for seemingly the entire length, which is what I want. Does Alfred in this case only look at reads that overlap with pos 200? Should I use the setting differently?

Many thanks, Andreas

tobiasrausch commented 1 year ago

Hi Andreas,

Yes, -p defines the position that reads are required to overlap because alfred computes a classical multiple sequence alignment at this position. With -w you can define a window around -p that reads need to span.

Best, Tobias

andreas-wilm commented 1 year ago

Ah I see. Thank you. I've overlooked -w. So if I understand correctly, then using -p x -w y, would only consider reads that overlap all positions from x-y to x+y, right? It would compute a consensus along the entire length of extracted reads though, not just on p. At least that's what I think I see.

Is there an efficient way to compute the consensus on all positions with Alfred? I realize mine is a special case, given the short reference. Normally this would be total overkill

Andreas

tobiasrausch commented 1 year ago

Is there an efficient way to compute the consensus on all positions with Alfred?

No, I don't have anything ready for this. For larger regions, not all reads overlap and then it's basically an assembly problem.

andreas-wilm commented 1 year ago

Yeah makes sense. Thank you!