karel-brinda / ococo

Ococo: the first online variant and consensus caller. Call genomic consensus directly from an unsorted SAM/BAM stream.
https://arxiv.org/abs/1712.01146
MIT License

consensus file has header of reference and not name bam/sample file #38

Closed (MarinaSci closed this issue 1 year ago)

MarinaSci commented 1 year ago

Hello - great development and thank you, very useful! Not sure how timely my comment can be and how active this section is... However, I will try! One thought I had: when you have multiple BAM files (= multiple samples) from which you want to extract the consensus for the same reference (for subsequent phylogenetic analysis etc.), it would be best if the Ococo output file had the sample or BAM name on the first line after '>', rather than the name of the FASTA reference it came from. I hope that makes sense... Would that be a quick fix, do you think?

Thank you!!

featurerequest

karel-brinda commented 1 year ago

Hi Marina,

thanks for your comment and the suggestion. To propose a specific solution, I need to double-check whether I understand everything correctly.

Are you proposing that, e.g., if you had a reference file with sequences chr1 and chr2 and a BAM file from a sample called smp, you would like to rename the sequences from chr1 to smp.1 and from chr2 to smp.2, in order to simplify the subsequent analysis?

MarinaSci commented 1 year ago

Dear Karel,

Thank you very much for getting back to me so swiftly and for taking on my recommendation. Probably to rename the seqs from chr1 to smp.1 and chr2 to smp.1.

I work with environmental/faecal samples and can have multiple infections present in a sample. In my references I have multiple genomes (nuclear or mitogenomes); let's say multiple chrs. So for a given sample that has more than one parasite present, it would be fantastic to get chr1 renamed to smp.1 and chr2 to smp.1.

Does it make sense? Again, very grateful for even considering such a tool!

Best regards, Marina


--

Best wishes,

Marina

Marina Papaiakovou, PhD candidate

Harding Distinguished Postgraduate Scholar

Department of Veterinary Medicine

University of Cambridge, Cambridge, UK

(she/her)


karel-brinda commented 1 year ago

In this case, the most straightforward solution would be to post-process the outputs from Ococo.

Unfortunately, it seems that the -F parameter is unable to redirect the FASTA output to the standard output (stdout) (I have no idea why I didn't implement this – I probably focused mainly on the VCF output).

So the way to go is:

  1. First, store the FASTA on disk, e.g.:
       ./ococo -i test.bam -f test.fa -x ococo64 -F output.fa
  2. Then convert the FASTA to a modified version with new sequence names, e.g.:
       seqtk seq output.fa | perl -pe 's/>chr/>smp./g'
     or
       seqtk seq output.fa | perl -pe 's/>/>smp1./g'
     (depending on how exactly you want to name the sequences).

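
If seqtk and perl are not at hand, the renaming in step 2 can also be sketched in Python. This is only an illustration of the second variant above (prepending a sample name to every sequence name), not part of Ococo: the function name prefix_fasta_headers, the sample name smp1, and the in-memory FASTA are hypothetical stand-ins.

```python
import io
from typing import Iterable, Iterator


def prefix_fasta_headers(lines: Iterable[str], sample: str) -> Iterator[str]:
    """Yield FASTA lines with '<sample>.' prepended to every sequence name."""
    for line in lines:
        if line.startswith(">"):
            # Header line: '>chr1' becomes '>smp1.chr1' for sample 'smp1'.
            yield f">{sample}.{line[1:]}"
        else:
            # Sequence line: passed through unchanged.
            yield line


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for the output.fa written by Ococo.
    consensus = io.StringIO(">chr1\nACGT\n>chr2\nTTGA\n")
    print("".join(prefix_fasta_headers(consensus, "smp1")), end="")
```

With several BAM files, the same function could be applied to each per-sample output.fa with a different sample name, and the renamed files concatenated for the downstream phylogenetic analysis.
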
MarinaSci commented 1 year ago

Thank you for the guidance, Karel!! Very helpful. Best wishes, Marina


karel-brinda commented 1 year ago

You are welcome!

I'll close this ticket for now as this won't be implemented as a separate feature.

I've also opened a ticket (#39) about possibly redirecting the consensus to stdout in the future.