replikation / poreCov

SARS-CoV-2 workflow for nanopore sequence data
https://case-group.github.io/
GNU General Public License v3.0

Untrimmed BAM files #169

Closed: premanand17 closed this issue 2 years ago

premanand17 commented 2 years ago

Our lab is now doing in-house sequencing with ONT, and we have to report our results routinely.

One of the requirements is that we submit the untrimmed BAM file, as noted here: "Reads must not be primer trimmed (eg. if the ARTIC pipeline was used, the untrimmed BAM must be provided)".

However, the only BAM file we can see in the 2.Genomes directory is the primer-trimmed one:

barcode13_amplicon_coverage_log.png
barcode13_amplicon_coverage.png
barcode13.consensus.fasta
barcode13_mapped_MN908947.3.primertrimmed.sorted.bam
barcode13_mapped_MN908947.3.primertrimmed.sorted.bam.bai
barcode13_seq_ident_check.tsv

Is there an option in the poreCov pipeline to output the untrimmed BAM file?

Any pointers would be much appreciated.

Best Regards Prem

premanand17 commented 2 years ago

Hi Christian,

Just to be clear: by untrimmed I mean the BAM file with the primers left on, as shown in the ARTIC protocol (the naming is a bit confusing, actually).

https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html

[Screenshot from the ARTIC SOP, 2021-11-19]

Thanks Prem

replikation commented 2 years ago

@premanand17, this would be the <samplename>.trimmed.bam file, then?

hoelzer commented 2 years ago

Perhaps we could output them optionally. Otherwise, @premanand17, you should find them in the work directory. Please try:

ls -lah work/*/*/*.sorted.bam

I think the BAMs you are looking for are the ones named just <SAMPLENAME>.sorted.bam?

premanand17 commented 2 years ago

Thank you both.

Yes, we found it in the work directory. It would be great if you could output them optionally, as other users may have similar requirements. Ideally they would go in the Genomes directory.

Regarding the file name, I'm not sure why ARTIC calls it ".trimmed". To avoid confusion, it would be better to name it ".primeruntrimmed.bam" or "untrimmed.bam".

hoelzer commented 2 years ago

But then we should double-check that these are really untrimmed; there are several candidate output files.

premanand17 commented 2 years ago

Agreed. Also, does the file include unmapped reads?
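The unmapped-reads question can be checked from the SAM FLAG field, where bit 0x4 marks an unmapped segment. With samtools installed, samtools view -c -f 4 <file.bam> counts such records directly. The sketch below does the same count on SAM text read from stdin using only POSIX awk; the function name count_unmapped is hypothetical and not part of poreCov.

```shell
# Count SAM records whose FLAG has bit 0x4 ("segment unmapped") set.
# Input: SAM text on stdin. Header lines start with "@" and are skipped.
# count_unmapped is a hypothetical helper name, not part of poreCov.
count_unmapped() {
  # Field 2 is FLAG; int(FLAG / 4) % 2 == 1 tests bit 0x4 using only
  # POSIX awk arithmetic (no gawk-specific and()).
  # Default whitespace splitting is safe: QNAME and FLAG contain no spaces.
  awk '!/^@/ && int($2 / 4) % 2 == 1 { n++ } END { print n + 0 }'
}

# Example (requires samtools on PATH):
#   samtools view barcode25.trimmed.rg.sorted.bam | count_unmapped
```

samtools view prints every record, mapped or not, unless filtered (e.g. with -F 4), so a non-zero count from this pipe means unmapped reads are still present in the BAM.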

replikation commented 2 years ago

@premanand17, can you please double-check which BAM file it is? I'll then create a branch with it as an output, and if everything is in order I'll add it to the poreCov main branch.

premanand17 commented 2 years ago

Sure, we are running the pipeline on a fresh set of samples now. We'll let you know once it is done and we have the results.

premanand17 commented 2 years ago

Hi @replikation, @hoelzer, we think the file ending in .trimmed.rg.sorted.bam in the work directory is the one we are looking for:

work/nextflow-poreCov-ec2-user/4e/74e199de7ac6d02c9d76b625303f35/barcode25.trimmed.rg.sorted.bam
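Until the pipeline can publish this file itself, one workaround is to copy it out of the work directory. This is a minimal sketch under two assumptions: the untrimmed BAMs keep the .trimmed.rg.sorted.bam suffix reported above, and Nextflow stages them into downstream task directories as symlinks (its default staging mode), so only regular files are copied to avoid duplicates. The function name collect_untrimmed_bams and the destination directory are hypothetical; files are renamed to <sample>.primeruntrimmed.bam following the naming suggested earlier in this thread.

```shell
# Copy every *.trimmed.rg.sorted.bam found under a Nextflow work directory
# into one folder, renamed to <sample>.primeruntrimmed.bam.
# collect_untrimmed_bams is a hypothetical helper, not part of poreCov.
collect_untrimmed_bams() {
  work_dir=$1   # e.g. work
  dest_dir=$2   # e.g. untrimmed_bams (hypothetical name)
  mkdir -p "$dest_dir" || return 1
  # -type f skips symlinks, i.e. copies staged into downstream task dirs.
  find "$work_dir" -type f -name '*.trimmed.rg.sorted.bam' |
    while IFS= read -r bam; do
      # Strip the directory and the suffix to recover the sample name.
      sample=$(basename "$bam" .trimmed.rg.sorted.bam)
      cp "$bam" "$dest_dir/$sample.primeruntrimmed.bam"
    done
}
```

For example, collect_untrimmed_bams work untrimmed_bams from the launch directory. If the work directory holds several runs of the same sample, the last copy found overwrites the others, so a clean work directory gives the least ambiguous result.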

replikation commented 2 years ago

@premanand17

Hi, you can test the changes via:

# update poreCov
nextflow pull replikation/poreCov

# run this specific poreCov version via:
nextflow run replikation/poreCov -r trimbam --help

premanand17 commented 2 years ago

Thanks for this, @replikation. Sure, we will rerun and let you know.

premanand17 commented 2 years ago

Hi @replikation, @hoelzer. We checked and everything looks good. There's now a trimmed BAM in 2.Genomes as well as the primertrimmed one, and it appears identical to the one we pulled out of the work directory when using v0.10.0.

Thanks for all your help.
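Whole-file checksums of two BAMs can differ even when their alignments are identical, because header lines (such as @PG entries recording the command line) and BGZF compression may differ. A more robust way to run this kind of comparison is to look only at the alignment records. The sketch below assumes both files have been decoded to SAM text and are sorted the same way; the function name same_records is hypothetical.

```shell
# Compare the alignment records of two SAM text files, ignoring "@" header
# lines. Returns 0 if identical, 1 if they differ, 2 on setup failure.
# same_records is a hypothetical helper; it assumes identical sort order.
same_records() {
  tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
  grep -v '^@' "$1" > "$tmp_a"
  grep -v '^@' "$2" > "$tmp_b"
  cmp -s "$tmp_a" "$tmp_b"
  rc=$?
  rm -f "$tmp_a" "$tmp_b"
  return "$rc"
}
```

With samtools and bash (for process substitution), BAMs can be compared without intermediate files: same_records <(samtools view old.bam) <(samtools view new.bam).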

replikation commented 2 years ago

Alright, we'll do some testing on our end before putting it in a release. Until then, you can use this branch via -r trimbam.

I am closing this.