Closed bounlu closed 1 year ago
Hi,
Yes, biscuit can accept SAM/BAM-compliant files from other aligners.
Cheers, Jacob
From: Ömer An; Sent: Wednesday, March 15, 2023 3:47:09 AM; To: huishenlab/biscuit; Subject: [huishenlab/biscuit] BAM file as the input (Issue #35)
Hello,
Does biscuit accept BAM files from other aligners as input, such as Bismark, which uses bowtie2?
Thanks.
Thanks for the prompt reply.
Related to that, may I also ask which version of the BAM file should be provided to BISCUIT?
1. *.bam
2. *.deduplicated.bam
3. *.deduplicated.sorted.bam
You would want to use 3. *.deduplicated.sorted.bam as input. You'll also want to index your sorted BAM before running biscuit.
Note, if you marked duplicates with a tool other than Bismark (samblaster, picard, etc.), you could use that BAM as input to biscuit, as long as it was sorted and indexed. BISCUIT ignores duplicate-marked reads by default.
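As a minimal sketch of the sort-and-index step described above, assuming samtools is installed: the file names, the thread count, and the `DRY_RUN` switch are illustrative choices, not part of BISCUIT or Bismark. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: coordinate-sort and index a deduplicated BAM before giving it to BISCUIT.
# File names and DRY_RUN are hypothetical; adjust them to your own data.

# Print the command when DRY_RUN=1 (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: sort_and_index <input.bam> [extra_threads]
sort_and_index() {
    local in_bam="$1"
    local threads="${2:-4}"
    # sample.deduplicated.bam -> sample.deduplicated.sorted.bam
    local out_bam="${in_bam%.bam}.sorted.bam"

    # samtools sort orders by coordinate unless -n is given.
    run samtools sort -@ "$threads" -o "$out_bam" "$in_bam"
    # Writes the index next to the sorted BAM.
    run samtools index "$out_bam"
}
```

To actually run the commands rather than print them, call it as `DRY_RUN=0 sort_and_index sample.deduplicated.bam 8`.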
I also guessed so, however there are 2 concerns:
1. Bismark does not mark duplicates but actually removes them (unlike picard MarkDuplicates), so I was wondering if this would affect biscuit in any way. For example, QC.sh dup_report will always display "Number of duplicate reads" as 0.
2. By default, the Bismark deduplicated BAM is not position-sorted, as the subsequent bismark_methylation_extractor requires a name-sorted BAM. Therefore, it needs to be position-sorted as a separate step before it can be fed into biscuit, which is a heavy step.
Indexing is the easy part.
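One way to tell whether a BAM still carries marked duplicates, or has had them removed, is to look at the duplicate bit of the SAM FLAG field (0x400, decimal 1024). The sketch below is only an illustration: the function names are made up for this example, and `count_marked_duplicates` assumes samtools is on the PATH.

```shell
#!/usr/bin/env bash
# 0x400 (decimal 1024) is the SAM FLAG bit for "PCR or optical duplicate".
DUP_FLAG=1024

# Succeeds (exit status 0) if the given numeric FLAG has the duplicate bit set.
is_duplicate_flag() {
    [ $(( $1 & DUP_FLAG )) -ne 0 ]
}

# Count duplicate-flagged alignments in a BAM (requires samtools).
# A result of 0 is expected when duplicates were removed rather than marked.
count_marked_duplicates() {
    samtools view -c -f "$DUP_FLAG" "$1"
}
```

For a BAM deduplicated by removal, `count_marked_duplicates sample.deduplicated.sorted.bam` should print 0, which is consistent with a duplicate report of 0.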
Hopefully the following addresses your concerns:
1. If you are running biscuit pileup -> biscuit vcf2bed to extract methylation or SNPs, it's okay that duplicate-marked reads have been removed. The default behavior is to skip these reads, so even if you marked duplicates with picard MarkDuplicates, they wouldn't be included in that case either. For the specific case of QC.sh, since the duplicates have been removed, the script can't register any duplicates and you'll get the "correct" answer of 0. If you need to find the duplicate rate in your data, you'll have to retain those duplicates either with Bismark (if it allows it) or by marking duplicates with a different tool.
2. [...] -g option in biscuit pileup). I'm not sure what your specific use case is, but the biscuitBlaster pipeline (https://huishenlab.github.io/biscuit/biscuitblaster/#version-1) will do alignment, duplicate marking, and coordinate-sorting in a one-liner so you get a BAM that's ready for input to biscuit pileup (assuming you do the quick process of samtools index after the one-liner).
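The biscuit pileup -> biscuit vcf2bed route mentioned above could look roughly like the sketch below. The flags (`-o` for pileup, `-t cg` for vcf2bed) and the bgzip/tabix step are written from memory of the BISCUIT documentation, so treat them as assumptions and check each tool's help output first. File names, the function name, and the `DRY_RUN` switch are hypothetical; by default the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: sorted, indexed BAM -> VCF -> CpG methylation BED.
# Flags are assumptions to verify against the BISCUIT help text.

# Print the command when DRY_RUN=1 (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: pileup_to_bed <reference.fa> <sorted_indexed.bam> <output_prefix>
pileup_to_bed() {
    local ref="$1"
    local bam="$2"
    local prefix="$3"

    # Duplicate-marked reads are skipped by default, per the reply above.
    run biscuit pileup -o "${prefix}.vcf" "$ref" "$bam"
    run bgzip "${prefix}.vcf"
    run tabix -p vcf "${prefix}.vcf.gz"

    # vcf2bed writes to stdout, so the redirect is handled separately.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "biscuit vcf2bed -t cg ${prefix}.vcf.gz > ${prefix}_cg.bed"
    else
        biscuit vcf2bed -t cg "${prefix}.vcf.gz" > "${prefix}_cg.bed"
    fi
}
```

Calling `pileup_to_bed genome.fa sample.deduplicated.sorted.bam sample` prints the four commands; setting `DRY_RUN=0` executes them.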