smithlabcode / walt

WALT is a read mapping program for bisulfite sequencing DNA methylation studies.
GNU General Public License v3.0

Determine whether reads are from PBAT automatically #33

Open andrewdavidsmith opened 6 years ago

andrewdavidsmith commented 6 years ago

This should be easy, based on analyzing the nucleotide composition of a collection of reads at the start of the file (e.g. 100k reads).
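A rough sketch of what such a check could look like, in Python rather than WALT's C++ and not taken from the WALT code base. It assumes the usual expectation that directional WGBS reads are T-rich (C depleted by conversion) while PBAT reads are A-rich (G depleted). The function names, the `min_ratio` threshold, and the "ambiguous" label are all made up for illustration.

```python
from collections import Counter
from itertools import islice


def base_composition(fastq_lines, max_reads=100_000):
    """Count A/C/G/T over the first max_reads records of a 4-line FASTQ stream."""
    counts = Counter()
    # The sequence is the second line of every 4-line record (indices 1, 5, 9, ...).
    for seq in islice(fastq_lines, 1, 4 * max_reads, 4):
        counts.update(seq.strip().upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def guess_protocol(counts, min_ratio=2.0):
    """Guess the protocol from C vs. G depletion; min_ratio is an arbitrary assumption."""
    c, g = counts["C"], counts["G"]
    if g >= min_ratio * max(c, 1):
        return "WGBS"  # far fewer C than G: T-rich reads
    if c >= min_ratio * max(g, 1):
        return "PBAT"  # far fewer G than C: A-rich reads
    return "ambiguous"  # neither base is clearly depleted
```

With an open file handle `fh` on the reads, `guess_protocol(base_composition(fh))` would give the guess. The threshold would need checking against real data before relying on it.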

bdecato commented 6 years ago

I think you're right that we can detect it computationally, but is that true in all cases?

Bismark has an explicit --pbat option and fails silently if you try mapping PBAT reads without it. What if we implement the nucleotide analysis function and print a warning if the composition doesn't match up with the flag?
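To make the warning idea concrete, here is a minimal Python sketch. It assumes some composition check has already produced a label of "WGBS", "PBAT", or "ambiguous", and that the user's choice arrives as a boolean. `protocol_warning` and `pbat_flag` are hypothetical names, not existing WALT or Bismark options.

```python
def protocol_warning(detected, pbat_flag):
    """Return a warning message if the detected protocol disagrees with the flag.

    detected  -- "WGBS", "PBAT" or "ambiguous" (hypothetical labels)
    pbat_flag -- True if the user asked for PBAT mapping (hypothetical flag)
    Returns None when the flag and the detected protocol agree.
    """
    expected = "PBAT" if pbat_flag else "WGBS"
    if detected == "ambiguous":
        return ("warning: could not determine the protocol from base "
                "composition; mapping as " + expected)
    if detected != expected:
        return ("warning: reads look like " + detected +
                " but mapping was requested as " + expected)
    return None  # composition matches the flag, nothing to report
```

The caller would print any non-`None` message to stderr and carry on mapping, so the flag stays authoritative and the detection only guards against the silent-failure case.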

mengzhou commented 6 years ago

I have another question: do we still need the AG_WILDCARD option for single-end mapping? The reads are the same for single-end PBAT.

mengzhou commented 6 years ago

The GSE86903 dataset mixes WGBS and PBAT protocols. Specifically, BS-seq_H9_hEpiLC_d4_1 and BS-seq_H9_hEpiLC_d4_2 are WGBS, while the others are PBAT. It should be a good one to test this on.