Open andrewdavidsmith opened 6 years ago
I think you're right that we can detect it computationally, but is that true in all cases?
Bismark has an explicit --pbat option and fails silently if you try mapping PBAT reads without it. What if we implement the nucleotide analysis function and print a warning if the composition doesn't match up with the flag?
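A rough sketch of what that warning check could look like, in Python for illustration only. The function name `warn_if_flag_mismatch`, the `min_ratio` threshold, and the idea of passing in pre-computed C and G counts are all assumptions, not existing code in any mapper. It relies on the expectation that bisulfite-converted reads from the standard protocol are depleted in C, while PBAT reads are depleted in G.

```python
import warnings


def warn_if_flag_mismatch(pbat_flag, c_count, g_count, min_ratio=2.0):
    """Warn when the observed base composition contradicts the PBAT flag.

    c_count and g_count are the numbers of C and G bases seen in a sample
    of reads. min_ratio is a hypothetical threshold chosen for illustration.
    Returns True if a warning was issued, False otherwise.
    """
    # Few C relative to G suggests C->T conversion (T-rich reads).
    looks_t_rich = g_count > min_ratio * max(c_count, 1)
    # Few G relative to C suggests G->A conversion (A-rich reads).
    looks_a_rich = c_count > min_ratio * max(g_count, 1)

    if pbat_flag and looks_t_rich:
        warnings.warn("PBAT mode requested, but reads look T-rich")
        return True
    if not pbat_flag and looks_a_rich:
        warnings.warn("reads look A-rich (PBAT-like), but PBAT mode is off")
        return True
    return False
```

When neither base is clearly depleted the function stays silent, since a warning based on an ambiguous sample would likely be more confusing than helpful.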
I have another question: do we still need the AG_WILDCARD option for mapping single-end? The reads are the same for single-end PBAT.
GSE86903
This dataset has mixed WGBS and PBAT protocols. Specifically, BS-seq_H9_hEpiLC_d4_1 and BS-seq_H9_hEpiLC_d4_2 are WGBS, while others are PBAT. Should be a good one to test this.
This should be easy, based on analyzing the nucleotide composition of a collection of reads at the start of the file (e.g. 100k reads).
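As a minimal sketch of the idea (Python, for illustration; not the mapper's actual implementation), one could count bases in the first reads of a FASTQ file and compare C against G. The names `read_sequences` and `guess_conversion`, the `min_ratio` threshold of 2, and the assumption of plain four-line FASTQ records are all hypothetical choices made here.

```python
import gzip
from collections import Counter
from itertools import islice


def read_sequences(path, max_reads=100_000):
    """Yield up to max_reads sequences from a FASTQ file (optionally gzipped).

    Assumes strict four-line records, with the sequence on the second line.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Lines 1, 5, 9, ... (0-based) hold the sequences.
        for line in islice(handle, 1, 4 * max_reads, 4):
            yield line.strip().upper()


def guess_conversion(sequences, min_ratio=2.0):
    """Classify reads as 'T-rich', 'A-rich', or 'ambiguous'.

    T-rich (C depleted) is what standard directional WGBS reads should look
    like; A-rich (G depleted) is what PBAT reads should look like.
    min_ratio is an illustrative threshold, not a tuned value.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    c, g = counts["C"], counts["G"]
    if g > min_ratio * max(c, 1):
        return "T-rich"
    if c > min_ratio * max(g, 1):
        return "A-rich"
    return "ambiguous"
```

Usage would be along the lines of `guess_conversion(read_sequences("reads.fastq.gz"))`, where the file name is a placeholder. Returning an explicit "ambiguous" label leaves room for the caller to fall back on a user-supplied flag in cases where the composition is not decisive.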