zhou-lab / biscuit

BISulfite-seq CUI Toolkit

alignment and markdup queries #27

Open PoisonAlien opened 5 years ago

PoisonAlien commented 5 years ago

Hi,

I have some naive questions.

  1. In biscuit align, setting -b 1 aligns read1 and read2 to both strands. Is this the recommended setting for non-directional libraries? I see Bismark has a --non_directional option, and I was wondering whether this option does the same thing.

  2. For paired-end data, can we pipe the output of biscuit align to samblaster for duplicate marking? It would save a lot of time and disk space. I see that biscuit markdup takes strand orientation into account when marking duplicates.
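The sketch below shows why strand can matter for duplicate marking. The bisulfite strand is part of the duplicate key, so two pairs at the same coordinates but from different bisulfite strands stay distinct. This is an illustration under assumptions, not the actual algorithm of biscuit markdup or samblaster. The ReadPair fields, the "C2T"/"G2A" labels and the score-based choice of which pair to keep are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class ReadPair(NamedTuple):
    """Hypothetical summary of one aligned read pair (not a BISCUIT type)."""
    name: str
    chrom: str
    r1_5prime: int    # unclipped 5' coordinate of read1
    r2_5prime: int    # unclipped 5' coordinate of read2
    r1_reverse: bool  # read1 aligned in reverse orientation
    bs_strand: str    # assumed bisulfite-strand label, e.g. "C2T" or "G2A"
    score: int        # e.g. sum of base qualities; the higher one is kept


def mark_duplicates(pairs: List[ReadPair]) -> Dict[str, bool]:
    """Return {pair name: is_duplicate}; one best-scoring pair is kept per key."""
    best: Dict[Tuple, ReadPair] = {}
    for p in pairs:
        # Including bs_strand means pairs at identical coordinates but from
        # different bisulfite strands are never collapsed into one.
        key = (p.chrom, p.r1_5prime, p.r2_5prime, p.r1_reverse, p.bs_strand)
        if key not in best or p.score > best[key].score:
            best[key] = p
    kept = {p.name for p in best.values()}
    return {p.name: p.name not in kept for p in pairs}


pairs = [
    ReadPair("a", "chr1", 100, 300, False, "C2T", 40),
    ReadPair("b", "chr1", 100, 300, False, "C2T", 35),  # same key as "a"
    ReadPair("c", "chr1", 100, 300, False, "G2A", 30),  # other bisulfite strand
]
print(mark_duplicates(pairs))  # {'a': False, 'b': True, 'c': False}
```

If the bs_strand field were dropped from the key, pair "c" would also be flagged as a duplicate of "a".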

Thanks.

-Anand.

PoisonAlien commented 5 years ago

Hi, sorry for bugging you. Could you PLEASE let me know whether this aligner can be used for non-directional libraries? I have used it with the -b 1 option, but I observe a huge conversion rate in the CHH context (~1% with Bismark versus ~50% with BISCUIT).
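The ~1% versus ~50% figures are easier to interpret with the underlying quantity in mind. A CHH-context rate is roughly the share of reference CHH cytosines that still read as C after bisulfite treatment. The sketch below computes that share for a single read under simplifying assumptions: a forward-strand (C-to-T) read with a gapless alignment starting at the first reference base. The function name is hypothetical, and this is not how Bismark or BISCUIT compute their reports.

```python
def chh_methylation(ref: str, read: str) -> float:
    """Fraction of reference CHH cytosines still read as C (not converted to T).

    Assumes a forward-strand (C-to-T) read aligned without gaps or clipping,
    starting at ref[0]; real callers also handle the reverse strand, indels
    and base quality.
    """
    retained = converted = 0
    for i, base in enumerate(ref):
        if base != "C" or i + 2 >= len(ref) or i >= len(read):
            continue
        # CHH: the next two reference bases are both non-G (H = A, C or T).
        if ref[i + 1] == "G" or ref[i + 2] == "G":
            continue
        if read[i] == "C":
            retained += 1
        elif read[i] == "T":
            converted += 1
    total = retained + converted
    return retained / total if total else float("nan")


print(chh_methylation("CAACTTCGA", "CAATTTCGA"))  # 0.5
```

In this example, the CpG cytosine at position 6 is ignored. Of the two CHH cytosines, one was retained and one was converted.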

It would help me to design my next steps.

zwdzwd commented 5 years ago

Hi, sorry for the tardy response. The -b 1 option has since been changed to be directional. Which version of BISCUIT are you using? Are you doing paired-end sequencing? Sometimes CHH signal comes simply from adaptor sequence or random priming. You might want to try the QC script (which works with MultiQC) to see whether you can diagnose anything.
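Random priming can leave primer-derived bases at the 5' end of reads, and those bases do not reflect the template's methylation. A common mitigation is to clip a fixed number of bases there before alignment. Below is a minimal sketch over plain 4-line FASTQ records. clip_fastq is a hypothetical helper, and the clip length n is an assumption best chosen from a per-position methylation (M-bias) plot.

```python
from typing import Iterable, Iterator, List


def clip_fastq(lines: Iterable[str], n: int) -> Iterator[str]:
    """Yield FASTQ lines with the first n bases (and qualities) of each read removed."""
    buf: List[str] = []
    for line in lines:
        buf.append(line.rstrip("\n"))
        if len(buf) == 4:
            header, seq, plus, qual = buf
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {header}")
            # Clip sequence and quality together so they stay in register.
            yield from (header, seq[n:], plus, qual[n:])
            buf = []
    if buf:
        raise ValueError("truncated FASTQ record at end of input")


print(list(clip_fastq(["@r1", "ACGTACGTAC", "+", "IIIIIIIIII"], 3)))
# ['@r1', 'TACGTAC', '+', 'IIIIIII']
```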

PoisonAlien commented 5 years ago

Hi, thanks for responding. Please accept my apologies for the pushy bug reports. This is the version I am using:

Program: BISCUIT (BISulfite-seq CUI Toolkit)
Version: 0.3.8.20180515
Contact: Wanding Zhou <wanding.zhou@vai.org>

I am working with paired-end data from single cells. The libraries are non-directional, and I did QC with fastp, so the reads are quite clean. Also, the usage text says -b 1 aligns reads to both strands:

-b INT        For PE, read1 to parent, read2 to daughter (0, default);
                     read1 and read2 to both (1); For SE, parent (3) and
                     daughter (1); both (0, default); Def: parent (bisulfite
                     treated strand), daughter (synthesized strand)

Does this mean BISCUIT can't be used for non-directional libraries?

zwdzwd commented 5 years ago

Looks like -b 1 should be enough. Another guess of mine is that most of the CHH signal comes from suboptimal mapping. One thing you could try is to filter alignments on mapping quality, number of mismatches (NM tag) and alignment score (AS tag, say >80), and see whether the CHH signal goes away. There is also a known issue in BISCUIT around excluding random priming. Not sure if that's relevant.
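The suggested filtering could look like the following minimal sketch, which works on SAM text lines (for example, the output of samtools view -h). Only the AS > 80 cut comes from this thread. The MAPQ and NM cut-offs and the function names are hypothetical placeholders. Whether BISCUIT's NM counts bisulfite C/T mismatches is not stated here, so the thresholds should be tuned against your own data.

```python
from typing import Dict, List


def sam_tags(fields: List[str]) -> Dict[str, str]:
    """Map optional SAM fields (TAG:TYPE:VALUE) to their value strings."""
    tags: Dict[str, str] = {}
    for field in fields:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    return tags


def keep_alignment(line: str, min_mapq: int = 30, max_nm: int = 5,
                   min_as: int = 80) -> bool:
    """True if a SAM line is a header or passes the MAPQ/NM/AS thresholds."""
    if line.startswith("@"):
        return True  # pass header lines through unchanged
    fields = line.rstrip("\n").split("\t")
    tags = sam_tags(fields[11:])  # optional fields start at column 12
    if "NM" not in tags or "AS" not in tags:
        return False  # cannot judge records lacking either tag
    return (int(fields[4]) >= min_mapq  # column 5 is MAPQ
            and int(tags["NM"]) <= max_nm
            and int(tags["AS"]) > min_as)


rec = "r1\t99\tchr1\t100\t40\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII\tNM:i:1\tAS:i:95"
print(keep_alignment(rec))                                # True
print(keep_alignment(rec.replace("AS:i:95", "AS:i:60")))  # False
```

Comparing CHH levels before and after such a filter would show whether poorly mapped reads drive the signal.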