uec / Issue.Tracker

Automatically exported from code.google.com/p/usc-epigenome-center

Re-run BiS-SNP for the Esteller data (with a changed BSMAP parameter) #548

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Zack,

The number of mapped reads for the Esteller data 
(~huydinh/huydinh_research/Data/External/Esteller) is ridiculously small. 
I went back to the methods section of their paper and found that the libraries 
they used are non-directional ("The bisulfite conversion followed by a PCR step 
generates four different strands; the original “+” and “−” strands along with 
their complements.", quoted from 
http://www.pnas.org/content/suppl/2012/06/06/1120658109.DCSupplemental/pnas.201120658SI.pdf#nameddest=STXT).

Hence, please re-run the pipeline with the BSMAP parameter -n 1:

-n  [0,1]   set mapping strand information:
            -n 0: only map to 2 forward strands, i.e. BSW(++) and BSC(-+)    (i.e. the "Lister protocol")
            for PE sequencing, map read#1 to ++ and -+, read#2 to +- and --. 
            -n 1: map SE or PE reads to all 4 strands, i.e. ++, +-, -+, --    (i.e. the "Cokus protocol")

Thanks!

Original issue reported on code.google.com by huy.q.d...@gmail.com on 24 Jul 2013 at 6:44
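
For reference, here is a minimal sketch of how the strand setting could be passed to BSMAP from a wrapper script. It only builds the argument list and does not run anything. The file names (hg19.fa, end1.fastq, end2.fastq, aligned.sam) and the function name are placeholders made up for illustration, not the actual pipeline configuration; -a, -b, -d and -o are the usual BSMAP options for the two read files, the reference and the output.

```python
def bsmap_command(reference, end1, end2=None, out="aligned.sam",
                  non_directional=False):
    """Build a BSMAP argument list (all file names are hypothetical)."""
    cmd = ["bsmap", "-d", reference, "-a", end1]
    if end2 is not None:
        cmd += ["-b", end2]  # second end, for paired-end input
    # -n 0: map to the 2 forward strands only (directional libraries)
    # -n 1: map to all 4 strands (non-directional libraries)
    cmd += ["-n", "1" if non_directional else "0", "-o", out]
    return cmd


cmd = bsmap_command("hg19.fa", "end1.fastq", "end2.fastq",
                    non_directional=True)
print(" ".join(cmd))
```

The list can be handed to subprocess.run(cmd, check=True) once the paths point at real files.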

GoogleCodeExporter commented 8 years ago
Please don't restart this yet.  I just looked very quickly at the methods and 
it looks like the standard directional protocol.  But I have a meeting right 
now and no time to read it carefully.  The two are easy to distinguish from 
the sequences themselves, as directional libraries should have very few Cs in 
the first end.

Original comment by benb...@gmail.com on 24 Jul 2013 at 6:49
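
The check described above can be sketched as a quick base-composition count over the first end. This is an illustrative sketch only: it assumes plain four-line FASTQ records, the function names are made up here, and the two demo reads are invented rather than taken from the Esteller files. No cutoff is built in; the C fraction just needs to be compared against what an unconverted library would show.

```python
import io


def read_sequences(handle, limit=10000):
    """Yield up to `limit` sequence lines from a 4-line-per-record FASTQ."""
    for i, line in enumerate(handle):
        if i // 4 >= limit:
            break
        if i % 4 == 1:  # second line of each record holds the bases
            yield line.strip().upper()


def base_fraction(seqs, base):
    """Fraction of all bases in `seqs` that equal `base`."""
    total = hits = 0
    for s in seqs:
        total += len(s)
        hits += s.count(base)
    return hits / total if total else 0.0


# Invented demo reads standing in for an end-1 FASTQ file.
demo = io.StringIO(
    "@r1\nTTGATTGTAT\n+\nIIIIIIIIII\n"
    "@r2\nATTTGGTTAC\n+\nIIIIIIIIII\n"
)
c_frac = base_fraction(read_sequences(demo), "C")
print(f"C fraction in end 1: {c_frac:.3f}")
```

With real data, pass an open file handle for the end-1 FASTQ instead of the StringIO demo. A directional library should give a C fraction near zero in end 1, and the same count with "G" on end 2 should also come out near zero.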

GoogleCodeExporter commented 8 years ago
got it

Original comment by zack...@gmail.com on 24 Jul 2013 at 6:52

GoogleCodeExporter commented 8 years ago
I see what's wrong here.  First of all, the people who did this work seem to 
have been CONFUSED.  Their methods section is confusing: the "Whole-Genome 
Bisulfite Sequencing" section describes the standard Illumina directional 
protocol (the other one is almost never used by anyone), and they say they 
aligned it using SOAPAligner.  But then they have a whole extra section 
called "Read mapping" which describes the non-directional method and says they 
aligned using GEM.

The sequences I see in GEO look totally consistent with the paired-end 
directional protocol (see attachments).  They have few or no Cs on end1 
(they become Ts), and few or no Gs on end2 (they become As).  But the FASTQs we 
have in our directory are screwed up.  They have end1 and end2 concatenated, 
which should never be done (even for non-bisulfite sequence: end pairs have 
gaps in between, so you never want to align them as a single sequence).  I'm 
not sure how GEO let you download them in this format, which seems really 
incorrect to me.  We either need to re-download them or convert them to FASTQs 
in the correct paired-end format that matches our other data.  Then it should 
work fine with our BSMAP pipeline.

Original comment by benb...@gmail.com on 24 Jul 2013 at 8:34
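
If re-downloading is not an option, the conversion mentioned above could look roughly like the sketch below. It rests on assumptions that would need checking against the actual files: each record holds end1 followed directly by end2, both ends have the same length unless end1_len is given, and end2 is stored as sequenced (not reverse-complemented). The function names and the /1 and /2 read-name suffixes are illustrative choices, not something the pipeline is known to require.

```python
import io


def split_record(name, seq, qual, end1_len=None):
    """Split one concatenated record into (end1, end2) records."""
    if len(seq) != len(qual):
        raise ValueError(f"{name}: sequence/quality length mismatch")
    if end1_len is None:
        if len(seq) % 2:
            raise ValueError(f"{name}: odd length, cannot split evenly")
        end1_len = len(seq) // 2  # assumption: both ends are equally long
    end1 = (f"{name}/1", seq[:end1_len], qual[:end1_len])
    end2 = (f"{name}/2", seq[end1_len:], qual[end1_len:])
    return end1, end2


def split_fastq(src, out1, out2, end1_len=None):
    """Read concatenated 4-line FASTQ from src; write one file per end."""
    while True:
        header = src.readline().rstrip("\n")
        if not header:
            break
        seq = src.readline().rstrip("\n")
        src.readline()  # '+' separator line, rewritten below
        qual = src.readline().rstrip("\n")
        name = header.split()[0]  # drop any description after the read ID
        ends = split_record(name, seq, qual, end1_len)
        for out, (n, s, q) in zip((out1, out2), ends):
            out.write(f"{n}\n{s}\n+\n{q}\n")


# Invented demo record: two 4-base ends concatenated into one 8-base read.
src = io.StringIO("@r1 extra\nTTGATTGA\n+\nIIIIHHHH\n")
out1, out2 = io.StringIO(), io.StringIO()
split_fastq(src, out1, out2)
print(out1.getvalue(), out2.getvalue(), sep="")
```

For real files, replace the StringIO objects with open file handles for the concatenated input and the two per-end outputs.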

Attachments:

GoogleCodeExporter commented 8 years ago
I have seen this end-concat'd format with some data that Lijing gave me one 
time. I'm guessing she also got it from GEO and it's some weird proprietary 
format of theirs, probably with the good intention of having a single file.  

Original comment by zack...@gmail.com on 24 Jul 2013 at 10:05

GoogleCodeExporter commented 8 years ago
I will re-do the FASTQ and we can re-run the pipeline. Thanks all!

Original comment by huy.q.d...@gmail.com on 24 Jul 2013 at 10:08

GoogleCodeExporter commented 8 years ago
Hi Zack,

I have the paired-end FASTQ files in the folder 
~huydinh/huydinh_research/Data/External/Esteller_new

Can you set up a pipeline run for them asap?

Thanks a lot!

Original comment by huy.q.d...@gmail.com on 26 Jul 2013 at 8:50

GoogleCodeExporter commented 8 years ago
I've submitted the job, but expect it to take a while due to high demand.

Original comment by zack...@gmail.com on 26 Jul 2013 at 10:46

GoogleCodeExporter commented 8 years ago

Original comment by zack...@gmail.com on 5 Aug 2013 at 8:08