Closed GoogleCodeExporter closed 8 years ago
Please don't restart this yet. I just looked very quickly at the methods, and it
looks like the standard directional protocol, but I have a meeting right now and
no time to read it carefully. The two protocols are easy to distinguish from the
sequences themselves: in a directional library the first end reads the
bisulfite-converted strands, so it should have very few Cs.
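The base-composition check described above can be sketched in a few lines of Python. This is a minimal, hypothetical helper (the names `read_fastq_seqs`, `base_fractions`, `classify_library` and the 5% cutoff are illustrative assumptions, not part of any pipeline mentioned in this thread): it measures the C fraction on end1 and the G fraction on end2 and calls a library directional only when both are low. In a non-directional library roughly half of the end1 reads come from strands complementary to the converted ones, so the end1 C fraction comes out noticeably higher.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def read_fastq_seqs(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence (2nd) line of each 4-line FASTQ record, upper-cased."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def base_fractions(seqs: Iterable[str]) -> Dict[str, float]:
    """Fraction of A, C, G, T over all bases; N and other symbols are ignored."""
    counts: Counter = Counter()
    for seq in seqs:
        counts.update(seq)
    total = sum(counts[b] for b in "ACGT")
    return {b: (counts[b] / total if total else 0.0) for b in "ACGT"}


def classify_library(end1: List[str], end2: List[str], max_frac: float = 0.05) -> str:
    """Hypothetical heuristic: directional libraries show C-depleted end1 and G-depleted end2.

    max_frac is an illustrative cutoff; real data keeps some C from methylated
    CpGs and incomplete conversion, so tune it on known samples.
    """
    c_end1 = base_fractions(end1)["C"]
    g_end2 = base_fractions(end2)["G"]
    if c_end1 < max_frac and g_end2 < max_frac:
        return "directional"
    return "not clearly directional"


if __name__ == "__main__":
    end1 = ["TTGATTAGTA", "ATTTGGATAT"]  # toy end1 reads with no C
    end2 = ["AACAAATCCA", "ATATCCAAAT"]  # toy end2 reads with no G
    print(classify_library(end1, end2))  # directional
```

In practice you would feed it a sample of a few thousand reads from each end, for example `list(read_fastq_seqs(open(path)))[:5000]`.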
Original comment by benb...@gmail.com
on 24 Jul 2013 at 6:49
got it
Original comment by zack...@gmail.com
on 24 Jul 2013 at 6:52
I see what's wrong here. First of all, the people who did this work seem to
have been CONFUSED. Their methods section is contradictory: the "Whole-Genome
Bisulfite Sequencing" section describes the standard Illumina directional
protocol (the other one is almost never used by anyone) and says they aligned
it using SOAPAligner, but a separate "Read mapping" section describes the
non-directional method and says they aligned using GEM.
The sequences I see in GEO look totally consistent with the paired-end
directional protocol (see attachments): they have few or no Cs on end1 (they
become Ts) and few or no Gs on end2 (they become As). But the FASTQs we have
in our directory are screwed up. They have end1 and end2 concatenated into a
single read, which should never be done, even for non-bisulfite sequence: the
two ends of a pair usually have a gap between them, so you never want to align
them as a single sequence. I'm not sure how GEO let you download them in this
format, which seems really incorrect to me. We need to either re-download them
or convert them to FASTQs in the correct paired-end format that matches our
other data. Then it should work fine with our BSMAP pipeline.
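A minimal sketch of the conversion route in Python, assuming each concatenated record is end1 followed immediately by end2 with equal read lengths (pass `end1_len` if they differ). The function names and the `split_pairs.py` script name are hypothetical. It writes the same header to both output files so mates stay paired by name; any `length=` field in the header will be stale after splitting. If these files came from SRA via `fastq-dump`, which writes both mates of a spot as one concatenated read unless told to split them, re-dumping with `--split-files` is likely the simpler fix; that cause is a guess, not something established in this thread.

```python
import sys
from typing import Iterable, Iterator, Optional, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def _next_line(it: Iterator[str], header: str) -> str:
    """Return the next line without its line ending, or fail on a truncated record."""
    try:
        return next(it).rstrip("\r\n")
    except StopIteration:
        raise ValueError(f"truncated FASTQ record {header!r}") from None


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse 4-line FASTQ records, skipping blank lines between records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header.strip():
            continue
        seq = _next_line(it, header)
        plus = _next_line(it, header)
        qual = _next_line(it, header)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {header!r}")
        yield header, seq, qual


def split_record(rec: Record, end1_len: Optional[int] = None) -> Tuple[Record, Record]:
    """Cut one concatenated record into (end1, end2); the default cut is the midpoint."""
    header, seq, qual = rec
    if len(seq) != len(qual):
        raise ValueError(f"sequence/quality length mismatch in {header!r}")
    if end1_len is None:
        if len(seq) % 2:
            raise ValueError(f"odd length in {header!r}; pass end1_len explicitly")
        end1_len = len(seq) // 2
    if not 0 < end1_len < len(seq):
        raise ValueError(f"end1_len out of range for {header!r}")
    return ((header, seq[:end1_len], qual[:end1_len]),
            (header, seq[end1_len:], qual[end1_len:]))


def split_concatenated_fastq(src: Iterable[str], out1: TextIO, out2: TextIO,
                             end1_len: Optional[int] = None) -> int:
    """Write end1 to out1 and end2 to out2; return the number of read pairs."""
    n = 0
    for rec in parse_fastq(src):
        for out, (h, s, q) in zip((out1, out2), split_record(rec, end1_len)):
            out.write(f"{h}\n{s}\n+\n{q}\n")
        n += 1
    return n


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: python split_pairs.py concatenated.fastq end1.fastq end2.fastq
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as o1, open(sys.argv[3], "w") as o2:
        print(split_concatenated_fastq(src, o1, o2), "read pairs written")
```

Splitting at a fixed position only works because both ends were sequenced to a known length; if the concatenated reads vary in length, the split point has to come from the run metadata rather than the midpoint.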
Original comment by benb...@gmail.com
on 24 Jul 2013 at 8:34
Attachments:
I have seen this end-concatenated format before, with some data that Lijing
gave me. I'm guessing she also got it from GEO and it's some weird proprietary
format of theirs, probably with the good intention of keeping everything in a
single file.
Original comment by zack...@gmail.com
on 24 Jul 2013 at 10:05
I will redo the FASTQ files and we can rerun the pipeline. Thanks, all!
Original comment by huy.q.d...@gmail.com
on 24 Jul 2013 at 10:08
Hi Zack,
I have put the paired-end FASTQ data in the folder
~huydinh/huydinh_research/Data/External/Esteller_new
Can you set up a pipeline run for it ASAP?
Thanks a lot!
Original comment by huy.q.d...@gmail.com
on 26 Jul 2013 at 8:50
I've submitted the job, but expect it to take a while due to high demand.
Original comment by zack...@gmail.com
on 26 Jul 2013 at 10:46
Original comment by zack...@gmail.com
on 5 Aug 2013 at 8:08
Original issue reported on code.google.com by
huy.q.d...@gmail.com
on 24 Jul 2013 at 6:44