Open GoogleCodeExporter opened 8 years ago
Hi Kuang,
Both of them actually are aligned, but the second one is filtered by -mm.
Let's the alignments before applying the filter.
===first one===
chr10:73738761-73744760 (0) 450 500; L1Junction_Chr10 (SE) 50 100; + 65; 0 mm
ACTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCC
ACTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCC
===second one===
chr10:73738761-73744760 (0) 451 523; Chr10_L1Junction (SE) 1 74; + 0; 12 mm
CTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCC-CTCAGCTGTTG---AT-TACACCCTCA
CTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCCGCACA-ATGT-GCACATGT--ACCCTAA
The second one cannot pass the filter since -mm is set to 5 and the total
mismatches are 12.
That means some of the non-mapped bases can be aligned with a bunch of
mismatches, and these mismatches hinders the alignments to pass the filter.
I think other parameter combinations would be better for doing this job.
Or do you suggest any new filter?
Thanks,
Wan-Ping
Original comment by WanPing....@gmail.com
on 10 Nov 2010 at 4:42
Dear Wan-Ping
Thank you very much for you reply.
Do you have any suggestion on what parameter to used best in order to handle
these split reads?
I was hoping that -min 40 & -mmal would allow for the second read to bypass the
filter since it has >40 continuous match, or maybe i misunderstood the meaning
of these two filter?
would it be possible to output all alignments that had less than 5mm in any
40bp window in the read? and extend the alignment from the 40bp outwards, but
not count those mm from extended alignment into the 5mm criteria? Kind of like
a seed requirement + extension.
Thank you very much for your help!
Lisa
Original comment by huangche...@gmail.com
on 10 Nov 2010 at 8:37
Hi Lisa,
Your understandings of -min and -mmal are correct.
As for your requirement, current MOSAIK cannot handle it, but it's doable.
The question would become how to search that 40bp window considering mismatches
inside it. Searching the perfect 40bp matches would not hard, but including
mismatches makes the problem complicated.
Suppose I find an approach to search that window. Do you still need the
original alignment or the alignment in the window?
For example, in the second alignment, you want MOSAIK reporting 1)
CTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCCGCACA-ATGT-GCACATGT--ACCCTAA
or 2) CTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCC.
Best,
Wan-Ping
Original comment by WanPing....@gmail.com
on 12 Nov 2010 at 8:51
Hi Lisa,
Since MOSAIK uses a Smith-Waterman algorithm, strictly speaking there is no
extension algorithm. The seed just locates the general area and the algorithm
spits out the best local alignment possible. If a 100 bp read aligns perfectly
for 45 bp but then ends up crappy in the remaining part (e.g. exon junctions,
insertions, etc), the Smith-Waterman algorithm will return a 45 bp local
alignment.
i.e. The SW algorithm has effectively found a high-scoring 45 bp window for you
in this case.
Normally, MOSAIK will count the unaligned portion of the read as mismatches.
e.g. if a 100 bp read aligns and produces a 98 bp alignment, we automatically
assume that we have 2 mismatches. Using the -mmal option disables this
behavior. If the 45 bp local alignment has 1 internal sequencing error with
-mmal enabled, MOSAIK would consider that as having 1 mismatch rather than 56
mismatches.
Using -mmal could potentially allow meaningless results e.g. perfectly matched
10 bp alignments. That's why the -min (alternatively -minp) option exists.
Using -min 40 ensures that your alignments are at least 40 bp before being
reported.
We have used to -mmal and -min parameters with great success in our split-read
research in the Marth Lab. We had some additional in-house utilities that then
extracted the split read alignments from the original FASTQ files, so that we
could align a second pass using the remaining portion of the read.
Unfortunately those utilities were rather specific to that particular project,
so they're not really ready to be released into the wild. However, I mention it
simply as an example of what can be accomplished with the -mmal and -min
parameters.
Cheers,
// Michael
Original comment by snowneb...@gmail.com
on 16 Nov 2010 at 3:35
Hi Michael,
Thank you very much for your reply,
However, the second read in my example had a perfect 40bp window match but fail
to be reported with -mm 5.
Is there a way i can avoid such scenario?
Thanks,
Lisa
Original comment by huangche...@gmail.com
on 16 Nov 2010 at 5:09
In respond to Wan-Ping's question:
I would hope that the alignment will be performed up to the listed # of
mismatch (-mm) for any reads with a certain window size (-min) of match.
Thank you very much for the help!
Lisa
Original comment by huangche...@gmail.com
on 16 Nov 2010 at 5:16
Hi Lisa,
For resolving this issue, after Smith-Waterman, I iteratively trim alignments
util both of -min 40 and -mm 5 are satisfied or the lengths of alignments are
already shorter than -min 40.
Here are 3 testcases.
===Before===
CTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCC-CTCAGCTGTTG---AT-TACACCCTCA
|||||||||||||||||||||||||||||||||||||||||||||||||| | || ||| | || | |||||||
CTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCCGCACA-ATGT-GCACATGT--ACCCTAA
===After===
CTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCC
||||||||||||||||||||||||||||||||||||||||||||||||||
CTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCC
======Before======
CTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCC-CTCAGCTGTTG---AT-TACACCCTCA
||||| |||||||| |||| |||||||||||||| ||||||||||||||| | || ||| | || | |||||||
CTTTTAAGGAATCCCCCTGACCACGGGGATCACAACATGTTTGCACTGCCGCACA-ATGT-GCACATGT--ACCCTAA
======After======
CTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCC
||||| |||||||| |||| |||||||||||||| |||||||||||||||
CTTTTAAGGAATCCCCCTGACCACGGGGATCACAACATGTTTGCACTGCC
======Before======
CCAAGAAGCTGACTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCC-CTCA----GC---TGTT
-GATTACA
|| | || || ||||| |||||||| |||| ||||||| |||||| ||||||||||||||| | || ||
|||| || || |
CCTA-AAACT--CTTTTAAGGAATCCCCCTGACCACGGGGATCACAACATGTTTGCACTGCCGCACAATGTGCACATGTT
AGAGTATA
======After======
CTTTTGAGGAATCCGCCTGGCCACGGGAATCACACCATGTTTGCACTGCC
||||| |||||||| |||| ||||||| |||||| |||||||||||||||
CTTTTAAGGAATCCCCCTGACCACGGGGATCACAACATGTTTGCACTGCC
Since it's not the official release yet, please let me know if you're still
interested in it and I'll give you the download link.
Original comment by WanPing....@gmail.com
on 22 Nov 2010 at 8:02
Hi WangPing,
Thank you very much for your help!
I would definitely love to have the download link.
Is it possible for me to have the option to still obtain the before alignment
for the read that made the cutoff in the After alignment?
one more question on mosaik in general...if the split read is an result of a
~100bp deletion, will mosaik be able to handle it and align both ends?
Thanks,
Lisa
Original comment by huangche...@gmail.com
on 22 Nov 2010 at 9:36
Hi Lisa,
Here you go.
http://clavius.bc.edu/~wanping/Mosaik_split.tar
Mosaik will turn on this feature automatically when using -mmal and -min.
The approach looks like this,
=============================================================
while ( (aligned_length > min) && (mismatch count > mm ) )
chop the alignment
=============================================================
Sorry that I didn't keep the original alignments.
Do you mean the deletions in samples or the genome?
If samples, Mosaik works pretty well.
Original comment by WanPing....@gmail.com
on 23 Nov 2010 at 3:36
Original issue reported on code.google.com by
kuangzhe...@gmail.com
on 9 Nov 2010 at 9:17