Closed pb-cdunn closed 7 years ago
Adding @pb-jchin.
Just a comment: FALCON's overlap-to-graph step takes only the first overlap for each pair of reads, so duplicate records will not break FALCON's consensus or overlap-to-graph modules.
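As a rough illustration of why duplicates are harmless under a "first overlap per pair" policy, here is a minimal Python sketch. It is not FALCON's actual code; the function name and the tuple layout `(a_read, b_read, a_begin, a_end)` are hypothetical and chosen only for this example.

```python
def first_overlap_per_pair(overlaps):
    """Keep only the first record seen for each (a_read, b_read) pair."""
    seen = set()
    kept = []
    for rec in overlaps:
        key = (rec[0], rec[1])  # hypothetical layout: read ids come first
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


records = [
    (13481, 54643, 0, 1200),
    (13481, 54643, 0, 1200),    # duplicate record, silently dropped
    (13481, 60001, 300, 2500),
]
unique = first_overlap_per_pair(records)
```

Under this assumption a repeated record never reaches the downstream stages, which is consistent with the comment above that only a strict checker such as LAcheck objects to it.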
Chris,
Sorry for the long delay in responding, but I've had a bit of travel and the fix was a little deeper than it might at first appear. I committed new code that should fix the problem for you.
But you should also be advised that using a trace-point spacing of 1000 (-s1000) when the minimum local alignment is also 1000 has some drawbacks. While you save a factor of 5 (not 10, for complex reasons) in disk space, you have lost the chance to compute intrinsic quality values at a granularity that is useful. Also, when looking for overlapping alignments (as well as duplicates), the daligner looks for alignments that share a trace point. The duplicate alignment reported had only 2 trace points near the ends of the rather short alignment involved, and neither one matched. When I reran the thing with -s100 there was no problem. However, I have fixed the problem by considering end-points of alignments to be "trace-points".
Cheers,
Gene
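The effect of trace-point spacing on duplicate detection can be sketched with a toy model. This is not DALIGNER's implementation: the function names are hypothetical, alignments are assumed to be gap-free, and trace points are modeled simply as multiples of the spacing that fall strictly inside the alignment on the A read. The toy case below has no interior trace point at all with a spacing of 1000, which is a simpler situation than the reported one (two trace points, neither matching), but it shows why coarse spacing leaves little to compare and why treating end-points as trace points helps.

```python
def trace_ticks(a_begin, a_end, spacing):
    """A-read coordinates that are multiples of `spacing` strictly inside (a_begin, a_end)."""
    first = (a_begin // spacing + 1) * spacing
    return list(range(first, a_end, spacing))


def comparison_points(aln, spacing, include_ends):
    """Set of (a, b) points used to compare two alignments.

    `aln` is (a_begin, a_end, b_begin, b_end). Toy assumption: the alignment
    is gap-free, so b advances one-for-one with a.
    """
    a_begin, a_end, b_begin, b_end = aln
    points = {(a, b_begin + (a - a_begin))
              for a in trace_ticks(a_begin, a_end, spacing)}
    if include_ends:
        points.add((a_begin, b_begin))
        points.add((a_end, b_end))
    return points


def share_point(x, y, spacing, include_ends=False):
    """True if the two alignments have at least one comparison point in common."""
    return bool(comparison_points(x, spacing, include_ends)
                & comparison_points(y, spacing, include_ends))


short = (1100, 1900, 100, 900)   # hypothetical short alignment
duplicate = short                # an exact duplicate of it

coarse = share_point(short, duplicate, spacing=1000)                      # no interior ticks
fine = share_point(short, duplicate, spacing=100)                         # seven interior ticks
with_ends = share_point(short, duplicate, spacing=1000, include_ends=True)
```

In this model the duplicate goes unnoticed at a spacing of 1000, is caught at a spacing of 100, and is also caught at 1000 once end-points are included in the comparison.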
On 6/15/16, 11:01 PM, Christopher Dunn wrote:
- LAcheck -v rawreads.db rawreads.3.rawreads.12.C3.las rawreads.3.rawreads.12.C3: Duplicate overlap (13481 vs 54643)
Test-data available at:
(Might take 20min to download. Not sure.)
Thanks for that clarification. We might (I hope) switch to shorter trace-points so that we don't miss alignments.
And thank you for solving this tricky problem. Much appreciated. We had turned off LAcheck pending a solution.
We use -s100 for most projects. The current FALCON pipeline does not use trace points. If -s100 is better for checking, we can always use -s100.
A side note: while the consensus module used in FALCON does not use the trace points now, it will eventually be useful to have the alignment end-point information. The consensus module does its own O(ON) alignment. A k-mer match table and k-mer binning are used to find the beginning and the end of the alignment within the consensus module. If the begin and end points are already known, that can save a small amount of computation.
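To make the k-mer table and binning idea concrete, here is a minimal Python sketch. It is not FALCON's consensus code; the function names, the parameters `k` and `bin_size`, and the choice to bin hits by diagonal (target position minus query position) are all assumptions made for illustration.

```python
from collections import defaultdict


def kmer_table(seq, k):
    """Map each k-mer in `seq` to the list of positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def estimate_alignment_range(query, target, k=6, bin_size=5):
    """Estimate (q_begin, q_end, t_begin, t_end) from exact k-mer hits.

    Hits are binned by diagonal (t - q); the fullest bin is taken as the
    alignment, and its extreme hits give the begin and end points.
    Returns None when the sequences share no k-mer.
    """
    table = kmer_table(target, k)
    bins = defaultdict(list)
    for q in range(len(query) - k + 1):
        for t in table.get(query[q:q + k], ()):
            bins[(t - q) // bin_size].append((q, t))
    if not bins:
        return None
    hits = max(bins.values(), key=len)
    q_begin = min(q for q, _ in hits)
    q_end = max(q for q, _ in hits) + k
    t_begin = min(t for _, t in hits)
    t_end = max(t for _, t in hits) + k
    return (q_begin, q_end, t_begin, t_end)


# Made-up sequences: the query is a slice of the target with 4 unrelated bases in front.
target = "ACGTAGCTTGACCATGGTCAATCGGATTCAGGCCTAAGTG"
query = "TTTT" + target[10:30]
estimate = estimate_alignment_range(query, target)  # -> (4, 24, 10, 30)
```

If the aligner already reported trustworthy end-points, a step like this could be skipped and the base-level alignment started directly from them, which is the small saving mentioned above.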
We're seeing this again. We're trying to get another test-case from a user to reproduce it, but it's probably large. (PacificBiosciences/FALCON-integrate#103)
Can you think of any other cause for this?
If you mean the duplicate overlap report, there could be another bug or missed consideration, although that seems unlikely. I have recently run full-scale data sets without a hiccup.
In #103 the problem appeared to be a core dump. Could it again be the small /tmp issue? If not, then we are back to needing to exchange an example of the phenomenon that allows me to check what's going on.
-- Gene
On 11/18/16, 1:07 AM, Christopher Dunn wrote:
We're seeing this again. We're trying to get another test-case from a user to reproduce it, but it's probably large. (PacificBiosciences/FALCON-integrate#103 https://github.com/PacificBiosciences/FALCON-integrate/issues/103)
Can you think of any other cause for this?
Test-data available at:
(Took 20min to upload, but should be quicker for you to download.)