thegenemyers / DALIGNER

Find all significant local alignments between reads

LAcheck LAS-file error: "Trace point sum != aligned interval" #58

Closed DDDolle closed 6 years ago

DDDolle commented 7 years ago

Hi,

I have encountered an error that I can't explain in a few of my 'daligner' runs.

My DB has 128 blocks of 200Mb that I want to overlap as preparation for error-correction using FALCON.

All 8256 overlap runs together produce 16384 intermediate LAS files (prior to merging) of which 4 show the error mentioned in the header.
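The two counts above can be checked with a little arithmetic. The sketch below assumes one daligner run per unordered pair of blocks (each block against itself included), and that a run on two different blocks writes two LAS files while a self-comparison writes one; any per-thread splitting of the output is ignored. The function name is made up for illustration.

```python
def overlap_counts(num_blocks):
    """Return (runs, las_files) for an all-against-all block comparison.

    Assumptions (not taken from the DALIGNER sources):
      - one run per unordered block pair, self-pairs included;
      - a run on two distinct blocks X, Y yields two LAS files (X.Y and Y.X);
      - a self-comparison X, X yields one LAS file (X.X).
    """
    cross_pairs = num_blocks * (num_blocks - 1) // 2
    self_pairs = num_blocks
    runs = cross_pairs + self_pairs
    las_files = 2 * cross_pairs + self_pairs
    return runs, las_files


print(overlap_counts(128))  # (8256, 16384)
```

Under these assumptions, 128 blocks give 8128 cross pairs plus 128 self pairs, which matches the 8256 runs and 16384 files quoted above.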

Surprisingly, though, this error doesn't seem to be symmetric: the LAS file "[DB].X.[DB].Y" is OK, while the file "[DB].Y.[DB].X", produced at the same time by the same 'daligner' run, is not. Two of the four files show this behavior. The other two, however, come from the same run and are both defective. In all cases X=128 (i.e. the last block in the DB).

My 'daligner' command is:

"daligner -M12 -mdust -mtan -mrep1 -mrep4 -mrep16 -b -e0.70 [DB].Y [DB].X"

and I am using commit "a9458dcb34eca61513c994f4217fb7f75915e25c"

Best,

Dirk

pb-cdunn commented 6 years ago

A user sent me data for a seg-fault, and I think I have reproduced the problem.

$ LA4Falcon -H500 -of ../raw_reads.db ../raw_reads.104.las > out.la4.txt
LA4Falcon: Index 64618683 out of bounds 4699235 (Load_Read)

$ LAcheck -v ../raw_reads.db ../raw_reads.104.las
raw_reads.104: Trace point sum != aligned interval
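When many LAS files have to be checked, it can help to collect the names that LAcheck flags. The sketch below only parses text shaped like the output line shown above (`name: message`); it does not run LAcheck itself. The function name, and the assumption that each problem is reported on one line of this shape, are mine.

```python
def flagged_las_files(lacheck_output,
                      message="Trace point sum != aligned interval"):
    """Return the file names reported with `message`, in order of appearance.

    Assumption: each report is a single line of the form "<name>: <message>",
    as in the output quoted above. Other lines are ignored.
    """
    flagged = []
    for line in lacheck_output.splitlines():
        name, sep, rest = line.partition(": ")
        name = name.strip()
        if sep and rest.strip() == message and name not in flagged:
            flagged.append(name)
    return flagged


sample = "raw_reads.104: Trace point sum != aligned interval\n"
print(flagged_las_files(sample))  # ['raw_reads.104']
```

The captured output of `LAcheck -v` for each file could be concatenated and passed to this function to get the list of files worth regenerating.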

@thegenemyers, I have the DB (17GB) and the .las (3GB). I will save these in Menlo Park for you, in case the cause of the error is not obvious. I will try to find out which DALIGNER run led to this, and which commit-SHA1 was used. Of course, the DALIGNER used is probably not quite up-to-date with your latest code, so I will try to get some time to sync up.

@DDDolle, I cannot find that commit-SHA1 (a9458...) in this repository. Typo?

Within PacBio: https://jira.pacificbiosciences.com/browse/SE-974

thegenemyers commented 6 years ago

Christopher,

This problem has occurred several times, and my recollection is that it has not been fully resolved for lack of a reproducible test case for me to work with.

It would be extremely helpful to me if you could pass me the DB and the exact call to daligner that led to the problem. I am 99% certain the problem occurred during the daligner run that produced the .las, so I need to know exactly how daligner was being run (and exactly which version, if not the latest).

I look forward to seeing you in Menlo Park. With the data sets, it should be straightforward for me to track down the bug.

-- Gene

DDDolle commented 6 years ago

@pb-cdunn

Hmm, isn't it this one: https://github.com/thegenemyers/DALIGNER/commit/a9458dcb34eca61513c994f4217fb7f75915e25c

pb-cdunn commented 6 years ago

@DDDolle, ah, you're right. But that is an old commit, from November 2016. I think this was fixed in December.

@thegenemyers, yes, if we cannot resolve this by syncing to the latest DALIGNER, then I will make the test-case available to you on-site in Menlo Park. But I think you can safely ignore this for now.

pb-cdunn commented 6 years ago

Update on my previous comment: The problem reported by the user was likely a result of an I/O error, as LAcheck failed on the .las file. #67 should prevent those in the future. So I don't know of any current problem.

pb-cdunn commented 6 years ago

Gene, I think you can close this. The submitter can re-open if it recurs.