seqan / lambda

LAMBDA – the Local Aligner for Massive Biological DatA
https://github.com/seqan/lambda
Other
77 stars 20 forks source link

lcaid and lcataxid options fail #119

Closed mbrockman1 closed 6 years ago

mbrockman1 commented 6 years ago

Versions:

Linux 4.13.0-38-generic #43~16.04.1-Ubuntu SMP Wed Mar 14 17:48:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)

lambda2 version: 1.9.4 SeqAn version: 2.4.0

After running:

lambda2 searchp --query pro_test.fa --index ./protein.lambda/ --output-columns 'std lcataxid'

I receive:

Reading index properties... done. Detecting query alphabet... aminoacid detected. Checking memory requirements... met. Loading Subj Sequences... done. Loading Subj Ids... done. Loading Database Index... done. Loading Subject Taxonomy IDs... done. Loading Subject Taxonomic Tree... done. done. Loading Query Sequences and Ids... done. Searching and extending hits on-line...progress: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% |...~/lambda/src/search_misc.hpp:272 FAILED! (One of the paths didn't lead to root.)

stack trace: 0 [0x4c4a89] lambda2() 1 [0x47b306] lambda2() 2 [0x6662a8] lambda2() 3 [0x55680b] lambda2() 4 [0x7f5dbd0c9cbf] GOMP_parallel + 0x3f 5 [0x5ebbd8] lambda2() 6 [0x4c9436] argConv0(LambdaOptions&) + 0xfa6 7 [0x4e470a] searchMain(int, char const**) + 0x31a 8 [0x44d632] main + 0x1e2 9 [0x7f5dbdfbd830] __libc_start_main + 0xf0 10 [0x44e229] _start + 0x29

Aborted (core dumped)

h-2 commented 6 years ago

Thanks for your detailed bug-report!

Can you make the files available to me for debugging? This doesn't need to be on GitHub in case the data sensitive, but it would really help! This is my email-address: hannes.hauswedell@fu-berlin.de

mbrockman1 commented 6 years ago

I used the master branch to build from source.

I am using a list of data sensitive sequences. It seems as if one sequence is messing it up for the rest. I need to do further investigation.

The database I used was Uniprot Sprot 20171205.

Maybe it is because I am using a significantly small database, compared to the larger ones.

h-2 commented 6 years ago

Maybe it is because I am using a significantly small database, compared to the larger ones.

No, I am pretty sure it's a bug in my routines for compressing the lca-tree. I will be on vacation for the next two weeks, but I will look into it immediately after that. Sorry it has taken me so long to respond to this.

h-2 commented 6 years ago

I have reproduced the issue and this patch fixes it for me:

--- a/src/search_algo.hpp
+++ b/src/search_algo.hpp
@@ -1387,7 +1387,7 @@ _writeRecord(TBlastRecord & record,
             record.lcaTaxId = 0;
             for (auto const & bm : record.matches)
             {
-                if (length(lH.gH.sTaxIds[bm._n_sId]) > 0)
+                if ((length(lH.gH.sTaxIds[bm._n_sId]) > 0) && (lH.gH.taxParents[lH.gH.sTaxIds[bm._n_sId][0]] != 0))
                 {
                     record.lcaTaxId = lH.gH.sTaxIds[bm._n_sId][0];
                     break;
@@ -1397,7 +1397,7 @@ _writeRecord(TBlastRecord & record,
             if (record.lcaTaxId != 0)
                 for (auto const & bm : record.matches)
                     for (uint32_t const sTaxId : lH.gH.sTaxIds[bm._n_sId])
-                        if (sTaxId != 0) // TODO do we want to skip unassigned subjects
+                        if (lH.gH.taxParents[sTaxId] != 0) // TODO do we want to skip unassigned subjects
                             record.lcaTaxId = computeLCA(lH.gH.taxParents, lH.gH.taxHeights, sTaxId, record.lcaTaxId);

             record.lcaId = lH.gH.taxNames[record.lcaTaxId];

Could you give it a try and report back?

Thanks a lot!

h-2 commented 6 years ago

This is now also in the master branch and will soon be released as 1.9.5. Please re-open this issue if it still doesn't work for you!