kamimrcht / REINDEER

REINDEER REad Index for abuNDancE quERy
GNU Affero General Public License v3.0

malloc(): invalid size (unsorted) during a query #13

Open MitraDarja opened 3 years ago

MitraDarja commented 3 years ago

Hi,

I am trying to use Reindeer, but I run into the following problem: every query I try on my Reindeer index results in a malloc(): invalid size (unsorted) error. Any idea how I could fix this?

I built my index in the following way (I ran bcalm beforehand on more than 1000 RNA-seq files):

./Reindeer --index -f bcalm2/build-by-me/fof_reindeer.lst -k 21 -o out_reindeer_large_logcount_dataset --log-count

This worked without throwing an error.
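For reference, a minimal sketch of how such a file-of-files might be assembled for `-f`. The directory and the `*.unitigs.fa` naming pattern are assumptions (they depend on how bcalm2 was run), not details from this report:

```shell
# Hypothetical sketch: build a file-of-files (one unitig file path per line)
# for Reindeer's -f option. Adjust the directory and glob to your bcalm2 output.
ls bcalm2/build-by-me/*.unitigs.fa > bcalm2/build-by-me/fof_reindeer.lst
# Sanity check: the line count should equal the number of samples.
wc -l < bcalm2/build-by-me/fof_reindeer.lst
```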

I run my query in the following way:

./Reindeer --query -q 2.fa -l out_reindeer_large_logcount_dataset

The output of the query command is:

############# REINDEER version v1.0.2 #############
Command line was: ./Reindeer --query -q 2.fa -l out_reindeer_large_logcount_dataset 

Querying...

#Loading index...
Index loaded
Index loading total: 544.053 seconds.

#Computing query...
Result will be written in output_reindeer/query_results/out_query_Reindeer_P40_2_0.out
malloc(): invalid size (unsorted)
Aborted (core dumped)

This also makes me think that the index is fine, because loading works. The strange thing, though, is that if I run the same query against a different Reindeer index, which I built from 16 RNA-seq files, the query works.

I also tried using the samples_1000_transcripts.fa as described here: https://github.com/kamimrcht/REINDEER/blob/master/reproduce_manuscript_results/queries.sh but that led to

[Warning] Sequence containing a 'N' character or invalid header is disregarded.
Segmentation fault (core dumped)
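Since that run triggered the 'N' warning, here is a quick pre-check (hypothetical, not part of REINDEER) that counts how many records in a query FASTA contain an N and would be disregarded. The file name `2.fa` is the query file from above:

```shell
# Count total FASTA records in the query file.
grep -c '^>' 2.fa
# List headers of records whose sequence contains N (these get disregarded),
# deduplicated in case a record spans multiple sequence lines.
awk '/^>/{h=$0; next} /[Nn]/{print h}' 2.fa | sort -u | wc -l
```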

My query file consists of 3 randomly picked transcripts; I also tried a file with only one transcript and one with 100 transcripts, and all of them lead to the same result. (When I used the 100-transcript file with my smaller index, the one that works with 1 and 3 transcripts, I run into a different error: a segfault...)

Any help with this would greatly be appreciated! Thanks.

rchikhi commented 3 years ago

Hi Mitra, as @kamimrcht is on leave I'm unsure if that bug can be addressed quickly, however I'll ask you a quick question first: did you use the latest GitHub master branch, or the latest release? (I'd recommend trying the alternate one.)

MitraDarja commented 3 years ago

Thanks for the fast response. I tried both; the behavior occurs with the master branch and with the latest release.

rchikhi commented 3 years ago

OK I see, then it'll be in @kamimrcht territory

kamimrcht commented 3 years ago

Hi I'm on maternity leave so I'm ridiculously slow to answer. Thanks for your feedback! I will be back on track this fall to investigate this issue. Cheers.

MitraDarja commented 3 years ago

Hi,

I have tried a smaller example and I run into the same problem, but I think I now have a better idea of why it happens. I run:

./Reindeer --index -f Simulated_4.lst -k 19 -o out_test_64_4

where Simulated_4.lst contains the paths to the unitig files created by bcalm2.

This command does not run through; instead I get the following error:

#Building colors and equivalence classes matrix to be written on disk...
Sorting datasets to find equivalence classes...
terminate called after throwing an instance of 'zstr::Exception'

When I then re-run the same command, it runs through with the warning messages:

[Warning] monotig files (monotig_files) were found in output dir, I will use them and I won't delete them
[Warning] index file (reindeer_index.gz) was found in output dir, I will use it and I won't delete it

Then I have an index, but if I want to query anything, I get a segfault.
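In case it helps reproduce this cleanly: a defensive sketch (not an official REINDEER workflow) that wipes the partial outputs before retrying, so the rebuild cannot silently reuse a possibly truncated index. The directory and file names follow the warning messages above:

```shell
# After a crashed --index run, remove the leftover partial outputs
# (monotig files and the compressed index) so the retry rebuilds from scratch
# instead of reusing them.
rm -rf out_test_64_4/monotig_files out_test_64_4/reindeer_index.gz
./Reindeer --index -f Simulated_4.lst -k 19 -o out_test_64_4
```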

#Loading index...
Index loaded
Index loading total: 0.550469 seconds.

#Computing query...
Result will be written in output_reindeer/query_results/out_query_Reindeer_P40_100_0.out
Segmentation fault (core dumped)

Furthermore, with less data, for example just 1/4 of the files, the build succeeds on the first try and the query works as expected...

I am not sure if this is the same problem as the one I originally posted about, but I noticed that the index from my first report was also only built on the second try, after running into a 'zstr::Exception' on the first, so that might be the cause of both issues?
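If the 'zstr::Exception' means a compressed stream was left truncated by the first crashed run, that could be checked directly. A hypothetical sketch (the directory name is from the run above; this assumes the files are plain gzip streams, which `gzip -t` can test):

```shell
# Flag any damaged .gz file left in the output directory: gzip -t exits
# non-zero on a truncated or corrupt stream without decompressing to disk.
find out_test_64_4 -name '*.gz' \
  -exec sh -c 'gzip -t "$1" 2>/dev/null || echo "corrupt: $1"' _ {} \;
```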

I hope these tests make the problem clearer, so that a solution can be found once you have time again. :)

And last but not least, congrats. :D