mpetri / cstlm

CST based language model powered by SDSL

issue in the parallel construction of the SA #1

Open fcunial opened 6 years ago

fcunial commented 6 years ago

i ran on a character-based collection (built with <create-collection.x -i ./inputFile.txt -c ./collectionDir -1>) created from a 28-billion-character dna file, and i got the attached segfault:

gdb.txt

note that does not fail, and it seems to populate <./collectionDir> correctly (<./collectionDir/tmp> and <./collectionDir/index>, however, are empty). i'm assigning 900GB of RAM to . the input file consists of just one line containing all characters. does not fail for smaller files in the same format (i tried up to 5.9 billion characters).

mpetri commented 6 years ago

Hello @fcunial. Do you have a file where I can try this out?

fcunial commented 6 years ago

unfortunately the error only happens to me with large files. i'm trying to share one from here:

https://cloud.mpi-cbg.de/index.php/s/5Sjk9DmKA3IBTyV

since the file is large, the error might manifest after many hours.
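
For anyone who cannot download the shared file, a synthetic input in the same shape (one line of DNA characters) could be generated with a sketch like the one below. Everything here is an assumption for illustration: the helper name `write_dna_line`, the `ACGT` alphabet, the trailing newline, and the chunk size are not taken from cstlm or from the original file, and random data is not guaranteed to trigger the same segfault.

```python
import random


def write_dna_line(path, n_chars, chunk=1 << 20, seed=0):
    """Write n_chars random A/C/G/T characters to `path` as a single line.

    Hypothetical helper, not part of cstlm. The alphabet and the trailing
    newline are assumptions about the input format.
    """
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    remaining = n_chars
    with open(path, "w", encoding="ascii") as out:
        # Write in chunks so memory use stays bounded for very large files.
        while remaining > 0:
            size = min(chunk, remaining)
            out.write("".join(rng.choices("ACGT", k=size)))
            remaining -= size
        out.write("\n")  # assumption: the single line ends with a newline
    return n_chars


# Example call (small size; a 28-billion-character file would take a long
# time to generate in pure Python):
# write_dna_line("inputFile.txt", 1_000_000)
```

The output file can then be passed to `create-collection.x` with the same options as in the first comment.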