simongog / sdsl-lite

Succinct Data Structure Library 2.0

How would one generate an fm-index of a bytefile? #406

Open maxdos64 opened 6 years ago

maxdos64 commented 6 years ago

Hi dear dev team,

I was reading through the example https://github.com/simongog/sdsl-lite/blob/master/examples/fm-index.cpp and wanted to implement a version that takes a regular binary file (including zero bytes) as input. I am getting an error saying that the file contains zero bytes, and I tracked the issue down to the fact that you seem to use the zero byte internally to maintain your data structure.

So I wondered: is that assumption correct, and is there a good-practice way of getting around this issue?

My current data structure looks like this: csa_wt_int<wt_huff<rrr_vector<256>>, 512, 1024> fm_index; and it is constructed like this: construct(fm_index, argv[1], 0);

Thanks in advance! And thank you for this amazing project!

Max

mpetri commented 6 years ago

Currently this is correct, but there is an experimental implicit_sentinel branch which tries to avoid this problem.
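
Independent of that branch, one common way around a reserved zero symbol is to shift the alphabet: map every input byte b to the symbol b + 1, so that 0 never appears in the text and the symbols range over 1..256. The sketch below shows only that preprocessing step in plain C++; it is not taken from sdsl-lite, and the helper names (shift_symbols, unshift_symbols, write_u16_le) are hypothetical. It assumes the shifted text is then indexed with an integer-alphabet structure such as csa_wt_int, and that the 16-bit file is loaded with a matching num_bytes argument to construct (2, if the library's documentation is followed), which is worth checking before relying on it.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Map every byte b to the symbol b + 1, so the value 0 never occurs.
// The result needs 9 bits per symbol, hence the 16-bit element type.
std::vector<uint16_t> shift_symbols(const std::vector<uint8_t>& bytes) {
    std::vector<uint16_t> out;
    out.reserve(bytes.size());
    for (uint8_t b : bytes) {
        out.push_back(static_cast<uint16_t>(b + 1));
    }
    return out;
}

// Inverse mapping, e.g. for text extracted from the index.
std::vector<uint8_t> unshift_symbols(const std::vector<uint16_t>& symbols) {
    std::vector<uint8_t> out;
    out.reserve(symbols.size());
    for (uint16_t s : symbols) {
        assert(s >= 1 && s <= 256);  // only shifted bytes are valid here
        out.push_back(static_cast<uint8_t>(s - 1));
    }
    return out;
}

// Write the symbols as little-endian 16-bit words (hypothetical helper;
// the byte order expected by the index loader is an assumption).
bool write_u16_le(const std::vector<uint16_t>& symbols, const std::string& path) {
    std::ofstream os(path, std::ios::binary);
    if (!os) return false;
    for (uint16_t s : symbols) {
        const char le[2] = {static_cast<char>(s & 0xFF),
                            static_cast<char>(s >> 8)};
        os.write(le, 2);
    }
    return static_cast<bool>(os);
}
```

Query patterns would have to be shifted with the same mapping before searching, and any extracted text shifted back. The cost of this approach is a larger alphabet (257 possible symbols instead of 256) and a preprocessed copy of the input.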

maxdos64 commented 6 years ago

Oh, thank you! Do you happen to know how this is achieved and at what cost? And do you have any experience with how stable it is?

Edit: I feel I have got an idea of how you achieve it, but I am not sure how I could apply it to my code. I tried to use csa_wt_implicit_sentinel with the branch, but it still fails with the "file contains zero" warning.

Cheers, Max