jnalanko / VOMM

Space-efficient variable-order Markov models
command line switch to reuse an index? #21

Closed fcunial closed 6 years ago

fcunial commented 6 years ago

last week i was able to solve the problem with the cluster, and i started building the indexes on all bacteria. after five days and 13 hours, one of the indexes is still not built (the one with depth pruning at 8, the smallest i tried), and it seems stuck at "Building the BiBWT".

could you please provide a switch that allows one to reuse parts of an index at depth, say, 16, which i already have, without recomputing everything from scratch?

i think this is really crucial in order to index large files in practice.

jnalanko commented 6 years ago

Isn't this what the reconstruct-program does? Yesterday I fixed an issue where it was exploring the whole suffix link tree always, so it was broken for depth bounded reconstruction. Now it should be ok. See README for documentation.

jnalanko commented 6 years ago

Aha, I see, you want to build topologies with different pruning modes from the same index?

fcunial commented 6 years ago

yes, i just want to build the usual histograms faster.

but this is not high-priority. i stopped and relaunched the long job; maybe it was an issue with the sockets of my machine. i had to stop telling the machine to allocate memory from just one socket, since i discovered that the maximum memory visible from one socket is 256 GB. this might mean that the machine now copies memory from one socket to the other as the job moves between sockets; i never fully understood how this works.

fcunial commented 6 years ago

thanks, i completely forgot about that program! does it accept all flags of build_model_optimized, including --depth?

jnalanko commented 6 years ago

RTFM :). It's in README.txt. It does not accept --depth. Why? Because if you have the topology for, say, depth 4, then you can't mark contexts up to, say, depth 8, because they don't exist in the topology. It reads the depth bound of the index from an info file and uses that. It could be changed to accept a new depth, as long as that depth is smaller than the one used to build the index.
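
A toy sketch of that constraint, in Python rather than the project's own code. Everything here is hypothetical: the function names are made up for illustration, and a plain set of substrings stands in for the real topology. It only shows why a shallower depth can be derived from a deeper index while the reverse needs the original text.

```python
from typing import Optional, Set


def contexts_up_to_depth(text: str, depth: int) -> Set[str]:
    """Toy stand-in for a depth-pruned topology: all distinct
    substrings of `text` whose length is between 1 and `depth`."""
    return {
        text[i:i + k]
        for k in range(1, depth + 1)
        for i in range(len(text) - k + 1)
    }


def prune(contexts: Set[str], new_depth: int) -> Set[str]:
    """Derive a shallower context set from a deeper one.
    This needs no access to the original text."""
    return {c for c in contexts if len(c) <= new_depth}


def resolve_depth(index_depth: int, requested: Optional[int] = None) -> int:
    """Pick the depth to use when reusing an index (hypothetical helper).

    `index_depth` plays the role of the bound read from the info file.
    A requested depth is accepted only if it does not exceed that bound,
    because deeper contexts were never stored.
    """
    if requested is None:
        return index_depth
    if requested > index_depth:
        raise ValueError(
            f"requested depth {requested} exceeds index depth {index_depth}"
        )
    return requested


if __name__ == "__main__":
    text = "ACGTACGGTA"
    deep = contexts_up_to_depth(text, 8)
    shallow = contexts_up_to_depth(text, 4)
    # Going from depth 8 down to depth 4 is just a filter.
    print(prune(deep, 4) == shallow)
    # Going from depth 4 up to depth 8 is rejected.
    try:
        resolve_depth(4, 8)
    except ValueError as err:
        print(err)
```

In this sketch, pruning the depth-8 set to depth 4 gives exactly the set built directly at depth 4, while asking for depth 8 from a depth-4 index raises an error, which mirrors the behaviour described above.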

fcunial commented 6 years ago

thanks. i think that even having a program that just reuses the bwts would be useful in practice. but again, not high-priority.
