Closed leoisl closed 1 year ago
not sure we need it? is this for merging two prgs of the same locus? or combining two different locus-prgs into one prg?
combining two different locus-prgs into one prg
if no need given current index building, then fine i suppose. What if you wanted to index 1 million PRGs, might you parallelise and then merge results?
What if you wanted to index 1 million PRGs, might you parallelise and then merge results?
That is an option, I wouldn't do that in roundhound though. Using 16 threads, we index 188k PRGs in 32 mins. Linear extrapolation would make us index 1M in ~3h, which is totally ok IMO. Indexing 10M PRGs might still be fine, might take ~30h with 16 threads, but there are still optimisations we could do and also increase nb of threads. But if you think we could potentially get into the order of 100M PRGs to index, then I think we should keep this command
OK Fine I agree, let's get rid of it
pandora
is able to merge different indexes into a single one: https://github.com/rmcolq/pandora/blob/a496870887a8c5d24c33caa051be0123d170a8d9/src/merge_index_main.cpp#L8-L21 . I never used this command since index building was optimised. I am now doing a bit of refactoring on pandora indexes (https://github.com/rmcolq/pandora/issues/306) and could I remove this command or do you think is still useful?