iqbal-lab-org / pandora

Pan-genome inference and genotyping with long noisy or short accurate reads
MIT License
107 stars 14 forks source link

Remove `merge_index` command #308

Closed leoisl closed 1 year ago

leoisl commented 1 year ago

pandora is able to merge different indexes into a single one: https://github.com/rmcolq/pandora/blob/a496870887a8c5d24c33caa051be0123d170a8d9/src/merge_index_main.cpp#L8-L21 . I never used this command since index building was optimised. I am now doing a bit of refactoring on pandora indexes (https://github.com/rmcolq/pandora/issues/306) and could I remove this command or do you think is still useful?

iqbal-lab commented 1 year ago

not sure we need it? is this for merging two prgs of the same locus? or combining two different locus-prgs into one prg?

leoisl commented 1 year ago

combining two different locus-prgs into one prg

iqbal-lab commented 1 year ago

if no need given current index building, then fine i suppose. What if you wanted to index 1 million PRGs, might you parallelise and then merge results?

leoisl commented 1 year ago

What if you wanted to index 1 million PRGs, might you parallelise and then merge results?

That is an option, I wouldn't do that in roundhound though. Using 16 threads, we index 188k PRGs in 32 mins. Linear extrapolation would make us index 1M in ~3h, which is totally ok IMO. Indexing 10M PRGs might still be fine, might take ~30h with 16 threads, but there are still optimisations we could do and also increase nb of threads. But if you think we could potentially get into the order of 100M PRGs to index, then I think we should keep this command

iqbal-lab commented 1 year ago

OK Fine I agree, let's get rid of it