tongzhouxu / mashpit

✏️ Sketch-based surveillance platform
GNU General Public License v2.0

Merging two databases #13

Open · lskatz opened 4 years ago

lskatz commented 4 years ago

Just a thought for a future version (not now), but it might be kind of awesome if we could merge two databases. If it works, it could give users a way to update their local databases without having to rebuild them from scratch.

luizirber commented 4 years ago

In sourmash we "avoided" the merging problem by allowing multiple DBs to be searched at once. For example, you can run `sourmash search query.sig db1.lca.json.gz db2.sbt.zip sig3.sig sig4.sig` and keep appending LCA databases, SBTs, or plain signature files to the end of the command.
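To make the "search several databases instead of merging them" idea concrete, here is a minimal sketch in Python. It is an illustration only, not sourmash's or mashpit's code: each "database" is assumed to be a plain dict mapping a sample name to a set of hash values, and plain Jaccard similarity stands in for whatever scoring the real tools use. The function names `jaccard` and `search_many` are hypothetical.

```python
"""Sketch: search several sketch databases at once instead of merging them.

Assumption: each database is modeled as a plain dict mapping a sample name
to a set of hash values. This is NOT sourmash's or mashpit's on-disk format.
"""
from typing import Dict, Iterable, List, Set, Tuple

SketchDB = Dict[str, Set[int]]


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two hash sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search_many(query: Set[int], dbs: Iterable[SketchDB],
                threshold: float = 0.0) -> List[Tuple[float, str]]:
    """Score the query against every entry of every database, best hit first."""
    hits = []
    for db in dbs:  # databases are visited in turn; none are merged on disk
        for name, hashes in db.items():
            score = jaccard(query, hashes)
            if score >= threshold:
                hits.append((score, name))
    # Highest score first; ties broken by name for a stable order.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits


if __name__ == "__main__":
    db1 = {"sampleA": {1, 2, 3, 4}, "sampleB": {10, 11, 12}}
    db2 = {"sampleC": {1, 2, 3, 5}}
    print(search_many({1, 2, 3, 4}, [db1, db2], threshold=0.1))
    # → [(1.0, 'sampleA'), (0.6, 'sampleC')]
```

The trade-off is that each query pays the cost of opening every database, while an update only means adding one more file to the list.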

(this also led to other discussions about diff/patch databases, so I'm also interested in how the solution is going to work here =] )

lskatz commented 4 years ago

Thank you for the insights @luizirber! We have found that with our slow, redundant hard drives it's faster to keep all or most of the database in one sig file, but we might also find that we want separate files to save time on things like indexing. We will also have to consider how to merge the SQLite database that sits alongside the sig files. The bottom line, though, is that this conversation and the linked conversation will be a good reference for us in the future, when we can address it.
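For the SQLite side, one common approach is to `ATTACH` the second database and copy its rows across with a single `INSERT ... SELECT`. The sketch below uses only Python's standard `sqlite3` module. It assumes a hypothetical table named `metadata`, with identical columns in the same order in both files and `name` as the primary key; mashpit's real schema may look different, and merging the sig files themselves is not covered here.

```python
"""Sketch: merge one SQLite metadata database into another.

Assumption: both databases have a table named `metadata` with the same
columns in the same order and `name` as the primary key. This is a
hypothetical schema; mashpit's real schema may differ.
"""
import sqlite3


def merge_sqlite(target_path: str, source_path: str) -> int:
    """Copy rows from source into target, skipping names target already has.

    Returns the number of rows actually inserted.
    """
    con = sqlite3.connect(target_path)
    try:
        # Make the source file visible under the schema name "src".
        con.execute("ATTACH DATABASE ? AS src", (source_path,))
        with con:  # commits on success, rolls back on error
            cur = con.execute(
                "INSERT OR IGNORE INTO main.metadata SELECT * FROM src.metadata"
            )
            inserted = cur.rowcount  # ignored (duplicate) rows are not counted
        con.execute("DETACH DATABASE src")
        return inserted
    finally:
        con.close()
```

`INSERT OR IGNORE` keeps the target's existing row whenever a name appears in both files. If newer records should win instead, `INSERT OR REPLACE` is the usual alternative, at the cost of silently overwriting local edits.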