QUESTION - persistent datastore for LSH

haifengl / smile

Statistical Machine Intelligence & Learning Engine

https://haifengl.github.io


QUESTION - persistent datastore for LSH #699

Closed reynoldsm88 closed 2 years ago

reynoldsm88 commented 2 years ago

Am I correct in my understanding that the only backing datastore for the LSH implementation is in memory? We have a use case where separate processes need to be able to update and query the LSH data.

If that is the case, do you have any guidance as to how one might use the current LSH implementation while having it backed by a persistent cache such as Redis?
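
To make the question concrete, here is a minimal sketch of what "LSH backed by a shared store" could look like. It is not Smile's API and does not use Smile at all: `HyperplaneLSH` and `InMemoryBucketStore` are hypothetical names, random-hyperplane hashing is just one LSH family chosen for illustration, and the in-memory store stands in for something like Redis. The one point it relies on is that every process builds the same hash functions (here, from the same seed), so bucket keys written by one process are readable by another.

```python
import random
from collections import defaultdict


class InMemoryBucketStore:
    """Hypothetical stand-in for a shared store.

    A Redis-backed variant could map add/members onto the SADD and
    SMEMBERS commands; that mapping is an assumption, not shown here.
    """

    def __init__(self):
        self._buckets = defaultdict(set)

    def add(self, key, item_id):
        self._buckets[key].add(item_id)

    def members(self, key):
        return set(self._buckets.get(key, ()))


class HyperplaneLSH:
    """Random-hyperplane LSH whose buckets live in an external store."""

    def __init__(self, dim, num_tables, bits_per_table, store, seed=0):
        # Same seed => same hyperplanes => same bucket keys in every process.
        rng = random.Random(seed)
        self.store = store
        self.tables = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(bits_per_table)]
            for _ in range(num_tables)
        ]

    def _keys(self, vector):
        # One key per table: the sign pattern of the vector against each plane.
        for t, planes in enumerate(self.tables):
            bits = "".join(
                "1" if sum(p * x for p, x in zip(plane, vector)) >= 0 else "0"
                for plane in planes
            )
            yield f"lsh:{t}:{bits}"

    def put(self, item_id, vector):
        for key in self._keys(vector):
            self.store.add(key, item_id)

    def candidates(self, vector):
        # Union of every bucket the query falls into, across all tables.
        found = set()
        for key in self._keys(vector):
            found |= self.store.members(key)
        return found


# Two index objects share one store, as two processes might.
store = InMemoryBucketStore()
writer = HyperplaneLSH(dim=4, num_tables=3, bits_per_table=8, store=store, seed=42)
reader = HyperplaneLSH(dim=4, num_tables=3, bits_per_table=8, store=store, seed=42)
writer.put("doc-1", [0.5, -1.0, 2.0, 0.25])
print(reader.candidates([0.5, -1.0, 2.0, 0.25]))
```

With a remote store, each `put` and `candidates` call costs one round trip per table, which is the latency concern to weigh against the convenience of a shared view.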

haifengl commented 2 years ago

Not for now. And if it were me, I wouldn't store LSH in Redis. LSH is supposed to be fast; any remote cache defeats the purpose.

reynoldsm88 commented 2 years ago

If that's the case, do you have any guidance on how to achieve the effect I'm after?

Essentially we have inputs coming from several different sources, including a streaming source where new processors may be dynamically started to keep up with the number of inputs. We would like to deduplicate inputs globally, so multiple distributed components would need a single view of the LSH cache.
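
For illustration, the global deduplication described here might be sketched as below. This is an assumption-laden example, not a recommendation from the thread and not Smile code: it assumes the inputs are text, uses MinHash with banding as the LSH scheme, and uses a plain dict as a stand-in for the shared store. `shingles`, `minhash`, and `SharedDeduper` are hypothetical names. Hashing goes through `hashlib` rather than Python's built-in `hash()`, because the built-in string hash is randomized per process and would give each processor different signatures.

```python
import hashlib


def shingles(text, k=5):
    """Character k-grams of the whitespace-normalized, lowercased text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash(text, num_hashes):
    """MinHash signature; the salt selects one hash function per position."""
    grams = [s.encode("utf-8") for s in shingles(text)]
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g, digest_size=8, salt=salt).digest(), "little")
            for g in grams
        ))
    return signature


class SharedDeduper:
    """Band-based near-duplicate check against a shared key/value store."""

    def __init__(self, store, bands=16, rows=4, threshold=0.8):
        self.store = store  # dict-like: band key -> list of (item_id, signature)
        self.bands, self.rows, self.threshold = bands, rows, threshold

    def check_and_add(self, item_id, text):
        """Return the id of an earlier near-duplicate, or None after registering."""
        sig = minhash(text, self.bands * self.rows)
        keys = [
            f"band:{b}:" + ",".join(map(str, sig[b * self.rows:(b + 1) * self.rows]))
            for b in range(self.bands)
        ]
        for key in keys:
            for other_id, other_sig in self.store.get(key, []):
                # Fraction of matching positions estimates Jaccard similarity.
                estimate = sum(a == b for a, b in zip(sig, other_sig)) / len(sig)
                if estimate >= self.threshold:
                    return other_id
        for key in keys:
            self.store.setdefault(key, []).append((item_id, sig))
        return None


# Two dedupers share one store, as two stream processors might.
shared = {}
first, second = SharedDeduper(shared), SharedDeduper(shared)
print(first.check_and_add("m1", "The quick brown fox jumps over the lazy dog"))
print(second.check_and_add("m2", "The quick  brown fox jumps over the lazy dog"))
```

The check followed by the add is not atomic in this sketch. With a real shared store and concurrent processors, two near-identical inputs arriving at the same moment could both pass the check, so the lookup and registration would need to run as a single transaction or server-side script.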