stsievert / salmon

A tool to collect triplet queries
https://docs.stsievert.com/salmon/
BSD 3-Clause "New" or "Revised" License

Saving samplers to database takes a long time #75

Closed stsievert closed 4 years ago

stsievert commented 4 years ago

Currently, writing to the database takes an inordinate amount of time. In later stages of the optimization, it can take more than 90% of the time for a single loop.

stsievert commented 4 years ago

This is likely because the adaptive runners (in this case, TSTE) are too large, which in turn is likely because they store every answer they receive instead of using the database:

https://github.com/stsievert/salmon/blob/9f37b48b4575b7e67d69833f96e25c20f148375f/salmon/triplets/algs/adaptive/_embed.py#L114

Storing 10,000 answers requires 58KB of memory. For n=85 objects embedded into d=2 dimensions, the embedding is 0.66KB and the posterior is 28KB. Those don't seem especially large to me, so memory profiling would be useful here to confirm where the size actually comes from.
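The figures above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch under stated assumptions: each answer is taken to be three 16-bit integer indices, the embedding an n x d array of 32-bit floats, and the posterior an n x n array of 32-bit floats. Those dtypes and shapes are guesses chosen because they match the quoted numbers; they are not read from the Salmon source, and the helper `kib` is hypothetical.

```python
# Back-of-the-envelope sizes. The item widths below are ASSUMPTIONS that happen
# to match the numbers quoted in this issue; they are not taken from Salmon.
BYTES_INT16 = 2    # assumed width of one stored object index
BYTES_FLOAT32 = 4  # assumed width of one embedding/posterior entry


def kib(n_items: int, itemsize: int) -> float:
    """Size in KiB of n_items values that each occupy itemsize bytes."""
    return n_items * itemsize / 1024


n, d, n_answers = 85, 2, 10_000

answers_kib = kib(n_answers * 3, BYTES_INT16)  # 3 indices per triplet answer
embedding_kib = kib(n * d, BYTES_FLOAT32)      # n x d embedding
posterior_kib = kib(n * n, BYTES_FLOAT32)      # assumed n x n posterior

print(f"answers:   {answers_kib:.2f} KiB")    # answers:   58.59 KiB
print(f"embedding: {embedding_kib:.2f} KiB")  # embedding: 0.66 KiB
print(f"posterior: {posterior_kib:.2f} KiB")  # posterior: 28.22 KiB
```

Under these assumptions the total is under 100 KiB, which is consistent with the sizes quoted above.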

stsievert commented 4 years ago

This is an error in the visualization (the legend was reversed), not an error in Salmon. Closing.