rivo / duplo

Detect duplicate (or similar) images. Written in Go.
http://rentafounder.com/find-similar-images-with-duplo/
MIT License
392 stars, 23 forks

Best practices for persisting data? #5

Closed: Carseason closed this issue 5 months ago

Carseason commented 6 months ago

Is there a best-practice solution for persisting the data in a duplo store?

rivo commented 6 months ago

I'm not aware of any "best" solution. In my own project, I have at most about a million pictures in a duplo store, and I use the GobEncode and GobDecode methods to save it to and read it from a file. Obviously, you cannot make changes to the store while it is being saved or loaded, and this approach may not scale to your needs. It works for me, though.

I have been thinking about using a cache library like Ristretto for duplo. But this would require some sophisticated changes to the architecture. It's certainly nothing that works out of the box.

Carseason commented 6 months ago

Thank you very much for your work. I wish the store took up less memory at runtime. Do you have a baseline analysis of memory usage for millions of photos? I'm using it on an embedded device. Sorry, my English is not very good.

rivo commented 5 months ago

Just checked and found that some of the stores actually contain about 2 million images. Gob-encoded and gzipped, they require around 500MB of disk space.
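For a rough sense of scale on constrained hardware, the figures above work out to about 500,000,000 / 2,000,000 = 250 bytes per image on disk, taking MB as 10^6 bytes. The sketch below does that arithmetic and projects it to another store size; the function names and the 100,000-image example are hypothetical, and it assumes the per-image cost stays roughly constant. This is compressed on-disk size only and does not say how much RAM the loaded store needs, which is best measured on the device itself.

```go
package main

import "fmt"

// bytesPerImage derives an average per-image footprint from a measured
// total size and image count.
func bytesPerImage(totalBytes, images float64) float64 {
	return totalBytes / images
}

// estimateBytes projects the footprint of a store with n images,
// assuming (hypothetically) a roughly constant per-image cost.
func estimateBytes(perImage float64, n int) float64 {
	return perImage * float64(n)
}

func main() {
	// ~500 MB (decimal) for ~2 million images, gob-encoded and gzipped.
	per := bytesPerImage(500e6, 2e6)
	fmt.Printf("%.0f bytes/image\n", per)                                   // 250 bytes/image
	fmt.Printf("%.0f MB for 100k images\n", estimateBytes(per, 100000)/1e6) // 25 MB for 100k images
}
```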

Carseason commented 5 months ago

Cool! I'm already using it on a NAS.