hv0905 / NekoImageGallery

An AI-powered natural language & reverse Image Search Engine powered by CLIP & qdrant.
https://image-insights.edgeneko.com/
GNU Affero General Public License v3.0
80 stars 9 forks

Use uuid5 associated with file SHA1 instead of randomly generated uuid4 #11
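A minimal Python sketch of the idea in the PR title follows. It assumes the point ID is computed as `uuid5(namespace, sha1_hex)`. The namespace constant and the function name `image_id_from_bytes` are hypothetical, because the PR text does not show which namespace or helper the project actually uses.

```python
import hashlib
import uuid

# Assumption: illustrative namespace only; the PR does not state which
# namespace NekoImageGallery passes to uuid5.
IMAGE_NAMESPACE = uuid.NAMESPACE_URL


def image_id_from_bytes(data: bytes) -> uuid.UUID:
    """Derive a deterministic point ID from the SHA1 digest of the file (hypothetical helper)."""
    sha1_hex = hashlib.sha1(data).hexdigest()
    # uuid5 is a name-based UUID: the same (namespace, name) pair always
    # yields the same UUID, unlike uuid4, which is random on every call.
    return uuid.uuid5(IMAGE_NAMESPACE, sha1_hex)


if __name__ == "__main__":
    first = image_id_from_bytes(b"same image bytes")
    second = image_id_from_bytes(b"same image bytes")
    print(first == second)  # True: identical files map to one ID
    print(first.version)    # 5
```

The design consequence is that uploading the same file twice produces the same ID, which is exactly what raises the overwrite question in the discussion.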

pk5ls20 closed this 9 months ago

hv0905 commented 9 months ago

Using uuid5 directly means that if two images have the same SHA1, the new entry will overwrite the old one in the database.

Maybe we should query the database before copying and inserting, to make sure the image doesn't already exist?
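A minimal sketch of the proposed "query before copying and inserting" flow is shown here. It assumes hypothetical in-memory stand-ins: a dict keyed by point ID plays the role of the vector database collection, and a second dict plays the role of image storage. In a real deployment the existence check would be a lookup by ID against the Qdrant collection; `upload_image` and the namespace constant are illustrative names, not the project's API.

```python
import hashlib
import uuid
from typing import Dict

# Assumption: illustrative namespace; not taken from the project.
IMAGE_NAMESPACE = uuid.NAMESPACE_URL


def upload_image(
    data: bytes,
    db: Dict[uuid.UUID, dict],
    storage: Dict[uuid.UUID, bytes],
) -> bool:
    """Copy and index the image only if no entry with the same ID exists.

    Returns True if the image was inserted, False if it was skipped.
    """
    point_id = uuid.uuid5(IMAGE_NAMESPACE, hashlib.sha1(data).hexdigest())
    # Query first: an existing entry means this exact file is already indexed.
    if point_id in db:
        return False
    storage[point_id] = data             # stand-in for copying the file
    db[point_id] = {"size": len(data)}   # stand-in for inserting the entry
    return True


if __name__ == "__main__":
    db: Dict[uuid.UUID, dict] = {}
    storage: Dict[uuid.UUID, bytes] = {}
    print(upload_image(b"cat", db, storage))  # True
    print(upload_image(b"cat", db, storage))  # False, duplicate skipped
    print(len(db), len(storage))              # 1 1
```

Note that check-then-insert is not atomic: two concurrent uploads of the same file could both pass the check, so a real implementation would still need to tolerate that race.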

pk5ls20 commented 9 months ago

> Using uuid5 directly means that if two images have the same SHA1, the new entry will overwrite the old one in the database.

Yes, but if two images have the same SHA1, we can assume they are the same image, so it only needs to be indexed once.

> Maybe we should query the database before copying and inserting, to make sure the image doesn't already exist?

Either way, the final database contents will be the same, whether you first query for an existing entry with the same ID or simply insert and let it overwrite. But the former might save indexing time?
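The timing argument can be illustrated by counting calls to the expensive step instead of measuring wall-clock time. This is a sketch under stated assumptions: `embed` is a hypothetical placeholder for the CLIP forward pass, IDs are plain strings for brevity, and the payload carries no per-entry state, which is the case where both strategies end in the same database contents.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def make_counting_embedder() -> Tuple[Callable[[str], List[float]], List[int]]:
    """Return a placeholder embedder plus a one-element list counting its calls."""
    calls = [0]

    def embed(image_id: str) -> List[float]:
        calls[0] += 1
        return [float(len(image_id))]  # placeholder vector, not a real embedding

    return embed, calls


def blind_upsert(ids: Iterable[str], embed: Callable[[str], List[float]]) -> Dict[str, List[float]]:
    db: Dict[str, List[float]] = {}
    for image_id in ids:
        db[image_id] = embed(image_id)  # always embeds; the later write wins
    return db


def query_first(ids: Iterable[str], embed: Callable[[str], List[float]]) -> Dict[str, List[float]]:
    db: Dict[str, List[float]] = {}
    for image_id in ids:
        if image_id in db:  # already indexed: skip the expensive step
            continue
        db[image_id] = embed(image_id)
    return db


if __name__ == "__main__":
    batch = ["a", "b", "a", "a", "c"]
    embed1, calls1 = make_counting_embedder()
    embed2, calls2 = make_counting_embedder()
    print(blind_upsert(batch, embed1) == query_first(batch, embed2))  # True
    print(calls1[0], calls2[0])  # 5 3
```

With duplicates in the batch, both strategies produce identical contents, but query-first skips one embedding per duplicate; how much time that saves in practice depends on the cost of the lookup versus the model call.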

I'll test the time difference between the two later.

Because the payload content is stateful, overwriting directly is not a good option. The best approach is to perform appropriate checks to ensure that no data is accidentally overwritten.
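The payload argument can be made concrete with a small sketch. The `starred` field is a hypothetical example of state a user might change after indexing; the thread does not say which payload fields are stateful. A dict again stands in for the database, and `overwrite` and `insert_if_absent` are illustrative names.

```python
from typing import Dict


def fresh_payload(url: str) -> dict:
    # Hypothetical payload: "starred" represents any field changed after indexing.
    return {"url": url, "starred": False}


def overwrite(db: Dict[str, dict], point_id: str, url: str) -> None:
    db[point_id] = fresh_payload(url)  # discards whatever state the old entry had


def insert_if_absent(db: Dict[str, dict], point_id: str, url: str) -> bool:
    if point_id in db:
        return False  # keep the existing entry and its state untouched
    db[point_id] = fresh_payload(url)
    return True


if __name__ == "__main__":
    db_a = {"img-1": {"url": "/a.png", "starred": True}}
    db_b = {"img-1": {"url": "/a.png", "starred": True}}
    overwrite(db_a, "img-1", "/copy-of-a.png")
    insert_if_absent(db_b, "img-1", "/copy-of-a.png")
    print(db_a["img-1"]["starred"], db_b["img-1"]["starred"])  # False True
```

Once entries carry state like this, the "same final contents" argument no longer holds: a blind overwrite silently resets the state, while the checked insert leaves it intact.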

pk5ls20 commented 9 months ago

@hv0905 I think this PR is now ready to be merged into master (:з」∠)