hydrusvideodeduplicator / hydrus-video-deduplicator

Video Deduplicator for the Hydrus Network
https://hydrusvideodeduplicator.github.io/hydrus-video-deduplicator/
MIT License
41 stars 7 forks

[WIP] Save binary phashes to db (skip converting to JSON) #66

Open F18SuperHornet opened 2 weeks ago

F18SuperHornet commented 2 weeks ago

I'm using this tool in a very constrained environment (virtualized, QubesOS), so I'm trying to squeeze as much performance as possible out of it.

While benchmarking, I noticed that my CPU was spending a considerable share of its time (about 25% of processing) converting the JSON stored in the DB back into phash objects.
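As a rough illustration of the two storage formats (not the code in this PR), the sketch below round-trips a made-up 64-bit hash through JSON text and through packed bytes. The bit pattern and the helper names `bits_to_bytes` and `bytes_to_bits` are assumptions for the example; the project's actual phash type and serialization may differ.

```python
import json

# Hypothetical 64-bit perceptual hash, used only for illustration.
phash_bits = [(i * 7) % 3 == 0 for i in range(64)]


def bits_to_bytes(bits):
    """Pack a list of booleans into bytes, most significant bit first."""
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value.to_bytes((len(bits) + 7) // 8, "big")


def bytes_to_bits(blob, n_bits):
    """Unpack bytes back into a list of n_bits booleans."""
    value = int.from_bytes(blob, "big")
    return [bool((value >> (n_bits - 1 - i)) & 1) for i in range(n_bits)]


as_json = json.dumps(phash_bits)     # text form: needs parsing on every read
as_blob = bits_to_bytes(phash_bits)  # binary form: 8 bytes for 64 bits

# Both representations recover the same hash.
assert json.loads(as_json) == phash_bits
assert bytes_to_bits(as_blob, len(phash_bits)) == phash_bits

print(len(as_json), "characters as JSON vs", len(as_blob), "bytes as a blob")
```

The binary form avoids the JSON parse on every read, which is the step the benchmark above points at; how much that saves in practice depends on the real hash type and on how the DB layer returns blobs.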

According to my understanding, converting phashes to JSON before saving them to DB is...

This PR does NOT yet contain a clean way for existing DBs to migrate, but that should very much be feasible. Let me know if this approach works for you and I'll work on the migration code.
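One possible shape for such a migration, sketched with the standard `sqlite3` module. Everything schema-related here is an assumption: the table `phashes`, its columns `file_hash` and `phash`, and the legacy format (a JSON list of 0/1 bits) are hypothetical stand-ins, not the project's real layout.

```python
import json
import sqlite3


def json_to_blob(text):
    """Convert the assumed legacy format (JSON list of 0/1 bits) to bytes."""
    bits = json.loads(text)
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value.to_bytes((len(bits) + 7) // 8, "big")


def migrate(conn):
    """Rewrite every text-typed phash as a BLOB; return the rows converted."""
    rows = conn.execute(
        "SELECT file_hash, phash FROM phashes WHERE typeof(phash) = 'text'"
    ).fetchall()
    with conn:  # single transaction: either all rows migrate or none do
        conn.executemany(
            "UPDATE phashes SET phash = ? WHERE file_hash = ?",
            [(json_to_blob(text), file_hash) for file_hash, text in rows],
        )
    return len(rows)


# Demo on an in-memory database with one legacy (JSON text) row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phashes (file_hash TEXT PRIMARY KEY, phash)")
conn.execute(
    "INSERT INTO phashes VALUES (?, ?)", ("abc", json.dumps([1, 0] * 32))
)
conn.commit()

migrated = migrate(conn)     # converts the legacy row
second_pass = migrate(conn)  # nothing left to convert
print(migrated, second_pass)
```

Selecting on `typeof(phash) = 'text'` makes the migration safe to re-run, since rows already stored as blobs are skipped.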