Open ghost opened 12 months ago
Thanks for the feedback! I've mostly been focused on getting features in place and working on usability issues. But performance is definitely on my mind, and it's not yet as fast as I'd like it to be.
Are you using SQLite or Postgres? Postgres should be faster since it uses a C extension, whereas SQLite has to load all the data into memory and then search.
Edit: I missed the pg part in the title when I first looked. I'll investigate.
I haven't tested LZJD or SSDEEP yet. Since B-tree, Hash, and SP-GiST indexes all show the same linear growth on TLSH, my educated guess is that the choice of hash algorithm doesn't matter: the search will still require an O(N) scan over all the data.
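To illustrate the O(N) point, here is a minimal Python sketch of what a similarity search amounts to when the index cannot prune by distance. It is not MalwareDB's code and not the real TLSH scoring: `hamming_distance` is a stand-in distance I'm assuming for illustration, and `linear_search` is a hypothetical helper. The point is only that every stored digest gets compared against the query once.

```python
from typing import List, Tuple


def hamming_distance(a: str, b: str) -> int:
    """Stand-in distance: number of differing characters.

    This is NOT the real TLSH score, just a placeholder for illustration.
    """
    if len(a) != len(b):
        raise ValueError("digests must have equal length")
    return sum(x != y for x, y in zip(a, b))


def linear_search(
    query: str, digests: List[str], max_distance: int
) -> Tuple[List[Tuple[int, str]], int]:
    """Compare the query against every digest and keep the close ones.

    Returns the matches sorted by distance, plus the number of
    comparisons performed (always len(digests)).
    """
    comparisons = 0
    matches = []
    for digest in digests:
        comparisons += 1
        distance = hamming_distance(query, digest)
        if distance <= max_distance:
            matches.append((distance, digest))
    matches.sort()
    return matches, comparisons


if __name__ == "__main__":
    stored = ["abcd", "abce", "ffff", "abff"]
    found, compared = linear_search("abcd", stored, max_distance=1)
    print(found, compared)
```

Equality-oriented indexes such as B-tree or Hash can locate an exact digest quickly, but they don't help with "everything within distance d", so the number of comparisons grows with the table size.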
This might help: https://github.com/jinyyu/tlsh_gist There's also this, but I don't understand it: https://zhuanlan.zhihu.com/p/497732848
Related Postgres docs:
Hi,
MalwareDB is great. However, when we tested file search with up to 10M files, a TLSH search took 10s. I found that TLSH has already published an index algorithm (https://tlsh.org/papers.html).
Is there a milestone for a better search index? Thanks!