Closed bleuge closed 3 years ago
Good question. Yes, we should be coming out with more details later. In the meantime, I am happy to chat offline at jon_oliver@trendmicro.com.
Is there any further update on this, or other ways to prune the search space to find matches against a large data set?
Hi @bleuge , @abgoldberg and others,
I have written up a technical overview of the issues around fast search at http://tlsh.org/papers.html, and of how to use fast search to do scalable clustering. The technical overview points to 2 conference papers that discuss the issues.
Cheers jono
Hi, I have a question that may not be entirely related to TLSH. It concerns scanning really big file sets against a list of TLSH hashes, looking for small differences. I know the file size is part of the first 3 bytes of the hash, but given that I intend to use TLSH, is there any rule that would let me skip files whose sizes fall outside certain limits? I could store the file sizes separately if needed. The idea is this: I have a file of size X and I already have its TLSH. When I want to compare it against another file and the sizes differ by more than a certain ratio, I skip calculating the new TLSH, since I suppose it is not needed when the files are too different in size. So the question is: have you tested this? Is it worth the effort? I think this is not directly about TLSH itself but about how we use it.
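To make the idea concrete, here is a rough sketch of the size pre-filter I have in mind. It does not call the TLSH library at all; it only decides which stored hashes are worth comparing against. The names `sizes_comparable` and `candidates`, the `(name, size, tlsh_hash)` layout, and the default ratio of 2.0 are all my own assumptions for illustration, not tested or recommended values.

```python
def sizes_comparable(size_a, size_b, max_ratio=2.0):
    """Return True when the larger size is at most max_ratio times the smaller.

    max_ratio is an arbitrary placeholder, not a tuned threshold.
    """
    if size_a <= 0 or size_b <= 0:
        return False
    small, large = sorted((size_a, size_b))
    return large <= small * max_ratio


def candidates(known, new_size, max_ratio=2.0):
    """Yield (name, tlsh_hash) for stored entries whose size is close to new_size.

    known is an iterable of (name, size, tlsh_hash) tuples, with the file
    sizes stored separately from the hashes (hypothetical layout).
    """
    for name, known_size, tlsh_hash in known:
        if sizes_comparable(new_size, known_size, max_ratio):
            yield name, tlsh_hash
```

If `candidates` yields nothing for a new file, I would skip computing its TLSH entirely; otherwise I would hash it once and compare only against the entries that were returned.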