trendmicro / tlsh

"Invalid hash string, length does not match any known encoding" > Error Occurs. #99

Closed soimkim closed 3 years ago

soimkim commented 3 years ago

When a hash value created with py_tlsh (4.5.0) is passed to the Java library, an "Invalid hash string, length does not match any known encoding" error occurs.

Could you tell me what needs to be corrected in the Java code to fix this error?

jonjoliver commented 3 years ago

Hi @soimkim

There were 2 issues which required us to change some elements of TLSH:

  1. There was a potential division by 0 (issue #79). The C++ code caught it with a try-catch.
  2. The C++ code had issues with files > 2 GB (issue #84).

We fixed the C++ code base (this repository) and released a new Python library, which also fixed problems with Python on Windows. This is the py-tlsh package (4.5.0).

These fixes changed the hashes generated for a few files (edge cases). To track these changes, we added a "T1" version string to the front of the hash. The C++ code and the Python code are backwards compatible with old hashes.

Which Java library of TLSH are you using? I think we need to bring it up to date with these few changes.

Cheers jono

soimkim commented 3 years ago

Hi @jonjoliver ,

I use the library (tlsh_3.7.1.jar) built from the java folder of this repository and call the "totalDiff" and "fromTlshStr" functions to compare the TLSH values extracted by the tool, as in the following example.


import com.trendmicro.tlsh.Tlsh;

public class MainTest {

    public static void main(String[] args) {
        // TLSH digests to compare, as hex strings
        String srcTlsh = "21272383E754E01BE4FF953116996103B3853D588A42A31A1790F6EE39BFCC63F86E85";
        String targetTlsh = "540612D3F355F42BC636C53271A24222519BCDE48703EB266506F7B9ACFBE854980BD8";

        // Parse both digests, then print the distance between them
        Tlsh tlshTest1 = Tlsh.fromTlshStr(srcTlsh);
        Tlsh tlshTest2 = Tlsh.fromTlshStr(targetTlsh);
        System.out.println(tlshTest1.totalDiff(tlshTest2, true));
    }
}

I need to call it from a Spring-based web service, so I need TLSH written in Java.

Thanks & regards, Soim

jonjoliver commented 3 years ago

I am just chasing down the developer who did the Java port. I will set up the environment and then make some fairly minor tweaks to move it from 3.7.1 -> 4.5.0. This may take me a while, but once I have the environment (gradle etc.) it should be very quick.

Workaround: you could add a few lines of code to remove the T1 from the start of the hash that you get from Python before you do anything in Java (???)
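A minimal sketch of that workaround, for anyone stuck on the 3.7.1 jar. The class name TlshPrefix and the method stripVersionPrefix are hypothetical helpers made up for illustration, not part of the TLSH library. The sketch assumes the only difference in the string form is the leading "T1"; since 'T' is not a hex digit, an old-style hash can never start with it, so the check should be unambiguous.

```java
// Hypothetical helper, not part of the TLSH library: removes the "T1"
// version prefix so a pre-4.x parser sees the old-style hex string.
final class TlshPrefix {

    private static final String VERSION_PREFIX = "T1";

    private TlshPrefix() {
        // static utility class, no instances
    }

    static String stripVersionPrefix(String hash) {
        String trimmed = hash.trim();
        // Old-style hashes are pure hex, so a leading 'T' only appears
        // on versioned hashes.
        if (trimmed.startsWith(VERSION_PREFIX)) {
            return trimmed.substring(VERSION_PREFIX.length());
        }
        // Already an old-style hash: return it unchanged.
        return trimmed;
    }
}
```

The result would then be passed to Tlsh.fromTlshStr(...) in place of the raw string coming from Python. Hashes without the prefix pass through unchanged, so the helper can be applied to every input.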

soimkim commented 3 years ago

Thank you very much for your quick reply! The workaround you gave me is also a good way.

jonjoliver commented 3 years ago

@soimkim Could you test version 4.6.0 ? Thanks

soimkim commented 3 years ago

@jonjoliver, when I tested with the jar file built from v4.6.0, comparing the TLSH values output by py_tlsh (4.5.0) works without error.

Thanks for responding quickly!

soimkim commented 3 years ago

This issue is resolved in v4.6.0.