iscc / iscc-specs

ISCC: International Standard Content Code
http://iscc.codes

Implement Content-Code for binary data. #89

Open titusz opened 4 years ago

titusz commented 4 years ago

We could extract printable strings (in different encodings) from all kinds of binary data, such as executables or custom binary formats, with https://github.com/getreu/stringsext ... and create a text similarity signature from them.
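To make the idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the ISCC Content-ID-Text algorithm and not how stringsext works: string extraction is approximated with two regular expressions (printable ASCII and ASCII-range UTF-16LE), and the similarity signature is a plain 64-bit simhash over the extracted words. The names `extract_strings`, `simhash`, `binary_text_signature`, and `hamming`, as well as the minimum run length of 4, are hypothetical choices made for this example.

```python
import hashlib
import re

# Assumption: a run must be at least MIN_LEN characters long to count as a string.
MIN_LEN = 4
# Runs of printable ASCII bytes.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# Runs of printable ASCII characters encoded as UTF-16LE (each followed by a NUL byte).
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_strings(data: bytes) -> list[str]:
    """Return printable strings found in arbitrary binary data."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return found


def simhash(tokens, bits: int = 64) -> int:
    """Plain simhash: each token votes +1/-1 on every bit position of the result."""
    counts = [0] * bits
    for tok in tokens:
        digest = hashlib.blake2b(tok.encode("utf-8"), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def binary_text_signature(data: bytes) -> int:
    """Similarity signature over the lowercased words of all extracted strings."""
    words = " ".join(extract_strings(data)).lower().split()
    return simhash(words)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures (smaller means more similar)."""
    return bin(a ^ b).count("1")
```

Two binaries that embed the same text but differ in the surrounding non-printable bytes yield the same signature here, and `hamming(sig_a, sig_b)` gives a rough distance when the embedded text differs.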

The question is whether we still call this Content-ID-Text or create a custom Content-ID-Binary that signals that the text was extracted from a binary format without any format-specific structured parsing.

lrosenthol commented 3 years ago

What about defining the algorithm that would be used instead of a specific implementation? It would help to look at various binary formats and see which aspects are most important to apply to the ID.