Support granular similarity hashes for Content-ID

iscc / iscc-specs

ISCC: International Standard Content Code

http://iscc.codes



Issue #51 (open), opened by titusz 5 years ago

titusz commented 5 years ago

Use case: A user has a small chunk of text and wants to find longer texts that contain this chunk or a similar one.

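Proposed solution (draft): Apply shift-invariant text chunking, where chunk boundaries are determined by the content itself so that inserting or deleting text does not shift every later boundary (for example, chunks of ~1000 characters). Create a separate Content-ID for each chunk, and supply the chunk IDs as metadata alongside the full ISCC.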
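A minimal sketch of how the draft could work, under loudly stated assumptions. The shift-invariant chunker is a Gear-style content-defined chunker (the rolling hash used by FastCDC). A 64-bit SimHash over character trigrams stands in for the per-chunk Content-ID; this is not the ISCC Text-Code algorithm. The names `chunk_text`, `simhash64`, `granular_ids`, `find_matches`, the `"chunk_ids"` metadata field, and all size and threshold parameters are hypothetical and not part of the ISCC specification.

```python
import hashlib

# Hypothetical parameters; the issue only suggests chunks of "~1000 characters".
MIN_SIZE = 256              # never cut a chunk shorter than this
MAX_SIZE = 4096             # force a cut at this length
CUT_MASK = 0x3FF << 22      # 10 high bits must be zero: ~1 cut per 1024 positions

# Deterministic pseudo-random table for a Gear rolling hash (as used by FastCDC).
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big") for i in range(256)]


def chunk_text(text: str) -> list[str]:
    """Content-defined chunking: the 32-bit Gear hash depends only on the last
    32 characters, so cut points follow the content, not absolute offsets."""
    chunks, start, h = [], 0, 0
    for i, ch in enumerate(text):
        h = ((h << 1) + GEAR[ord(ch) & 0xFF]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & CUT_MASK) == 0) or size >= MAX_SIZE:
            chunks.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        chunks.append(text[start:])  # trailing remainder may be shorter than MIN_SIZE
    return chunks


def simhash64(text: str, n: int = 3) -> int:
    """64-bit SimHash over character n-grams (hypothetical stand-in for a Content-ID)."""
    counts = [0] * 64
    for i in range(max(1, len(text) - n + 1)):
        gram = text[i:i + n].encode("utf-8")
        hv = int.from_bytes(hashlib.blake2b(gram, digest_size=8).digest(), "big")
        for bit in range(64):
            counts[bit] += 1 if (hv >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def granular_ids(text: str) -> dict:
    """Hypothetical metadata record attached to the full ISCC: one hash per chunk."""
    return {"chunk_ids": [format(simhash64(c), "016x") for c in chunk_text(text)]}


def find_matches(query: str, record: dict, max_dist: int = 8) -> int:
    """Count query chunks that are near-duplicates of some stored chunk. The query
    is chunked the same way, so its interior chunks line up with the document's."""
    stored = [int(cid, 16) for cid in record["chunk_ids"]]
    return sum(
        any(hamming(simhash64(c), s) <= max_dist for s in stored)
        for c in chunk_text(query)
    )
```

Because a cut depends only on a 32-character window (plus the minimum-size rule, which resynchronizes after a chunk or two), a passage copied out of a longer text is chunked like the matching region of that text. Its interior chunks therefore yield identical or near-identical hashes that a lookup over the stored `chunk_ids` can find. A query shorter than a single chunk would still depend on how similar its one hash is to the enclosing chunk's hash.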

titusz commented 3 years ago

This relates to: https://github.com/iscc/iscc-specs/issues/83