irods / python-irodsclient

A Python API for iRODS

ISCC codes for iRODS data objects #573

Open ll4strw opened 3 months ago

ll4strw commented 3 months ago

This isn't a bug report, just a question. ISO recently published a new standard, the ISCC (International Standard Content Code), for calculating a similarity-preserving fingerprint and identifier for digital media assets. ISCCs are generated algorithmically from digital content, as specified in these Python libraries. Unlike checksums, which are limited to content, ISCCs can also take into account any metadata associated with a digital object (for example, an iRODS data object). Also unlike checksums, ISCCs are soft hashes: thanks to their composite nature, they can be used to evaluate how similar two digital objects are. I can imagine several applications that could be useful in an iRODS context, such as content deduplication and integrity verification, all of which could take metadata into account. Because the ISCC core routines are written in Python, I thought they could easily be used together with the iRODS PRC (python-irodsclient), as in this example: https://github.com/ll4strw/python-irodsclient-iscc Does the iRODS community have any interest in ISCCs?
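To illustrate how a similarity-preserving soft hash differs from a checksum, here is a minimal sketch. It is not the ISO ISCC algorithm and does not use the ISCC libraries. Instead, it uses a generic 64-bit SimHash over byte 4-grams. The helper names (`simhash64`, `hamming`, `composite_fingerprint`) are hypothetical. `composite_fingerprint` loosely mimics the idea of combining a metadata unit with a content unit. In an iRODS setting, the bytes and the metadata (AVUs) would presumably be read through the PRC, but that wiring is omitted here.

```python
import hashlib


def simhash64(data: bytes, ngram: int = 4) -> int:
    """Generic 64-bit SimHash over byte n-grams (an illustration, NOT the ISCC algorithm)."""
    counts = [0] * 64
    # Slide a window over the input; inputs shorter than `ngram` yield a single gram.
    for i in range(max(1, len(data) - ngram + 1)):
        gram = data[i:i + ngram]
        h = int.from_bytes(hashlib.blake2b(gram, digest_size=8).digest(), "big")
        # Each gram votes +1 or -1 on every bit position.
        for bit in range(64):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    # A bit is set in the fingerprint when the majority of grams voted for it.
    result = 0
    for bit in range(64):
        if counts[bit] > 0:
            result |= 1 << bit
    return result


def hamming(a: int, b: int) -> int:
    """Number of differing bits: a small distance means similar inputs."""
    return bin(a ^ b).count("1")


def composite_fingerprint(content: bytes, metadata: dict[str, str]) -> tuple[int, int]:
    """Hypothetical composite: (metadata unit, content unit), loosely inspired by ISCC."""
    # Sort keys so the same metadata always serializes identically.
    meta_bytes = "\n".join(f"{k}={v}" for k, v in sorted(metadata.items())).encode()
    return simhash64(meta_bytes), simhash64(content)


if __name__ == "__main__":
    original = b"The quick brown fox jumps over the lazy dog. " * 20
    edited = original.replace(b"lazy", b"sleepy", 1)
    unrelated = b"Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 20
    print("SimHash distance, edited:   ", hamming(simhash64(original), simhash64(edited)))
    print("SimHash distance, unrelated:", hamming(simhash64(original), simhash64(unrelated)))
    print("SHA-256 equal after edit?   ", hashlib.sha256(original).digest() == hashlib.sha256(edited).digest())
```

A small edit changes only a few SimHash bits, so the Hamming distance stays low. Any edit, by contrast, changes a SHA-256 checksum completely. The actual ISCC units are defined by the standard and differ from this sketch.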

alanking commented 3 months ago

This is very interesting - thanks for raising this. While there are a number of people watching this repository who may like to share thoughts, you might want to try sharing this with the iRODS Chat Google Group as well: https://groups.google.com/g/iROD-Chat This could open up the conversation to users of other libraries who might be interested in using yours as a reference implementation. Just a thought!