qcif / data-curator

Data Curator - share usable open data
MIT License

Hashing of properties or resources #1029

Open ghost opened 4 years ago

ghost commented 4 years ago

In #1003, the idea was raised of providing a hash so that users could verify incoming and outgoing data for Data Curator. While we used custom key/value pairs to attempt to overcome some of the issues raised there, the idea of a hash seems worthy of consideration for other use cases.

KyleHaynes commented 3 years ago

Came here to say this as well ... preserving a hash of the original data would be beneficial in my use case.

I often want reassurance that data is exactly as received from a data custodian. The number of times I've seen people open a CSV in, say, Excel, get undesired type coercion, and then save the result is too high!
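To illustrate the kind of check being asked for, here is a minimal sketch in Python. It is not part of Data Curator; the function names `file_sha256` and `is_unchanged` are hypothetical, and the `sha256:` prefix is an assumption borrowed from the convention the Frictionless Data Resource spec describes for its `hash` property. The idea is to record a hash when the file arrives from the custodian and compare against it later.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 of a file's raw bytes, prefixed with 'sha256:'."""
    digest = hashlib.sha256()
    # Read in binary mode so no newline or encoding translation alters the bytes.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def is_unchanged(path, recorded_hash):
    """True if the file's current hash matches the one recorded on receipt."""
    return file_sha256(path) == recorded_hash
```

Because the hash covers raw bytes, any re-save that coerces values (for example, a spreadsheet turning `00123` into `123`) would produce a different hash, which is the case described above. It would not say what changed, only that something did.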

I also think an option to lock the data so it can't be edited within DC would be beneficial (happy to create another issue if you would prefer). Password-protected locks for both the schema and the data could be useful too. Beyond this, having a place where you can verify that the data has been checked against the schema and QA'd would potentially be of use as well (I can expand on this if you are interested or feel what I've suggested is a bit vague).

I would like to think I could get custodians to start using DC, but realistically I am looking at receiving data, adding it to a DC project, and then disseminating it as a zipped DC repo (and then educating people on the receiving end).