irods / irods_resource_plugin_rados

Cacheless Ceph/rados resource plugin for iRODS
BSD 3-Clause "New" or "Revised" License

How is data integrity verification handled? #16

Open kript opened 5 years ago

kript commented 5 years ago

More of a question than a bug report, but it might turn into a feature request...

TL;DR: how is checksumming expected to work in this plugin, both on upload and later when using the ichksum command to verify, given the chunked nature of objects within the resource?

At the moment, as I understand it, a replica of an object is stored in 4 MB chunks across the Rados 'bucket'. Therefore, to perform a checksum, the file must be downloaded and reassembled before ichksum can usefully be run against it.
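To illustrate the point about chunks: a hash can be computed incrementally over the chunks in order, so the replica does not have to be reassembled into one file first, but every byte of every chunk still has to be read. This is a minimal sketch, assuming a fixed 4 MB chunk size and SHA-256 formatted as the `sha2:`-prefixed base64 string iRODS uses; the helper names are hypothetical and are not part of the plugin.

```python
import base64
import hashlib
from typing import Iterable, List

# Assumption: fixed-size 4 MB chunks, as described in the question above.
CHUNK_SIZE = 4 * 1024 * 1024


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Hypothetical helper: simulate a replica striped into fixed-size objects."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def format_sha2(digest: bytes) -> str:
    """Format a SHA-256 digest as "sha2:" followed by its base64 encoding."""
    return "sha2:" + base64.b64encode(digest).decode("ascii")


def checksum_from_chunks(chunks: Iterable[bytes]) -> str:
    """Stream the chunks through one hash object, in order, without reassembly.

    Every chunk is still read in full; only the on-disk reassembly is avoided.
    """
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return format_sha2(h.digest())


def checksum_whole(data: bytes) -> str:
    """Reference: hash the fully reassembled replica in one call."""
    return format_sha2(hashlib.sha256(data).digest())


if __name__ == "__main__":
    payload = bytes(range(256)) * 40000  # 10,240,000 bytes, spans three chunks
    chunks = split_into_chunks(payload)
    print(len(chunks))  # 3
    print(checksum_from_chunks(chunks) == checksum_whole(payload))  # True
```

The streamed and whole-file checksums match, which is why chunking by itself does not prevent verification; the cost that remains is the full read from storage.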

Is that correct? If so, how would ichksum -a be expected to work on a tree with a replication node, meaning that there is more than one copy and one of them is held on the librados back end? For that matter, are tools like iscan and ifsck supported?

I can see that https://github.com/irods/irods/issues/2796 would be useful here, but I'm wondering whether there are any other ideas for ensuring data integrity without having to read every file back from the bucket!

Cheers

John

trel commented 3 years ago

Saw your link back to here from the SoftIron conversation. The rados plugin itself could 'trust' the storage to provide these types of calculations/values. Another option is to not use the plugin at all, and just use unixfilesystem via CephFS (and perhaps grow a setting that itself... trusts the storage for checksum information).

Otherwise, yes, this is a challenge. And I think you're still ahead of it - we haven't faced this question from others yet, even nearly two years after you posted this.

jasoncoposky commented 3 years ago

Within iRODS, every checksum computation for every replica is a full read from storage followed by a compute. We have discussed moving the checksum operation out of an RPC API and delegating it to the underlying storage architecture, which may provide quicker and better assurances (e.g. erasure coding) that the data is correct at rest. Given that, we could rely on assurances from Ceph that the data is correct, based on your own configuration of the storage and iRODS.
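The delegation idea can be sketched as a policy choice: when a setting says to trust the storage, ask the storage layer for a checksum it already maintains, and fall back to the full read otherwise. Everything here is hypothetical: `storage_checksum`, `read_chunks` and `trust_storage` are illustrative names, not iRODS or Ceph APIs, and whether Ceph can supply a checksum in the form iRODS expects is not established in this thread.

```python
import base64
import hashlib
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical callback: returns a checksum the storage layer already holds
# for a replica, or None if it has none.
StorageChecksum = Callable[[str], Optional[str]]
# Hypothetical callback: yields the replica's chunks in order (a full read).
ChunkReader = Callable[[str], Iterable[bytes]]


def full_read_checksum(chunks: Iterable[bytes]) -> str:
    """Current behaviour as described above: read everything, then hash."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return "sha2:" + base64.b64encode(h.digest()).decode("ascii")


def replica_checksum(
    replica_id: str,
    read_chunks: ChunkReader,
    storage_checksum: StorageChecksum,
    trust_storage: bool,
) -> Tuple[str, str]:
    """Return (checksum, source), preferring the storage's value if trusted."""
    if trust_storage:
        stored = storage_checksum(replica_id)
        if stored is not None:
            return stored, "storage"
    return full_read_checksum(read_chunks(replica_id)), "full-read"


if __name__ == "__main__":
    data = {"r1": [b"abc", b"def"]}
    bytes_read = {"n": 0}

    def read_chunks(rid: str) -> Iterable[bytes]:
        for chunk in data[rid]:
            bytes_read["n"] += len(chunk)  # count what a full read costs
            yield chunk

    # Pretend the storage computed this at write time.
    known = {"r1": full_read_checksum([b"abcdef"])}
    print(replica_checksum("r1", read_chunks, known.get, trust_storage=True))
    print(bytes_read["n"])  # 0: nothing read from the replica
    print(replica_checksum("r1", read_chunks, known.get, trust_storage=False))
    print(bytes_read["n"])  # 6: every byte read
```

The trade-off is the one discussed above: the trusted path avoids the read entirely, but its guarantee is only as good as the storage's own integrity checks and configuration.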