irods / irods_client_globus_connector

The iRODS Globus Connector
2 stars 4 forks source link

[30] Used only good replicas and took min of update time #32

Closed JustinKyleJames closed 3 years ago

JustinKyleJames commented 3 years ago

Note: Used min under the assumption that if multiple replicas are good, a checksum calculated against the oldest replica would be valid for all replicas.

trel commented 3 years ago

i think the convention in the filesystem library / checksum library was to use the latest good replica (so... max()). @korydraughn is that correct?

korydraughn commented 3 years ago

That is true @trel. It uses the checksum of the latest good replica

korydraughn commented 3 years ago

Looks like it returns an empty string if there is no checksum. I believe that is fine.

korydraughn commented 3 years ago

See here https://github.com/irods/irods/blob/d5f84c6054c1d6aa9464651a8646f6a9028279b2/lib/filesystem/src/filesystem.cpp#L959

trel commented 3 years ago

min() gives us the 'oldest' good replica... do we want the 'latest' (max()) good replica instead?

JustinKyleJames commented 3 years ago

min() gives us the 'oldest' good replica... do we want the 'latest' (max()) good replica instead?

I put the rationale in the commit comments.

"Used min under the assumption that if multiple replicas are good, a checksum calculated against the oldest replica would be valid for all replicas."

Basically, isn't it true that if we have N good replicas, they all should have the same data? So min should suffice and make us recalculate less often. Example, if we replicated to another resource, there should be no reason to recalculate the checksum.

trel commented 3 years ago

yes, it will be valid, but that's not the convention used elsewhere (aka filesystem)... if they're equivalent, then i'd rather we keep consistent.

if there's a legitimate reason (aka, the min() will change less often)... then that's okay, but we should say that's the reason in the commit message rather than just 'it would be valid'.

JustinKyleJames commented 3 years ago

yes, it will be valid, but that's not the convention used elsewhere (aka filesystem)... if they're equivalent, then i'd rather we keep consistent.

if there's a legitimate reason (aka, the min() will change less often)... then that's okay, but we should say that's the reason in the commit message rather than just 'it would be valid'.

My thinking was that if a file is replicated from one resource to another, that would trigger a recalculation of the checksums. Do we want that? Added this to the commit message.

trel commented 3 years ago

That makes sense to me. Let's # these and get them in. Thanks.

JustinKyleJames commented 3 years ago

That makes sense to me. Let's # these and get them in. Thanks.

Done

trel commented 3 years ago

Please point to this particular repo in the commit message - it is trying to link to an unrelated issue in the 'upstream' of the fork-tree.

This is the first time i've seen this happen... not sure what we can do about it outside of removing our fork from the tree itself.

JustinKyleJames commented 3 years ago

Please point to this particular repo in the commit message - it is trying to link to an unrelated issue in the 'upstream' of the fork-tree.

This is the first time i've seen this happen... not sure what we can do about it outside of removing our fork from the tree itself.

Done