kitware-resonant / dkc-next

Apache License 2.0

Add hash download REST endpoint #156

Closed zachmullen closed 3 years ago

zachmullen commented 3 years ago

Fixes #7

If you guys approve of this endpoint's behavior, I'll add test cases.

mgrauer commented 3 years ago

I spoke with @thewtex about this recently to discuss using the ITK test data as a driving use case. He is excited to help, so we can follow up with him soon.

For now, I'd want to know if (1) it is easy to add other checksums and (2) it is possible to mount this at an alternate API route. We can figure out what those are exactly when we get more specific requirements.

It looks like the current DKC has sha512 for ITK test data, but IIUC Matt asked for providing md5 also so that they can retire other infrastructure hosting via md5.
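To make question (1) concrete, here is a minimal sketch of computing several checksums in a single pass over a file. It uses only the standard-library `hashlib`; the function name `compute_digests` and its parameters are hypothetical and are not taken from dkc-next.

```python
import hashlib
import io


def compute_digests(stream, algorithms=("sha512",), chunk_size=65536):
    """Return {algorithm_name: hex_digest} for a binary stream.

    Hypothetical helper: the stream is read once in chunks, and every
    requested hasher is updated with each chunk.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Usage: supporting another checksum means adding its name to the tuple.
digests = compute_digests(io.BytesIO(b"example content"), ("sha512", "sha256"))
```

Under this approach the cost of an extra algorithm is mostly CPU time at upload, since the file is still read only once. Whether each digest would also need its own indexed column for lookup is a separate question that this sketch does not address.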

zachmullen commented 3 years ago

> IIUC Matt asked for providing md5

I would strongly encourage migrating away from MD5, as it has been broken cryptographically (chosen-prefix collisions). I would push back against supporting it in dkc-next.

mgrauer commented 3 years ago

Also, can you remind me if we are doing file deduplication? What happens to someone's download request if there are two duplicate files (is this even possible given whatever constraints on File you are using?) and a user has permission for one but not the other?

zachmullen commented 3 years ago

If there are two duplicate files with the same content and the requesting user has permission on both of them, one is selected arbitrarily. The only notable effect that has would be if the two files have different names, and the client is relying on the content-disposition value to set the name on their end. (In the case of ExternalData I don't think the client does that, but I'm not sure.)

EDIT: and to answer your first question, we are not doing de-duplication in our backing store, though that's just an implementation detail.
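The lookup behavior described above can be sketched roughly as follows. This is an illustration only: `FileRecord`, `find_downloadable`, and the `readable_ids` set are hypothetical stand-ins, not the dkc-next models or permission API. It also assumes that the lookup only considers files the requesting user can read, which this thread does not explicitly confirm.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class FileRecord:
    # Hypothetical stand-in for a stored file row.
    id: int
    name: str
    sha512: str


def find_downloadable(
    files: Iterable[FileRecord], readable_ids: Set[int], sha512: str
) -> Optional[FileRecord]:
    """Return some readable file with the given digest, or None.

    Assumption: files the user cannot read are skipped rather than
    causing the request to fail. Among readable duplicates, the first
    one encountered wins, so the choice is effectively arbitrary.
    """
    for record in files:
        if record.sha512 == sha512 and record.id in readable_ids:
            return record
    return None


files = [
    FileRecord(id=1, name="private_copy.nrrd", sha512="abc123"),
    FileRecord(id=2, name="public_copy.nrrd", sha512="abc123"),
]
# The user can read only file 2, so that duplicate is the one served.
match = find_downloadable(files, readable_ids={2}, sha512="abc123")
```

Because the two duplicates carry different names, the name a client would see in the content-disposition value depends on which record is picked, which is the effect noted in the comment above.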