jkuhl-uni / git-annex-remote-zenodo

Use Zenodo as a special remote for git-annex
GNU General Public License v3.0

Explore/Reconsider design choices of the special remote #3

Open adswa opened 3 weeks ago

adswa commented 3 weeks ago

[Extending this comment as I explore the code more]

Without input from the original authors, the backstory of the special remote isn't fully clear. However, the implementation, which is very convoluted and clunky in many places, suggests that it was working around a number of limitations on Zenodo's side when it was first written.

Some of these decisions make the special remote feel "manual" and also contribute to its very slow performance. A deliberate re-evaluation could lead to reimplementations that change the special remote's functionality, but that also standardize and improve it.

What I'm aware of:

From the Zenodo REST API documentation:

Now, let’s upload a new file. We have recently released a new API, which is significantly more performant and supports much larger file sizes. While the older API supports 100MB per file, the new one has a limit of 50GB total in the record (and any given file), and up to 100 files in the record.

bucket_url = r.json()["links"]["bucket"]

To use the new files API we will do a PUT request to the bucket link. The bucket is a folder-like object storing the files of our record. Our bucket URL will look like this: https://zenodo.org/api/files/568377dd-daf8-4235-85e1-a56011ad454b and can be found under the links key in our record's metadata.

As far as I have seen, the new files API isn't documented much beyond this, but the choice of this API probably led to the large amount of metadata parsing in the current implementation.
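A minimal sketch of the upload step from the quoted documentation, using only the Python standard library instead of `requests`. The helper names `bucket_url_from_deposition` and `build_bucket_put` are hypothetical and not part of the special remote. The sketch assumes the file name is appended as the last path segment of the bucket URL and the token is passed as an `access_token` query parameter, as in Zenodo's quickstart. The `Content-Type` header is an assumption of this sketch. The request is only built here, and `urllib.request.urlopen(req)` would actually send it.

```python
import json
import urllib.parse
import urllib.request


def bucket_url_from_deposition(deposition_json: str) -> str:
    """Read the bucket link from a deposition's JSON metadata (links -> bucket)."""
    return json.loads(deposition_json)["links"]["bucket"]


def build_bucket_put(bucket_url: str, filename: str, data: bytes,
                     access_token: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that stores `data` as `filename`.

    Assumption: the file name becomes the last path segment of the bucket URL,
    and the token is sent as an access_token query parameter.
    """
    url = f"{bucket_url}/{urllib.parse.quote(filename)}"
    url += "?" + urllib.parse.urlencode({"access_token": access_token})
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        # Raw bytes upload; without this header urllib would default to a
        # form-encoded content type for requests that carry data.
        headers={"Content-Type": "application/octet-stream"},
    )


# Usage with the example bucket URL from the quoted documentation.
example = ('{"links": {"bucket": '
           '"https://zenodo.org/api/files/568377dd-daf8-4235-85e1-a56011ad454b"}}')
req = build_bucket_put(bucket_url_from_deposition(example),
                       "results.json", b"{}", "TOKEN")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the URL construction testable without network access or a real token.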

jkuhl-uni commented 3 weeks ago

Yes, this is true. I also thought about this. For my use case, this is a fairly easy fix, as I can re-package results into JSON files, which are small enough (and few enough) to fit within the limits. I didn't think about chunking. As I have no experience with Zenodo, I am not sure how willing they are to lift those restrictions for individual projects.
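Whether a set of results fits, and why chunking is a concern, comes down to simple arithmetic against the limits quoted in the first comment (100 files and 50GB per record). The following is a hypothetical helper, not part of the special remote. It assumes that git-annex chunking (the `chunk=` option of `git annex initremote`) would store every chunk as a separate file in the Zenodo record, so chunking multiplies the file count. It also assumes that "GB" means 10**9 bytes.

```python
from typing import Optional, Sequence

# Per-record limits quoted from the Zenodo docs; decimal GB is an assumption.
MAX_RECORD_BYTES = 50 * 10**9
MAX_FILES = 100


def chunk_count(size: int, chunk_size: int) -> int:
    """Number of stored objects for a file of `size` bytes when chunked.

    Assumes each chunk becomes its own file in the record and counts an
    empty file as one object.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return max(1, -(-size // chunk_size))  # integer ceiling division


def fits_record(sizes: Sequence[int], chunk_size: Optional[int] = None) -> bool:
    """Check file sizes (bytes) against the per-record limits.

    With chunk_size=None every file is stored whole; otherwise each file is
    split into chunks of at most chunk_size bytes.
    """
    if chunk_size is None:
        n_files = len(sizes)
    else:
        n_files = sum(chunk_count(s, chunk_size) for s in sizes)
    # The total-size check also bounds any single file, since sizes are >= 0.
    return n_files <= MAX_FILES and sum(sizes) <= MAX_RECORD_BYTES


GB = 10**9
print(fits_record([1 * GB] * 30))                    # True: 30 files, 30 GB
print(fits_record([1 * GB] * 30, chunk_size=10**8))  # False: 300 chunks
```

Under these assumptions, a handful of small JSON files fits comfortably. By contrast, chunking even moderately large files quickly exhausts the 100-file limit long before the 50GB size limit is reached.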