int-brain-lab / ONE

Open Neurophysiology Environment
MIT License

Bug report - Bulk download fails with revision #65

Open oliche opened 1 year ago

oliche commented 1 year ago
from one.api import ONE
eid = 'c51f34d8-42f6-4c9c-bb5b-669fd9c42cd9'
one = ONE()
# Relative paths of every dataset in the session, including all revisions
dsets = one.list_datasets(eid=eid)
one.load_datasets(eid, dsets, download_only=True)

Results in: one.alf.exceptions.ALFError: No default revision for dataset alf/probe00/pykilosort/clusters.amps.npy

I've temporarily fixed this by adding assert_unique=False at line 1102 of one.api, but I think that in this case it should yield the most recent dataset first.

k1o0 commented 1 year ago

I finally got around to looking into this. The load methods were never meant to support relative paths as inputs; they just happened to sort of work. I hope you're the only person using them this way. Because you still stubbornly load things this way, I added a bit more support for it, although the logic is more complicated now. If you provide relative paths as inputs, they must contain no wildcards or regex (it's not possible to properly validate this) and the collection and revision args must be None. When loading this way, you will get a warning if one or more datasets are not default revisions.

Blindly loading an entire session's data is problematic because you get all the revisions, and the user has no way of knowing which is the 'correct' version of the data. For that reason it's better to encourage users to load collections and objects rather than just passing in the entire output of list_datasets. It's not feasible to serve the most recent data first: if the user explicitly requests data for a list of relative paths, they expect it to be returned in the same order.
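To make the rules above concrete, here is a rough sketch in plain Python. It is not the code in one.api: the function name select_by_rel_path, the set of characters treated as wildcard or regex syntax, and the example records are all assumptions made for illustration. It shows the three behaviours described: rejecting wildcards and non-None collection/revision args, warning about non-default revisions, and returning results in the order requested.

```python
import warnings

# Characters treated as wildcard or regex syntax in this sketch (an assumption).
PATTERN_CHARS = set('*?[]^$|')


def select_by_rel_path(records, rel_paths, collection=None, revision=None):
    """Return the records for rel_paths, in the order they were requested."""
    if collection is not None or revision is not None:
        raise ValueError('collection and revision must be None for relative paths')
    patterns = [p for p in rel_paths if PATTERN_CHARS & set(p)]
    if patterns:
        raise ValueError(f'wildcards and regex are not supported: {patterns}')
    by_path = {rec['rel_path']: rec for rec in records}
    missing = [p for p in rel_paths if p not in by_path]
    if missing:
        raise KeyError(f'datasets not found: {missing}')
    selected = [by_path[p] for p in rel_paths]  # preserves the input order
    non_default = [rec['rel_path'] for rec in selected if not rec['default_revision']]
    if non_default:
        warnings.warn(f'not default revisions: {non_default}')
    return selected


# Hypothetical dataset records, for illustration only.
records = [
    {'rel_path': 'alf/obj.attr_a.npy', 'default_revision': True},
    {'rel_path': 'alf/obj.attr_b.npy', 'default_revision': False},
    {'rel_path': 'alf/#2022-01-01#/obj.attr_b.npy', 'default_revision': True},
]
requested = ['alf/obj.attr_b.npy', 'alf/obj.attr_a.npy']
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    selected = select_by_rel_path(records, requested)
print([rec['rel_path'] for rec in selected])  # same order as `requested`
print(len(caught))  # one warning, for the non-default obj.attr_b.npy
```

Because the output order mirrors the input list, reordering so that the newest revision comes first would break the caller's expectation that result i corresponds to path i.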

If you want to download an entire session without old revisions you can do this:

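# With details=True, list_datasets returns a table with one row per dataset
dsets = one.list_datasets(eid, details=True)
# Keep only the relative paths of datasets flagged as the default revision
dsets = dsets[dsets['default_revision']]['rel_path'].values
assert 'alf/probe00/pykilosort/clusters.amps.npy' not in dsets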
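As a sanity check on a list filtered this way, one could confirm that no dataset still appears under more than one revision. The following is a minimal sketch, not part of ONE. It assumes the ALF convention that a revision is a folder whose name is wrapped in '#' characters; the helper names and the example paths (including the revision date) are hypothetical.

```python
import re
from collections import defaultdict

# Assumed ALF convention: a revision is a folder whose name is wrapped in '#'.
REVISION_FOLDER = re.compile(r'^#(.+)#$')


def split_revision(rel_path):
    """Return (path without the revision folder, revision or '')."""
    *folders, filename = rel_path.split('/')
    revision = ''
    kept = []
    for folder in folders:
        match = REVISION_FOLDER.match(folder)
        if match:
            revision = match.group(1)
        else:
            kept.append(folder)
    return '/'.join(kept + [filename]), revision


def find_multiple_revisions(rel_paths):
    """Map each dataset present in more than one revision to its revisions."""
    groups = defaultdict(list)
    for path in rel_paths:
        base, revision = split_revision(path)
        groups[base].append(revision)
    return {base: sorted(revs) for base, revs in groups.items() if len(revs) > 1}


# Hypothetical relative paths, for illustration only.
paths = [
    'alf/probe00/obj.attr.npy',
    'alf/probe00/#2022-01-01#/obj.attr.npy',
    'alf/other.attr.npy',
]
print(find_multiple_revisions(paths))
```

An empty result means each dataset in the list is present in a single revision only, which is what the default_revision filter is meant to guarantee.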