psychoinformatics-de / datalad-concepts


Example of a data provisioning specification #174

Closed mih closed 5 days ago

mih commented 7 months ago

Background: https://github.com/datalad/datalad-remake/issues/12

The twist here is the focus on (partial) access to information in a dataset, rather than a (full) description of a dataset. It should enable precise instructions on what to obtain, allow for smart decisions on how to obtain it, and do all that with a lean data specification.

Decision making: I can declare a download_url for a dataset and use that, but if it is 1 GB and I only need a 1 MB file from it, a full download is not smart. That dataset may be a DataLad dataset, in which case we may be able to clone it and get that individual file separately.
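
To make the decision concrete, here is a minimal sketch of the kind of logic a consumer could apply. The `AccessOptions` class, its field names, the `choose_strategy` function, and the `max_overhead` threshold are all hypothetical and only serve the illustration; nothing here is taken from the schema or from an existing DataLad API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessOptions:
    """Hypothetical summary of how a dataset can be reached."""
    download_url: Optional[str]  # full-dataset download, if declared
    clone_url: Optional[str]     # Git/DataLad clone location, if declared
    total_bytes: Optional[int]   # size of the full download, if known


def choose_strategy(opts: AccessOptions, needed_bytes: int,
                    max_overhead: float = 10.0) -> str:
    """Return 'clone-and-get', 'download', or 'unavailable'.

    Cloning is preferred when a full download would transfer more than
    `max_overhead` times the amount of data that is actually needed,
    or when the size of the full download cannot be determined.
    """
    if opts.clone_url is not None:
        # No download option, or unknown size: cloning is the safe choice.
        if opts.download_url is None or opts.total_bytes is None:
            return "clone-and-get"
        # A full download would be wasteful compared to what is needed.
        if opts.total_bytes > max_overhead * needed_bytes:
            return "clone-and-get"
    if opts.download_url is not None:
        return "download"
    return "unavailable"


# The 1 GB dataset vs. 1 MB file case from above (placeholder URLs)
opts = AccessOptions(
    download_url="https://example.org/dataset.zip",
    clone_url="https://example.org/dataset.git",
    total_bytes=10**9,
)
strategy = choose_strategy(opts, needed_bytes=10**6)
```

With a `clone-and-get` outcome, the actual retrieval would presumably be a `datalad clone` of the declared location followed by a `datalad get` of the one file.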

This means that we need to be able to declare a clone_url in a way that is recognizable. And we should not start declaring additional attributes like clone_url without thinking real hard, because in no time we will have 1k additional attributes, one for each special case.

I am thinking of going via QualifiedAccess and having a DataService that is some kind of Git service...
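
As a rough illustration of that direction, the sketch below models access as a list of typed service entries instead of one attribute per access method. Every key and type name in it (`qualified_access`, `service`, `type`, `endpoint_url`, `GitService`, `HttpDownload`) is made up for this example and is not claimed to match the schema; the URLs are placeholders.

```python
# Hypothetical record: access methods are entries in a list, each pointing
# to a typed service, rather than dedicated attributes such as clone_url.
record = {
    "id": "ex:dataset1",
    "qualified_access": [
        {
            "service": {
                "type": "GitService",  # assumed type name
                "endpoint_url": "https://example.org/dataset1.git",
            }
        },
        {
            "service": {
                "type": "HttpDownload",  # assumed type name
                "endpoint_url": "https://example.org/dataset1.zip",
            }
        },
    ],
}


def endpoints_for(rec: dict, service_type: str) -> list:
    """Return the endpoint URLs of all access entries of a given service type."""
    return [
        access["service"]["endpoint_url"]
        for access in rec.get("qualified_access", [])
        if access["service"]["type"] == service_type
    ]


git_endpoints = endpoints_for(record, "GitService")
```

The point of this shape is that a new kind of access only adds a new service type, not a new attribute on the dataset record, and a consumer recognizes a clonable location by the type of the service.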

I think the best approach here is to try to write down a small, clean example of a record, and then get it to be compliant with the schema.