The twist here is the focus on (partial) data access to information in a dataset, rather than a (full) description of a dataset. It should enable precise instructions on what to obtain, allow for smart decisions on how to obtain it, and do all that with a lean data specification.
Decision making: I can declare a `download_url` for a dataset and use that, but if the dataset is 1 GB and I only need a 1 MB file from it, a full download is not smart. That dataset may be a DataLad dataset, in which case we may be able to clone it and then get that individual file separately.
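The decision described above can be sketched as a comparison of estimated transfer costs. This is a minimal sketch, assuming that the dataset size, the size of the needed file, and the cost of a clone without file content are known up front; `AccessOption`, `plan_access`, and the method labels are hypothetical names, not part of any schema or DataLad API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessOption:
    """One way of obtaining data (hypothetical model, not a schema class)."""
    method: str          # illustrative labels: "download" or "clone+get"
    transfer_bytes: int  # estimated number of bytes that must be transferred


def plan_access(dataset_size: int, file_size: int,
                clone_overhead: Optional[int]) -> AccessOption:
    """Pick the cheaper way to obtain a single file from a dataset.

    `clone_overhead` is the estimated size of a clone without file content.
    None means the dataset cannot be cloned (only a download URL is known).
    """
    # A full download always transfers the whole dataset.
    options = [AccessOption("download", dataset_size)]
    if clone_overhead is not None:
        # Clone the lightweight dataset, then get only the one needed file.
        options.append(AccessOption("clone+get", clone_overhead + file_size))
    # On a tie, min() keeps the first option, i.e. the plain download.
    return min(options, key=lambda o: o.transfer_bytes)
```

For the 1 GB dataset and 1 MB file from the example, any clone overhead below 999 MB makes `plan_access` prefer cloning and getting the single file; without a cloneable source it falls back to the download.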
This means that we need to be able to declare a `clone_url` in a way that is recognizable. But we should not start declaring additional attributes like `clone_url` without thinking really hard, because in no time we would have 1k additional attributes, one for each special case.
I am thinking of going via `QualifiedAccess` and having a `DataService` that is some kind of Git service...
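One way to picture this idea: a record carries a list of qualified accesses, each pointing to a typed data service, and a clone URL is simply the endpoint of a service whose type marks it as Git. This is a minimal sketch under that assumption; only the names `QualifiedAccess` and `DataService` come from the text above, while the fields (`service_type`, `endpoint_url`, `access`), the type vocabulary, and the `DatasetRecord` and `find_access_url` helpers are hypothetical and do not reflect an actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataService:
    """A service through which data can be accessed (hypothetical fields)."""
    service_type: str  # assumed vocabulary, e.g. "git" or "http"
    endpoint_url: str


@dataclass
class QualifiedAccess:
    """Links a dataset record to one service through which it is reachable."""
    service: DataService


@dataclass
class DatasetRecord:
    """Hypothetical minimal dataset record with any number of access routes."""
    identifier: str
    access: List[QualifiedAccess] = field(default_factory=list)


def find_access_url(record: DatasetRecord, service_type: str) -> Optional[str]:
    """Return the endpoint of the first service of the given type, or None."""
    for qa in record.access:
        if qa.service.service_type == service_type:
            return qa.service.endpoint_url
    return None
```

With this shape, `find_access_url(record, "git")` recovers what a dedicated `clone_url` attribute would have held, and supporting a new access method means adding a new service type rather than a new top-level attribute on every record.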
I think the best approach here is to write down a small, clean example of a record, and then make it compliant with the schema.
Background: https://github.com/datalad/datalad-remake/issues/12