psychoinformatics-de / studyforrest-data

DataLad superdataset of all studyforrest.org project dataset components
https://studyforrest.org

Update metadata for studyforrest dataset and subdatasets #60

Open jsheunis opened 2 years ago

jsheunis commented 2 years ago

From a quick scan, it looks like only the 4 subdatasets in studyforrest/original have metadata available, in the form of datacite.yml and BIDS metadata.

For completeness, it would make sense for the superdataset and all subdatasets to have at least some minimal metadata (e.g. a datacite.yml or studyminimeta.yaml file). If there's agreement, I can go ahead and create these files and send PRs to each subdataset repo.
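To find which datasets still lack such a file, a scan along these lines could help. It is a minimal sketch, not existing tooling: it assumes a DataLad dataset is recognizable by a `.datalad` directory at its root, and it treats only `datacite.yml` and `studyminimeta.yaml` (the two formats named above) as "minimal metadata". The function name `datasets_missing_metadata` is hypothetical. Subdatasets that are not installed locally are plain empty directories without `.datalad`, so they would not be reported.

```python
import os
from pathlib import Path

# Files counted as "minimal metadata" (the two formats proposed in this issue).
METADATA_FILES = ("datacite.yml", "studyminimeta.yaml")


def datasets_missing_metadata(root):
    """Return installed DataLad datasets under `root` (including `root`
    itself) that have none of METADATA_FILES at their top level.

    Hypothetical helper: a dataset is assumed to be any directory that
    contains a `.datalad` directory.
    """
    missing = []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Prune Git and DataLad internals so we only visit content directories.
        dirnames[:] = [d for d in dirnames if d not in (".git", ".datalad")]
        here = Path(dirpath)
        if not (here / ".datalad").is_dir():
            continue  # not a dataset root
        if not any((here / name).is_file() for name in METADATA_FILES):
            missing.append(here)
    return sorted(missing)
```

Running `datasets_missing_metadata("studyforrest-data")` on a local clone with subdatasets installed would list the candidates for the proposed PRs.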

From the perspective of using the SF data as a catalog showcase, it would also be useful to extract BIDS metadata and to have redundant metadata in different formats (to demonstrate the catalog's approach to collating metadata). BIDS extraction also provides a useful nudge and a practical use case for updating the BIDS metadata extractor. I could take a first shot at this. Progress on the SF metadata front does not depend on it, though.

@mih @christian-monch any further comments?

christian-monch commented 2 years ago

I agree that it makes sense to add the studyminimeta.yaml. It will take some time, though.

jsheunis commented 2 years ago

Since many (all?) of the open datasets that are part of studyforrest are hosted on GIN, these all have datacite.yml files that were added previously (thanks, @christian-monch) in order to generate a DOI. These files contain essentially the same information that would go into the studyminimeta file. I'm working on a datacite_gin extractor to extract this information, which should be useful not only for the studyforrest dataset but for any DataLad dataset hosted on GIN with a datacite.yml file. More comments on datacite here.
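As a rough illustration of what such an extractor has to do, the sketch below maps an already-parsed datacite.yml into a flat metadata record. It is not the datacite_gin extractor itself and does not use DataLad's extractor interface. It assumes the field layout of GIN's datacite.yml template (`authors` entries with `firstname`, `lastname`, `affiliation`, `id`; `title`; `description`; `keywords`; `license` with `name` and `url`; `funding`), and the output keys are hypothetical. YAML parsing is left to the caller to keep the sketch stdlib-only.

```python
from typing import Any


def datacite_to_record(datacite: dict[str, Any]) -> dict[str, Any]:
    """Map a parsed GIN-style datacite.yml (a dict) to a flat record.

    Field names are assumed from GIN's datacite.yml template; the keys of
    the returned record are illustrative only.
    """
    authors = []
    for author in datacite.get("authors") or []:
        # "Lastname, Firstname", skipping whichever part is missing.
        name = ", ".join(
            part for part in (author.get("lastname"), author.get("firstname")) if part
        )
        authors.append({
            "name": name,
            "affiliation": author.get("affiliation"),
            "identifier": author.get("id"),  # e.g. an ORCID, if present
        })
    license_info = datacite.get("license") or {}
    return {
        "title": datacite.get("title"),
        "description": datacite.get("description"),
        "keywords": list(datacite.get("keywords") or []),
        "authors": authors,
        "license": license_info.get("name"),
        "license_url": license_info.get("url"),
        "funding": list(datacite.get("funding") or []),
    }
```

With PyYAML installed, the input dict could come from `yaml.safe_load` on the dataset's datacite.yml. Keeping parsing and mapping separate would make it easy to emit the same record for a studyminimeta.yaml or BIDS source and compare the redundant fields.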