sfb1451 / metadata-catalog

The SFB 1451 data portal (metadata catalog)
https://data.sfb1451.de

BIDS metadata lists subject IDs #32

Closed: mih closed this issue 7 months ago

mih commented 1 year ago

Here is an example (switch to the metadata tab): https://psychoinformatics-de.github.io/sfb1451-projects-catalog/#/dataset/a808f9e1-d638-4771-ae4c-ff21da29151a/272e2844cd7c58dda8dfe288e7831ad2fa9c34b6

While there can be a purpose for communicating subject identifiers, I think this is a tricky thing to do by default.

In this concrete case, it seems to only add noise to the record.

In general, subject identifiers are not guaranteed to be anonymous, hence showing them by default poses a data protection risk. In this particular case the dataset is already public, so this is a non-issue -- but in the general case we cannot assume that.

I understand that the inclusion of this dataset is the outcome of a manual "white-listing", but it may be sensible to think about a general mechanism for such cases.

jsheunis commented 1 year ago

Do you think this should be in the domain of the metadata source, or the catalog itself? If the latter, we would need some (semantic?) way of recognizing that a subject identifier is indeed something to treat as sensitive.

mslw commented 1 year ago

BIDS metadata is reported as produced by the bids_dataset metadata extractor from https://github.com/datalad/datalad-neuroimaging

In general I agree with the comments (IMO this is fully in the domain of the metadata source, or even catalog curator).

I also think there is not much to do here, short of creating a new extractor (metadata extraction procedure) that would return a different set of information, or summarizing the extractor information somehow when going from metalad to catalog (e.g. only report len(subjects)).
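To illustrate the second option, here is a minimal sketch of such a summarizing step. It assumes the extracted metadata is a flat dict with the subject IDs under a `subjects` key; both the key name and the record layout are assumptions made for illustration, not the actual output format of the bids_dataset extractor, and `summarize_subjects` and `subject_count` are hypothetical names.

```python
from copy import deepcopy


def summarize_subjects(extracted, key="subjects"):
    """Return a copy of the metadata with the subject list replaced by its length.

    `key` is an assumed field name; adjust it to the real extractor output.
    """
    summary = deepcopy(extracted)  # leave the original record untouched
    subjects = summary.pop(key, None)
    if subjects is not None:
        # Report only how many subjects there are, not who they are.
        summary["subject_count"] = len(subjects)
    return summary


# Hypothetical record, for illustration only.
record = {"Name": "example", "subjects": ["01", "02", "03"]}
summarized = summarize_subjects(record)
```

Here `summarized` carries `subject_count` instead of the ID list, while `record` itself is unchanged, so the full information stays available on the metalad side.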

Given that we don't have non-public BIDS datasets in the catalog, I would suggest closing this issue as a wontfix and keeping it as a caution for the future, should we receive requests to include such datasets.

mslw commented 7 months ago

I think my comment above applies. Leakage of information needs to be prevented by screening the generated catalog page before publishing, which I do anyway.

Also, note that for BIDS, the subject IDs are also present in file names, so the same would apply to the file listing, not just the BIDS metadata tab.
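As a rough aid for that kind of screening, the sketch below collects the subject labels that appear in a list of file paths. It relies on the BIDS `sub-<label>` naming convention with alphanumeric labels; the function name `find_subject_labels` and the example paths are made up for illustration, and this is not part of the catalog tooling.

```python
import re

# Match "sub-<label>" at the start of a path or right after "/" or "_".
SUBJECT_PATTERN = re.compile(r"(?:^|[/_])sub-([a-zA-Z0-9]+)")


def find_subject_labels(paths):
    """Return the sorted, de-duplicated subject labels found in the given paths."""
    labels = set()
    for path in paths:
        labels.update(SUBJECT_PATTERN.findall(path))
    return sorted(labels)


# Hypothetical file listing, for illustration only.
paths = [
    "sub-01/anat/sub-01_T1w.nii.gz",
    "sub-02/func/sub-02_task-rest_bold.nii.gz",
    "dataset_description.json",
]
found = find_subject_labels(paths)
```

A non-empty result only says that subject labels are exposed through file names; whether that is a problem still depends on the dataset and needs a manual decision.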