The collections concept is meant to allow testing and analysis of a dataset before it is made public, like a private scope.
The names/handles for the datasets are now optionally augmented with a "collection" suffix, as in
Breast cancer IMC collection: xyz123
(The tag/token/label/suffix should match the regular expression ^[a-z0-9\-]{1,513}$).
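For concreteness, a minimal sketch of validating a candidate tag against this pattern (the helper name here is ours, not from the codebase):

import re

COLLECTION_TAG_PATTERN = re.compile(r'^[a-z0-9\-]{1,513}$')

def is_valid_collection_tag(tag: str) -> bool:
    # Lowercase letters, digits, and hyphens only; 1 to 513 characters.
    return COLLECTION_TAG_PATTERN.match(tag) is not None

print(is_valid_collection_tag('xyz123'))       # True
print(is_valid_collection_tag('Breast IMC'))   # False: uppercase letters and a space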
Major changes
Dataset import. The study.json file, scanned by the tabular import workflow, now optionally has a field "Study collection" with the collection tag as contents, e.g. "xyz123". The workflow was updated to construct the fully-qualified study name from the fields "Study name" and "Study collection". These names are managed by StudyCollectionNaming.
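As an illustration only, the assembly might look like the following, assuming the separator follows the "Breast cancer IMC collection: xyz123" example above; the actual logic is the responsibility of StudyCollectionNaming:

import json

# Hypothetical study.json contents, showing only the two relevant fields.
study_metadata = json.loads('''
{
    "Study name": "Breast cancer IMC",
    "Study collection": "xyz123"
}
''')

def fully_qualified_study_name(metadata: dict) -> str:
    # Assumed naming scheme, matching the example above; the real implementation
    # is StudyCollectionNaming, which this sketch does not reproduce.
    name = metadata['Study name']
    collection = metadata.get('Study collection')
    return f'{name} collection: {collection}' if collection else name

print(fully_qualified_study_name(study_metadata))  # Breast cancer IMC collection: xyz123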
API handlers. By default a collection-tagged study will not be available from the API server, and will not show up in the study-names response. However, if you already know the collection's tag/label/token, you can supply it to the study-names handler and receive the fully-qualified study names for that collection. Otherwise the API handlers have not been modified and behave in exactly the same way as before. That is, you can supply the fully-qualified study name in every place the study name appears as an argument. A major reason to embed collection tags in the study names (rather than storing them as extra metadata) was to support this, allowing the API handlers to remain as they are.
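A rough client-side sketch of this behavior, with the assumption that the tag is passed to study-names as a query parameter named "collection" and that the API server is reachable locally; neither detail is taken from the codebase:

import requests

API_BASE = 'http://localhost:8080'  # assumed address of the API server

# Without a tag, only the default-public study names come back.
public_names = requests.get(f'{API_BASE}/study-names').json()

# With a known tag, the fully-qualified study names for that collection come back.
collection_names = requests.get(
    f'{API_BASE}/study-names',
    params={'collection': 'xyz123'},  # assumed parameter name
).json()

# Either kind of name can then be supplied wherever a study name is expected.
print(public_names, collection_names)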
Collection status updates. If and when a dataset collection should be made public, one could theoretically update the study name everywhere it appears by stripping off the collection tag. But we have made liberal use of these names as identifiers, making this procedure difficult. Instead, there is now a collection_whitelist table in the default database which contains the tags of collections to be made public. This table is consulted by the study-names handler. A new CLI command is available to manage this procedure, e.g.:
spt db collection --database-config-file=.spt_db.config --collection=xyz123 --publish
spt db collection --database-config-file=.spt_db.config --collection=xyz123 --unpublish
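Conceptually, publishing inserts the tag into collection_whitelist and unpublishing deletes it. A sketch of the equivalent SQL, issued here with psycopg2; the database name, connection details, and column name are assumptions:

import psycopg2

# Connection details would ordinarily be read from .spt_db.config.
connection = psycopg2.connect(dbname='spt_default', host='localhost', user='postgres')

with connection, connection.cursor() as cursor:
    # Publish: whitelist the tag so the study-names handler exposes the collection's studies.
    cursor.execute('INSERT INTO collection_whitelist (collection) VALUES (%s)', ('xyz123',))
    # Unpublish would remove the row again:
    # cursor.execute('DELETE FROM collection_whitelist WHERE collection = %s', ('xyz123',))

connection.close()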
Follow-up changes needed
Update to the web application. The web application should be updated to take advantage of the new study-names behavior, displaying the collection's studies rather than the default-public studies when the collection token is provided (e.g. in the URL).
Miscellaneous updates:
Data loaded images. The pre-loaded PostgreSQL Docker images now change very infrequently and should not need to be rebuilt regularly on development machines. There is a new make target that pushes these local images to the remote repository, and the compose.yaml files are updated to allow pulling from the remote.
Addresses #312.