nadeemlab / SPT

Spatial profiling toolbox for spatial characterization of tumor immune microenvironment in multiplex images (https://oncopathtk.org)

Study sets #316

Closed: jimmymathews closed this pull request 4 months ago

jimmymathews commented 4 months ago

Addresses #312.

The collection concept lets a dataset be tested and analyzed before it is made public. It works like a private scope.

Dataset names (handles) can now optionally carry a "collection" suffix, as in:

Breast cancer IMC collection: xyz123

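The collection tag (also called the token, label, or suffix) should match the regular expression ^[a-z0-9\-]{1,513}$, that is, 1 to 513 characters drawn from lowercase letters, digits, and hyphens.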
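A minimal sketch of checking a candidate tag against that pattern. The pattern is the one quoted in this PR; the function name is hypothetical and this is not necessarily how SPT itself validates tags. It uses fullmatch because with plain re.match a trailing newline would slip past the $ anchor.

```python
import re

# Pattern quoted in the PR: 1 to 513 characters, each a lowercase ASCII
# letter, a digit, or a hyphen.
COLLECTION_TAG_PATTERN = re.compile(r"^[a-z0-9\-]{1,513}$")


def is_valid_collection_tag(tag: str) -> bool:
    """Return True if `tag` is an acceptable collection tag (hypothetical helper)."""
    # fullmatch requires the whole string to match, so "xyz123\n" is rejected;
    # re.match would accept it because $ also matches before a trailing newline.
    return COLLECTION_TAG_PATTERN.fullmatch(tag) is not None
```

For example, "xyz123" passes, while "XYZ123" (uppercase), the empty string, and a 514-character string are all rejected.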

Major changes

Dataset import. The study.json file scanned by the tabular import workflow may now include an optional "Study collection" field whose value is the collection tag, e.g. "xyz123". The workflow now constructs the fully-qualified study name from the "Study name" and "Study collection" fields. These names are managed by StudyCollectionNaming.
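A rough sketch of composing and splitting fully-qualified names from parsed study.json contents. The " collection: " separator is inferred from the example name "Breast cancer IMC collection: xyz123" and is an assumption; qualified_study_name and split_study_name are hypothetical helpers, not the actual StudyCollectionNaming interface.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: separator inferred from "Breast cancer IMC collection: xyz123".
SEPARATOR = " collection: "
TAG_PATTERN = re.compile(r"[a-z0-9\-]{1,513}")


def qualified_study_name(study: dict) -> str:
    """Build a fully-qualified study name from parsed study.json fields."""
    name = study["Study name"]
    tag = study.get("Study collection")
    if tag is None:
        # No collection field: the plain study name is used unchanged.
        return name
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"Invalid collection tag: {tag!r}")
    return f"{name}{SEPARATOR}{tag}"


def split_study_name(qualified: str) -> Tuple[str, Optional[str]]:
    """Split a fully-qualified name into (study name, collection tag or None)."""
    # rpartition splits at the last separator; if none is found, sep is "".
    base, sep, tag = qualified.rpartition(SEPARATOR)
    if sep and TAG_PATTERN.fullmatch(tag):
        return base, tag
    return qualified, None
```

With {"Study name": "Breast cancer IMC", "Study collection": "xyz123"}, the first function returns "Breast cancer IMC collection: xyz123", and the second recovers the two parts.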

API handlers. By default, a collection-tagged study is not available from the API server: it does not show up in the study-names response. However, if you already know the collection tag, you can supply it to study-names and receive the fully-qualified study names for that collection. The other API handlers have not been modified and behave exactly as before; wherever a study name appears as an argument, you can supply the fully-qualified study name. A major reason for embedding collection tags in the study names (rather than storing them as extra metadata) was to support this, so that the API handlers could remain as they are.
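The default study-names filtering described above can be sketched as a pure function over a list of fully-qualified names. This assumes the " collection: " separator from the example name, and it deliberately ignores the public whitelist; study_names and collection_of are hypothetical names, not the actual handler code.

```python
from typing import Iterable, List, Optional

# ASSUMPTION: naming convention inferred from "Breast cancer IMC collection: xyz123".
SEPARATOR = " collection: "


def collection_of(qualified: str) -> Optional[str]:
    """Return the collection tag embedded in a study name, or None if untagged."""
    _, sep, tag = qualified.rpartition(SEPARATOR)
    return tag if sep else None


def study_names(all_names: Iterable[str], collection: Optional[str] = None) -> List[str]:
    """Approximate the described study-names behavior (whitelist not modeled).

    With no tag supplied, only untagged studies are listed.
    With a tag supplied, the fully-qualified names in that collection are listed.
    """
    return [name for name in all_names if collection_of(name) == collection]
```

Given ["Study A", "Study B collection: xyz123"], calling study_names without a tag lists only "Study A", while passing "xyz123" lists only the tagged study.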

Collection status updates. When a dataset collection is ready to be made public, one could in principle strip the collection tag from the study name everywhere it appears. However, these names are used liberally as identifiers, which makes that procedure difficult. Instead, a new collection_whitelist table in the default database holds the tags of collections that should be public, and the study-names handler consults this table. A new CLI command manages the whitelist, e.g.:

spt db collection --database-config-file=.spt_db.config --collection=xyz123 --publish
spt db collection --database-config-file=.spt_db.config --collection=xyz123 --unpublish
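A sketch of what publishing and unpublishing amount to at the table level. Only the table name collection_whitelist comes from this PR; the single "collection" column, the helper names, and the use of an in-memory SQLite database as a stand-in for PostgreSQL are all assumptions. (PostgreSQL would use INSERT ... ON CONFLICT DO NOTHING instead of SQLite's INSERT OR IGNORE.)

```python
import sqlite3

# ASSUMPTION: naming convention inferred from "Breast cancer IMC collection: xyz123".
SEPARATOR = " collection: "


def create_whitelist(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; only the table name is taken from the PR.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS collection_whitelist (collection TEXT PRIMARY KEY)"
    )


def publish(conn: sqlite3.Connection, tag: str) -> None:
    """Rough analogue of --publish: add the tag to the whitelist (idempotent)."""
    conn.execute("INSERT OR IGNORE INTO collection_whitelist (collection) VALUES (?)", (tag,))
    conn.commit()


def unpublish(conn: sqlite3.Connection, tag: str) -> None:
    """Rough analogue of --unpublish: remove the tag from the whitelist."""
    conn.execute("DELETE FROM collection_whitelist WHERE collection = ?", (tag,))
    conn.commit()


def is_public(conn: sqlite3.Connection, qualified_name: str) -> bool:
    """Untagged studies are public; tagged ones only if their tag is whitelisted."""
    _, sep, tag = qualified_name.rpartition(SEPARATOR)
    if not sep:
        return True
    row = conn.execute(
        "SELECT 1 FROM collection_whitelist WHERE collection = ?", (tag,)
    ).fetchone()
    return row is not None
```

Usage: after create_whitelist on sqlite3.connect(":memory:"), publish(conn, "xyz123") makes "Breast cancer IMC collection: xyz123" public, and unpublish(conn, "xyz123") hides it again.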

Follow-up changes needed

Web application update. The web application should be updated to use the new study-names behavior: when a collection token is provided (e.g. in the URL), it should display that collection's studies rather than the default public studies.

Miscellaneous updates:

Data-loaded images. The pre-loaded PostgreSQL Docker images now change very infrequently and should not need to be rebuilt regularly on development machines. A new make target pushes these locally built images to the remote repository, and the compose.yaml files have been updated to allow pulling them from the remote.