nadeemlab / SPT

Spatial profiling toolbox for spatial characterization of tumor immune microenvironment in multiplex images (https://oncopathtk.org)

Make GNN plots available as an SPT CLI command and API endpoint #319

Closed CarlinLiao closed 3 months ago

CarlinLiao commented 4 months ago

This involves moving the current analysis_replication/gnn_figure/graph_plugin_plots.py out of analysis_replication/ and into spatialprofilingtoolbox/graphs/, adding a CLI command for it to scripts/, and adding an API endpoint that calls it to spatialprofilingtoolbox/apiserver/app/main.py.

CarlinLiao commented 4 months ago

Here's an example configuration file for graph_plugin_plots.py as is.

{
    "study": "Melanoma intralesional IL2",
    "phenotypes": [
        "Tumor",
        "Adipocyte or Langerhans cell",
        "Nerve",
        "B cell",
        "Natural killer cell",
        "Natural killer T cell",
        "CD4+/CD8+ T cell",
        "CD4+ natural killer T cell",
        "CD4+ regulatory T cell",
        "CD4+ T cell",
        "CD8+ natural killer T cell",
        "CD8+ regulatory T cell",
        "CD8+ T cell",
        "Double negative regulatory T cell",
        "T cell/null phenotype",
        "CD163+MHCII- macrophage",
        "CD163+MHCII+ macrophage",
        "CD68+MHCII- macrophage",
        "CD68+MHCII+ macrophage",
        "Other macrophage/monocyte CD14+",
        "Other macrophage/monocyte CD4+"
    ],
    "attribute_order": [
        "Tumor",
        "Adipocyte or Langerhans cell",
        "Natural killer cell",
        "CD4+ T cell",
        "Nerve",
        "B cell",
        "CD4+/CD8+ T cell",
        "CD4+ regulatory T cell",
        "CD8+ natural killer T cell",
        "CD8+ regulatory T cell",
        "CD8+ T cell",
        "Double negative regulatory T cell",
        "T cell/null phenotype",
        "Natural killer T cell",
        "CD4+ natural killer T cell",
        "cohort"
    ],
    "cohorts": [
        {
            "index_int": 1,
            "label": "Non-responder"
        },
        {
            "index_int": 3,
            "label": "Responder"
        }
    ],
    "plugins": [
        "cg-gnn",
        "graph-transformer"
    ],
    "figure_size": [
        11,
        8
    ],
    "orientation": "horizontal"
}
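As a rough illustration of how a CLI command might consume a file like the one above, here is a minimal sketch that parses the JSON into a typed container. The names `PlotConfiguration` and `parse_plot_configuration` are hypothetical and not part of SPT; the only check performed (two entries in `figure_size`) is inferred from the example, not from a documented schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PlotConfiguration:
    """Hypothetical container mirroring the keys of the example JSON file."""
    study: str
    phenotypes: tuple[str, ...]
    attribute_order: tuple[str, ...]
    cohorts: tuple[tuple[int, str], ...]  # (index_int, label) pairs
    plugins: tuple[str, ...]
    figure_size: tuple[int, ...]
    orientation: str


def parse_plot_configuration(text: str) -> PlotConfiguration:
    """Parse configuration JSON text; a missing key raises KeyError."""
    raw = json.loads(text)
    figure_size = tuple(raw["figure_size"])
    # Assumption drawn from the example: figure_size always has two entries.
    if len(figure_size) != 2:
        raise ValueError("figure_size must have exactly two entries")
    return PlotConfiguration(
        study=raw["study"],
        phenotypes=tuple(raw["phenotypes"]),
        attribute_order=tuple(raw["attribute_order"]),
        cohorts=tuple((c["index_int"], c["label"]) for c in raw["cohorts"]),
        plugins=tuple(raw["plugins"]),
        figure_size=figure_size,
        orientation=raw["orientation"],
    )
```

Freezing the dataclass and converting lists to tuples keeps a parsed configuration immutable, which would matter if the result were ever cached.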

Translating this to an API call,

What do you think @jimmymathews?

CarlinLiao commented 4 months ago

We have to look at this from the web application perspective as well if we go the caching route.

jimmymathews commented 4 months ago

To keep the work here bounded, I propose that we make the new API endpoint have almost no parameters, maybe just the study. We can manually record (in that JSON format, I suppose) the detailed configuration for each study, and get the API handler to consult this configuration when regenerating the plot.

CarlinLiao commented 4 months ago

Okay, so for the web API endpoint we expose only the study parameter, but for the actual function and CLI input we expose all parameters. The web API will look up a file or table with the parameters we've fixed for that study and call the function that way. (Where will that file or table be and how will it be looked up?)
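The split described here could be sketched roughly as follows. Everything named in this block is hypothetical: `make_plot` stands in for the full-parameter function behind the CLI (it only returns a description rather than drawing anything), and `FIXED_CONFIGURATIONS` is an in-memory placeholder for whatever file or table ends up holding the per-study parameters.

```python
from typing import Any, Mapping

# Hypothetical stand-in for the file or table of fixed per-study parameters.
FIXED_CONFIGURATIONS: dict[str, dict[str, Any]] = {
    "Melanoma intralesional IL2": {
        "plugins": ["cg-gnn", "graph-transformer"],
        "figure_size": [11, 8],
        "orientation": "horizontal",
    },
}


def make_plot(
    study: str,
    plugins: list[str],
    figure_size: list[int],
    orientation: str,
) -> dict[str, Any]:
    """Placeholder for the full-parameter function exposed to the CLI.

    A real implementation would render the figure; this one only echoes
    the parameters it was called with.
    """
    return {
        "study": study,
        "plugins": plugins,
        "figure_size": figure_size,
        "orientation": orientation,
    }


def make_plot_for_study(
    study: str,
    configurations: Mapping[str, Mapping[str, Any]] = FIXED_CONFIGURATIONS,
) -> dict[str, Any]:
    """What a study-only web API handler might call."""
    try:
        fixed = configurations[study]
    except KeyError:
        raise LookupError(f"No plot configuration recorded for study: {study}") from None
    # The handler supplies only the study; all other parameters are looked up.
    return make_plot(study, **fixed)
```

With this shape, the CLI and the API share one code path, and the API handler cannot be asked to render arbitrary parameter combinations.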

jimmymathews commented 4 months ago

Yup, that is what I had in mind. How about we just add a little self-sufficient database table with study name and JSON blob contents? This way it will be sure to be available to the application. Similar to the spt db upload-sync-findings functionality recently added (which takes a local source file and makes its contents available in the DB in a simple way), we could also have spt db upload-sync-gnn-plot-configurations?

jimmymathews commented 4 months ago

The script upload_sync_findings.py I mentioned above is here.

It creates an isolated sql table and uses it / syncs it with some local file (local, that is, to the spt-data repo), which is sort of similar to what we would need.
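A minimal sketch of the "study name plus JSON blob" table, using the standard-library `sqlite3` module as a stand-in for the actual SPT database. The table name `gnn_plot_configurations`, both function names, and the replace-everything sync strategy are assumptions made for illustration; they are not taken from upload_sync_findings.py.

```python
import json
import sqlite3
from typing import Any, Optional


def sync_configurations(
    connection: sqlite3.Connection,
    configurations: dict[str, dict[str, Any]],
) -> None:
    """Make the table mirror the local configurations (hypothetical helper)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS gnn_plot_configurations ("
        "study TEXT PRIMARY KEY, configuration TEXT NOT NULL)"
    )
    # Simplest possible sync: drop all rows, then reinsert from the local source.
    connection.execute("DELETE FROM gnn_plot_configurations")
    connection.executemany(
        "INSERT INTO gnn_plot_configurations (study, configuration) VALUES (?, ?)",
        [(study, json.dumps(config)) for study, config in configurations.items()],
    )
    connection.commit()


def fetch_configuration(
    connection: sqlite3.Connection,
    study: str,
) -> Optional[dict[str, Any]]:
    """Return the stored configuration for a study, or None if absent."""
    row = connection.execute(
        "SELECT configuration FROM gnn_plot_configurations WHERE study = ?",
        (study,),
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

Because the table has no foreign keys into the rest of the schema, it stays isolated in the sense described above, and rerunning the sync is idempotent.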

CarlinLiao commented 4 months ago

My thought's been to replace analysis_replication.accessors.DataAccessor and dependence on the host API with a db_config_file and usage of either FeatureMatrixExtractor or raw SQL queries, but this is proving more complicated than expected.

With the apiserver, getting the phenotype counts per specimen is fast because of the encoding you did, but if I'm understanding this correctly, that functionality lives in ondemand.providers.provider and is not meant for use outside of that context. (I suppose I could copy the functionality of OnDemandProvider._get_data_array_from_db and CountsProvider.count_structures_of_partial_signed_signature, but that doesn't feel like a very modular solution either.)

What would you say is the right way to approach this?