Make GNN plots available as an SPT CLI command and API endpoint

CarlinLiao commented 6 months ago

This involves three steps. First, move the current analysis_replication/gnn_figure/graph_plugin_plots.py out of analysis_replication/ and into spatialprofilingtoolbox/graphs/. Second, add a CLI command for it under scripts/. Third, add an API endpoint that calls it in spatialprofilingtoolbox/apiserver/app/main.py.
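For illustration, the CLI side could be a thin script that reads a JSON configuration and hands it to the relocated plotting function. This is a minimal sketch under assumptions. The command name `spt graphs gnn-plots`, the `--output` flag, and `make_gnn_plot` are all hypothetical; `make_gnn_plot` is a stub standing in for the code that would move out of graph_plugin_plots.py.

```python
from __future__ import annotations

import argparse
import json
from pathlib import Path


def make_gnn_plot(config: dict, output_path: str) -> str:
    """Hypothetical stand-in for the plotting code moved out of graph_plugin_plots.py.

    A real version would draw and save the figure; this stub only describes the job.
    """
    return f"would plot {len(config['plugins'])} plugin(s) for {config['study']} to {output_path}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog='spt graphs gnn-plots',  # hypothetical command name
        description='Plot GNN plugin results described by a JSON configuration file.',
    )
    parser.add_argument('config_file', type=Path, help='JSON configuration file.')
    # Hypothetical flag; the real script's options may differ.
    parser.add_argument('--output', default='gnn_plot.svg', help='Path for the figure.')
    return parser


def main(argv: list[str] | None = None) -> str:
    """Parse arguments, load the JSON configuration, and run the plotting step."""
    args = build_parser().parse_args(argv)
    config = json.loads(args.config_file.read_text(encoding='utf-8'))
    return make_gnn_plot(config, args.output)
```

The script under scripts/ would then only need to call `main()`. Keeping all parsing in `build_parser` makes the full-parameter interface easy to test without touching the filesystem.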

CarlinLiao commented 6 months ago

Here's an example configuration file for graph_plugin_plots.py in its current form:

{
    "study": "Melanoma intralesional IL2",
    "phenotypes": [
        "Tumor",
        "Adipocyte or Langerhans cell",
        "Nerve",
        "B cell",
        "Natural killer cell",
        "Natural killer T cell",
        "CD4+/CD8+ T cell",
        "CD4+ natural killer T cell",
        "CD4+ regulatory T cell",
        "CD4+ T cell",
        "CD8+ natural killer T cell",
        "CD8+ regulatory T cell",
        "CD8+ T cell",
        "Double negative regulatory T cell",
        "T cell/null phenotype",
        "CD163+MHCII- macrophage",
        "CD163+MHCII+ macrophage",
        "CD68+MHCII- macrophage",
        "CD68+MHCII+ macrophage",
        "Other macrophage/monocyte CD14+",
        "Other macrophage/monocyte CD4+"
    ],
    "attribute_order": [
        "Tumor",
        "Adipocyte or Langerhans cell",
        "Natural killer cell",
        "CD4+ T cell",
        "Nerve",
        "B cell",
        "CD4+/CD8+ T cell",
        "CD4+ regulatory T cell",
        "CD8+ natural killer T cell",
        "CD8+ regulatory T cell",
        "CD8+ T cell",
        "Double negative regulatory T cell",
        "T cell/null phenotype",
        "Natural killer T cell",
        "CD4+ natural killer T cell",
        "cohort"
    ],
    "cohorts": [
        {
            "index_int": 1,
            "label": "Non-responder"
        },
        {
            "index_int": 3,
            "label": "Responder"
        }
    ],
    "plugins": [
        "cg-gnn",
        "graph-transformer"
    ],
    "figure_size": [
        11,
        8
    ],
    "orientation": "horizontal"
}
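To pin down what each key means before deciding which ones an API should expose, here is a minimal sketch of loading and checking this configuration in Python. The class names and validation rules are assumptions, not what graph_plugin_plots.py does. It assumes every attribute_order entry is either a listed phenotype or the special "cohort" column, and that "vertical" is the only other orientation value.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Cohort:
    index_int: int
    label: str


@dataclass
class GNNPlotConfig:
    """Hypothetical typed view of the JSON configuration above."""
    study: str
    phenotypes: list[str]
    attribute_order: list[str]
    cohorts: list[Cohort]
    plugins: list[str]
    figure_size: tuple[float, float]
    orientation: str

    @classmethod
    def from_json(cls, text: str) -> GNNPlotConfig:
        raw = json.loads(text)
        config = cls(
            study=raw['study'],
            phenotypes=list(raw['phenotypes']),
            attribute_order=list(raw['attribute_order']),
            cohorts=[Cohort(**c) for c in raw['cohorts']],
            plugins=list(raw['plugins']),
            figure_size=(float(raw['figure_size'][0]), float(raw['figure_size'][1])),
            orientation=raw['orientation'],
        )
        config.validate()
        return config

    def validate(self) -> None:
        # Assumption: attribute_order may only name listed phenotypes or "cohort".
        unknown = set(self.attribute_order) - set(self.phenotypes) - {'cohort'}
        if unknown:
            raise ValueError(f'attribute_order has unknown entries: {sorted(unknown)}')
        indices = [c.index_int for c in self.cohorts]
        if len(indices) != len(set(indices)):
            raise ValueError('cohort index_int values must be unique')
        # Assumption: 'vertical' is the only alternative to 'horizontal'.
        if self.orientation not in ('horizontal', 'vertical'):
            raise ValueError(f'unexpected orientation: {self.orientation}')

    def unused_phenotypes(self) -> list[str]:
        """Phenotypes listed in `phenotypes` but not shown via `attribute_order`."""
        shown = set(self.attribute_order)
        return [p for p in self.phenotypes if p not in shown]
```

Applied to the example above, `unused_phenotypes()` would return the six macrophage/monocyte phenotypes. That suggests attribute_order acts as both a subset selection and an ordering, rather than being purely redundant with phenotypes.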

Translating this to an API call:

study: we can ask the user to provide this.
phenotypes: these can be pulled from the database.
attribute_order: this is trickier, since it's up to the user to decide on a reasonable order for the phenotypes they want to display. Maybe we make the user provide this too? It also seems redundant with phenotypes, since it just specifies the subset of phenotypes to use, but I'll need to double-check.
cohorts: these can be pulled from the database, although there will likely be extra cohorts not used by the GNN; hopefully those just drop out instead of causing an error.
plugins: I think we can ask the user to provide these?
figure_size: we could try to determine this dynamically from the number of phenotypes and specimens, but that leaves out the length of the longest phenotype name, so this could be tricky. I arrived at the values we're using now only by trial and error. Maybe we cache the results so the user can quickly try their own values?
orientation: a simpler version of the figure_size problem.
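As a rough illustration of the dynamic figure_size idea, a heuristic could size one side by the number of phenotypes and the other by the number of specimens plus a margin for the longest phenotype label, with rendering memoized so repeated requests with the same values are cheap. This is a sketch only. All constants are made-up placeholders, not the trial-and-error values in use. Which axis "horizontal" refers to is an assumption, and `estimate_figure_size` and `render_plot` are hypothetical names.

```python
from __future__ import annotations

from functools import lru_cache

# Placeholder constants in inches (assumptions, NOT the values used today).
INCHES_PER_PHENOTYPE = 0.35
INCHES_PER_LABEL_CHAR = 0.07
INCHES_PER_SPECIMEN = 0.05
MIN_LONG_SIDE, MIN_SHORT_SIDE = 4.0, 3.0


def estimate_figure_size(
    phenotypes: tuple[str, ...],
    n_specimens: int,
    orientation: str = 'horizontal',
) -> tuple[float, float]:
    """Heuristic (width, height) in inches.

    One side grows with the number of phenotypes; the other grows with the number of
    specimens plus a margin for the longest phenotype label.
    """
    label_margin = INCHES_PER_LABEL_CHAR * max((len(p) for p in phenotypes), default=0)
    phenotype_side = max(MIN_SHORT_SIDE, INCHES_PER_PHENOTYPE * len(phenotypes))
    specimen_side = max(MIN_LONG_SIDE, label_margin + INCHES_PER_SPECIMEN * n_specimens)
    # Assumption: 'horizontal' puts the specimen axis along the width.
    if orientation == 'horizontal':
        return round(specimen_side, 2), round(phenotype_side, 2)
    return round(phenotype_side, 2), round(specimen_side, 2)


@lru_cache(maxsize=128)
def render_plot(study: str, figure_size: tuple[float, float], orientation: str) -> bytes:
    """Stand-in for the expensive rendering step, memoized on its hashable arguments."""
    width, height = figure_size
    return f'{study} {width}x{height} {orientation}'.encode('utf-8')
```

An in-process `lru_cache` like this is not shared between API server workers or restarts, which is one reason caching also has to be considered from the web application's side.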

What do you think @jimmymathews?

CarlinLiao commented 6 months ago

We have to look at this from the web application perspective as well if we go the caching route.

jimmymathews commented 6 months ago

To keep the work here bounded, I propose that the new API endpoint take almost no parameters, maybe just the study. We can manually record the detailed configuration for each study (in that JSON format, I suppose) and have the API handler consult this configuration when regenerating the plot.

CarlinLiao commented 6 months ago

Okay, so for the web API endpoint we expose only the study parameter, but for the actual function and CLI input we expose all parameters. The web API will look up a file or table with the parameters we've fixed for that study and call the function that way. (Where will that file or table be and how will it be looked up?)
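A minimal sketch of that split, assuming the stored configuration is a JSON blob keyed by study name. `make_gnn_plot`, `handle_gnn_plot_request`, and `UnknownStudyError` are hypothetical names. The lookup is passed in as a function so the sketch doesn't presume whether the configuration lives in a file or a table, which is still the open question.

```python
from __future__ import annotations

import json
from typing import Callable


def make_gnn_plot(
    study: str,
    phenotypes: list[str],
    attribute_order: list[str],
    cohorts: list[dict],
    plugins: list[str],
    figure_size: list[float],
    orientation: str,
) -> str:
    """Hypothetical full-parameter function shared by the CLI and the API (stubbed)."""
    return f'{study}: {len(attribute_order)} attributes, plugins={",".join(plugins)}'


class UnknownStudyError(LookupError):
    """Raised when no configuration has been recorded for the requested study."""


def handle_gnn_plot_request(study: str, lookup_config: Callable[[str], str | None]) -> str:
    """API-side handler: the web caller supplies only `study`; everything else is stored."""
    stored = lookup_config(study)
    if stored is None:
        raise UnknownStudyError(f'No GNN plot configuration recorded for {study!r}')
    config = json.loads(stored)
    # The stored JSON repeats the study name; the requested value is treated as authoritative.
    config['study'] = study
    return make_gnn_plot(**config)
```

In the apiserver, this handler would still need to be registered as an endpoint in spatialprofilingtoolbox/apiserver/app/main.py, and `UnknownStudyError` would presumably map to an HTTP 404. That wiring is left out here.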

jimmymathews commented 6 months ago

Yup, that is what I had in mind. How about we just add a small, self-sufficient database table with the study name and a JSON blob of the configuration? That way it is guaranteed to be available to the application. Similar to the spt db upload-sync-findings functionality recently added (which takes a local source file and makes its contents available in the DB in a simple way), we could also have an spt db upload-sync-gnn-plot-configurations command.

jimmymathews commented 6 months ago

The script upload_sync_findings.py I mentioned above is here.

It creates an isolated SQL table and keeps it synced with a local file (local, that is, to the spt-data repo), which is roughly what we would need.
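A rough sketch of such a table and its upload-sync step, in the spirit of upload_sync_findings.py but not copied from it. sqlite3 stands in for PostgreSQL only so the sketch runs on its own. The table name `gnn_plot_configuration` and the function names are hypothetical, and the local file is assumed to hold a JSON list of configuration objects like the example above.

```python
from __future__ import annotations

import json
import sqlite3

# Hypothetical table: one row per study, configuration stored as a JSON blob.
CREATE_TABLE = '''
CREATE TABLE IF NOT EXISTS gnn_plot_configuration (
    study TEXT PRIMARY KEY,
    configuration TEXT NOT NULL
)
'''


def sync_configurations(connection: sqlite3.Connection, configurations: list[dict]) -> None:
    """Make the table mirror the given configurations: upsert each one, delete the rest."""
    studies = [c['study'] for c in configurations]
    with connection:  # one transaction: commit on success, roll back on error
        connection.execute(CREATE_TABLE)
        connection.executemany(
            'INSERT INTO gnn_plot_configuration (study, configuration) VALUES (?, ?) '
            'ON CONFLICT(study) DO UPDATE SET configuration = excluded.configuration',
            [(c['study'], json.dumps(c)) for c in configurations],
        )
        if studies:
            placeholders = ', '.join('?' for _ in studies)
            connection.execute(
                f'DELETE FROM gnn_plot_configuration WHERE study NOT IN ({placeholders})',
                studies,
            )
        else:
            connection.execute('DELETE FROM gnn_plot_configuration')


def sync_from_file(connection: sqlite3.Connection, path: str) -> None:
    """Read a local JSON file holding a list of configurations and sync the table to it."""
    with open(path, encoding='utf-8') as file:
        sync_configurations(connection, json.load(file))


def lookup_configuration(connection: sqlite3.Connection, study: str) -> dict | None:
    """Return the stored configuration for `study`, or None if none is recorded."""
    row = connection.execute(
        'SELECT configuration FROM gnn_plot_configuration WHERE study = ?', (study,)
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

The `ON CONFLICT ... DO UPDATE` upsert form is also valid PostgreSQL, so porting this should mostly mean switching the `?` placeholders to `%s` for psycopg2.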

CarlinLiao commented 6 months ago

My plan has been to replace analysis_replication.accesors.DataAccessor, and the dependence on the host API, with a db_config_file plus either FeatureMatrixExtractor or raw SQL queries, but this is proving more complicated than expected.

With the apiserver, computing phenotype counts per specimen is fast because of the encoding you did. But if I understand correctly, that functionality lives in ondemand.providers.provider and isn't meant to be used outside that context. (I suppose I could copy the functionality of OnDemandProvider._get_data_array_from_db and CountsProvider.count_structures_of_partial_signed_signature, but that doesn't feel like a very modular solution either.)
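For comparison, here is one self-contained way to compute per-specimen phenotype counts without reaching into ondemand. It is sketched under an assumption about the encoding: each cell's binary marker phenotype is stored as an integer with one bit per channel, and a phenotype is a signed signature (markers that must be positive, markers that must be negative). Whether this matches what OnDemandProvider._get_data_array_from_db returns would need checking. `signature_masks` and `count_signature` are hypothetical names.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def signature_masks(
    channels: list[str], positives: Iterable[str], negatives: Iterable[str]
) -> tuple[int, int]:
    """Turn a signed signature into (positive_mask, negative_mask) over `channels`.

    Assumes bit i of a cell's integer encodes channel i being positive.
    """
    bit = {name: 1 << i for i, name in enumerate(channels)}
    positive_mask = sum(bit[m] for m in set(positives))
    negative_mask = sum(bit[m] for m in set(negatives))
    if positive_mask & negative_mask:
        raise ValueError('a marker cannot be both positive and negative')
    return positive_mask, negative_mask


def count_signature(
    cells_by_specimen: Mapping[str, Iterable[int]], positive_mask: int, negative_mask: int
) -> dict[str, int]:
    """Count cells that have every positive-mask bit set and no negative-mask bit set."""
    return {
        specimen: sum(
            1
            for cell in cells
            if (cell & positive_mask) == positive_mask and (cell & negative_mask) == 0
        )
        for specimen, cells in cells_by_specimen.items()
    }
```

If logic like this lived in a small shared module, both the ondemand providers and the GNN plotting code could import it, which might be one way around the modularity concern.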

What would you say is the right way to approach this?

nadeemlab / SPT

Make GNN plots available as an SPT CLI command and API endpoint #319