Closed CarlinLiao closed 6 months ago
Here's an example configuration file for graph_plugin_plots.py as is.
{
  "study": "Melanoma intralesional IL2",
  "phenotypes": [
    "Tumor",
    "Adipocyte or Langerhans cell",
    "Nerve",
    "B cell",
    "Natural killer cell",
    "Natural killer T cell",
    "CD4+/CD8+ T cell",
    "CD4+ natural killer T cell",
    "CD4+ regulatory T cell",
    "CD4+ T cell",
    "CD8+ natural killer T cell",
    "CD8+ regulatory T cell",
    "CD8+ T cell",
    "Double negative regulatory T cell",
    "T cell/null phenotype",
    "CD163+MHCII- macrophage",
    "CD163+MHCII+ macrophage",
    "CD68+MHCII- macrophage",
    "CD68+MHCII+ macrophage",
    "Other macrophage/monocyte CD14+",
    "Other macrophage/monocyte CD4+"
  ],
  "attribute_order": [
    "Tumor",
    "Adipocyte or Langerhans cell",
    "Natural killer cell",
    "CD4+ T cell",
    "Nerve",
    "B cell",
    "CD4+/CD8+ T cell",
    "CD4+ regulatory T cell",
    "CD8+ natural killer T cell",
    "CD8+ regulatory T cell",
    "CD8+ T cell",
    "Double negative regulatory T cell",
    "T cell/null phenotype",
    "Natural killer T cell",
    "CD4+ natural killer T cell",
    "cohort"
  ],
  "cohorts": [
    {
      "index_int": 1,
      "label": "Non-responder"
    },
    {
      "index_int": 3,
      "label": "Responder"
    }
  ],
  "plugins": [
    "cg-gnn",
    "graph-transformer"
  ],
  "figure_size": [
    11,
    8
  ],
  "orientation": "horizontal"
}
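For reference, here is a quick way to see how attribute_order relates to phenotypes in a file like this. It is only a sketch: the function name compare_attribute_order is made up for illustration, and the sample config is abridged from the one above.

```python
def compare_attribute_order(config):
    """Compare attribute_order against phenotypes.

    Returns (extra, unused):
      extra  - attribute_order entries that are not phenotypes
      unused - phenotypes that attribute_order leaves out
    """
    phenotypes = config["phenotypes"]
    attribute_order = config["attribute_order"]
    extra = [a for a in attribute_order if a not in set(phenotypes)]
    unused = [p for p in phenotypes if p not in set(attribute_order)]
    return extra, unused


# Abridged from the configuration above (not the full lists).
sample_config = {
    "phenotypes": ["Tumor", "Nerve", "B cell", "CD68+MHCII+ macrophage"],
    "attribute_order": ["Tumor", "B cell", "Nerve", "cohort"],
}

extra, unused = compare_attribute_order(sample_config)
print("Not phenotypes:", extra)
print("Phenotypes left out:", unused)
```

In the full file above, attribute_order is the first 15 entries of phenotypes in a different order plus "cohort", and the six macrophage entries are left out.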
Translating this to an API call,
- study: we can ask the user to provide this.
- phenotypes: can be pulled from the database, but
- attribute_order: is tricky, since it's up to the user to determine what's a reasonable order for the phenotypes they want to display. Maybe we make the user provide this too? Also, this seems redundant with phenotypes, since it just specifies the subset of phenotypes to use, but I'll need to double-check.
- cohorts: can be pulled from the database, although it will likely have extra cohorts not used by the GNN that'll hopefully just fall out instead of causing an error.
- plugins: I think we can ask the user to provide this?
- figure_size: we could try to determine this dynamically from the number of phenotypes and specimens, but that leaves out how long the longest phenotype's name is... this could be tricky. The values we're using now I determined only by trial and error. Maybe we cache the results so the user can quickly try new values of their own?
- orientation: is a less complex version of figure_size.
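A rough sketch of what a dynamic figure_size guess could look like, taking the longest phenotype name into account. Everything here is an assumption for illustration: the function name, the layout (specimens along the width, phenotypes along the height, swapped for a hypothetical "vertical" orientation), and the per-cell and per-character constants, which are invented and not the trial-and-error values in the file above.

```python
def estimate_figure_size(phenotypes, n_specimens, orientation="horizontal",
                         inches_per_cell=0.3, inches_per_char=0.08, margin=1.5):
    """Guess a (width, height) in inches from the amount of content to draw.

    Assumed layout: one cell per (phenotype, specimen) pair, with phenotype
    names as labels whose room grows with the longest name.
    """
    longest_name = max((len(name) for name in phenotypes), default=0)
    label_room = longest_name * inches_per_char
    width = margin + label_room + n_specimens * inches_per_cell
    height = margin + len(phenotypes) * inches_per_cell
    if orientation == "vertical":
        # Assumed meaning of the other orientation: swap the two axes.
        width, height = height, width
    return (round(width, 2), round(height, 2))


print(estimate_figure_size(["Tumor", "B cell"], n_specimens=10))
```

A guess like this would only be a starting point; letting the user override it (with cached results, as suggested) would still be needed.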
What do you think @jimmymathews?
We have to look at this from the web application perspective as well if we go the caching route.
To keep the work here bounded, I propose that we make the new API endpoint have almost no parameters, maybe just the study. We can manually record (in that JSON format, I suppose) the detailed configuration for each study, and get the API handler to consult this configuration when regenerating the plot.
Okay, so for the web API endpoint we expose only the study parameter, but for the actual function and CLI input we expose all parameters. The web API will look up a file or table with the parameters we've fixed for that study and call the function that way. (Where will that file or table be and how will it be looked up?)
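A minimal sketch of that split, to make it concrete. All names here (generate_plot, plot_endpoint, STUDY_CONFIGS) are hypothetical, the in-memory dict stands in for whatever file or table ends up holding the configurations, and the plotting function is a placeholder that just echoes its arguments.

```python
import json

# Hypothetical stand-in for the per-study configuration store (file or table).
# The phenotype lists are abridged from the example configuration.
STUDY_CONFIGS = {
    "Melanoma intralesional IL2": json.dumps({
        "phenotypes": ["Tumor", "B cell"],
        "attribute_order": ["Tumor", "B cell", "cohort"],
        "cohorts": [
            {"index_int": 1, "label": "Non-responder"},
            {"index_int": 3, "label": "Responder"},
        ],
        "plugins": ["cg-gnn", "graph-transformer"],
        "figure_size": [11, 8],
        "orientation": "horizontal",
    }),
}


def generate_plot(study, phenotypes, attribute_order, cohorts, plugins,
                  figure_size, orientation):
    """Full-parameter entry point, as the function and CLI would expose it.

    Placeholder: returns a summary of what would be drawn instead of a figure.
    """
    return {
        "study": study,
        "n_phenotypes": len(phenotypes),
        "n_attributes": len(attribute_order),
        "cohort_labels": [cohort["label"] for cohort in cohorts],
        "plugins": list(plugins),
        "figure_size": tuple(figure_size),
        "orientation": orientation,
    }


def plot_endpoint(study):
    """Study-only entry point, as the web API would expose it."""
    if study not in STUDY_CONFIGS:
        raise LookupError(f"No plot configuration recorded for study: {study}")
    config = json.loads(STUDY_CONFIGS[study])
    return generate_plot(study, **config)


print(plot_endpoint("Melanoma intralesional IL2"))
```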
Yup, that is what I had in mind.
How about we just add a little self-sufficient database table with study name and JSON blob contents? This way it will be sure to be available to the application.
Similar to the spt db upload-sync-findings functionality recently added (which takes a local source file and makes its contents available in the DB in a simple way), we could also have spt db upload-sync-gnn-plot-configurations?
The script upload_sync_findings.py I mentioned above is here.
It creates an isolated SQL table and uses it / syncs it with some local file (local, that is, to the spt-data repo), which is sort of similar to what we would need.
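As a rough illustration of the "study name plus JSON blob" table idea, here is a sketch using Python's built-in sqlite3 as a stand-in for the real database. The table name gnn_plot_configurations, its columns, and both function names are made up here, and this is not a description of how upload_sync_findings.py is implemented.

```python
import json
import sqlite3


def sync_configurations(connection, configurations):
    """Make the table contents match `configurations` (study name -> dict)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS gnn_plot_configurations "
        "(study TEXT PRIMARY KEY, configuration TEXT NOT NULL)"
    )
    # Simplest possible sync: drop everything, then insert the local contents.
    connection.execute("DELETE FROM gnn_plot_configurations")
    connection.executemany(
        "INSERT INTO gnn_plot_configurations (study, configuration) VALUES (?, ?)",
        [(study, json.dumps(config)) for study, config in configurations.items()],
    )
    connection.commit()


def get_configuration(connection, study):
    """Return the stored configuration for a study, or None if there is none."""
    row = connection.execute(
        "SELECT configuration FROM gnn_plot_configurations WHERE study = ?",
        (study,),
    ).fetchone()
    return None if row is None else json.loads(row[0])


connection = sqlite3.connect(":memory:")
sync_configurations(connection, {
    "Melanoma intralesional IL2": {"figure_size": [11, 8], "orientation": "horizontal"},
})
print(get_configuration(connection, "Melanoma intralesional IL2"))
```

The API handler would then only need the study name to fetch the blob and pass its contents on to the plotting function.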
My thought's been to replace analysis_replication.accesors.DataAccessor and dependence on the host API with a db_config_file and usage of either FeatureMatrixExtractor or raw SQL queries, but this is proving more complicated than expected.
With the apiserver, getting the phenotype counts per specimen is fast because of the encoding you did, but if I'm understanding this correctly, that functionality is in ondemand.providers.provider and not meant for usage outside of that context. (I suppose I could copy the functionality of OnDemandProvider._get_data_array_from_db and CountsProvider.count_structures_of_partial_signed_signature, but that doesn't feel like a very modular solution either.)
What would you say is the right way to approach this?
This involves moving the current analysis_replication/gnn_figure/graph_plugin_plots.py out of analysis_replication/ and into spatialprofilingtoolbox/graphs/, adding a CLI command for it to scripts/, and adding an API endpoint that calls it to spatialprofilingtoolbox/apiserver/app/main.py.