nadeemlab / SPT

Spatial profiling toolbox for spatial characterization of tumor immune microenvironment in multiplex images (https://oncopathtk.org)
https://oncopathtk.org
Other
21 stars 2 forks source link

add graph plugin plot code #307

Closed CarlinLiao closed 4 months ago

CarlinLiao commented 6 months ago

To close #306.

jimmymathews commented 5 months ago

I'm able to review this now that the DNS issues were resolved for oncopathtk.org, so this script can run.

jimmymathews commented 4 months ago

I did systematic refactoring to break down the process and passed data structures into units, and to give the units approximately meaningful names. What remains is such refactoring inside what is now called SubplotGenerator. I also added caching so that on re-run one doesn't have to wait a few minutes for a new plot.

@CarlinLiao please check that this still works for you as expected. Also, you can finish the refactoring job (if you take a look it should be obvious what is to be done) if you have time. It should take about 1-2 hours. If not, we can merge as-is.

jimmymathews commented 4 months ago

Although the refactoring looks like a rewrite, that is not really true, I did not implement any new behavior (except that possibly it may be considered new behavior that the plot specs are stored in separate JSON files).

CarlinLiao commented 4 months ago

I recognize this error but I don't remember what I usually did to solve it. If I remember correctly, it was something silly like psycopg2 or Postgres being installed. (It is, and my bash and make versions meet requirements.) Running on the server in an SPT conda env.

$ python analysis_replication/gnn_figure/graph_plugin_plots.py -h
Traceback (most recent call last):
  File "/home/liaoc2/SPT/analysis_replication/gnn_figure/graph_plugin_plots.py", line 16, in <module>
    from pandas import DataFrame
  File "/home/liaoc2/miniconda3/envs/spt/lib/python3.11/site-packages/pandas/__init__.py", line 49, in <module>
    from pandas.core.api import (
  File "/home/liaoc2/miniconda3/envs/spt/lib/python3.11/site-packages/pandas/core/api.py", line 47, in <module>
    from pandas.core.groupby import (
  File "/home/liaoc2/miniconda3/envs/spt/lib/python3.11/site-packages/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.generic import (
  File "/home/liaoc2/miniconda3/envs/spt/lib/python3.11/site-packages/pandas/core/groupby/generic.py", line 68, in <module>
    from pandas.core.frame import DataFrame
  File "/home/liaoc2/miniconda3/envs/spt/lib/python3.11/site-packages/pandas/core/frame.py", line 149, in <module>
    from pandas.core.generic import (
  File "/home/liaoc2/miniconda3/envs/spt/lib/python3.11/site-packages/pandas/core/generic.py", line 193, in <module>
    from pandas.core.window import (
  File "/home/liaoc2/miniconda3/envs/spt/lib/python3.11/site-packages/pandas/core/window/__init__.py", line 1, in <module>
    from pandas.core.window.ewm import (
  File "/home/liaoc2/miniconda3/envs/spt/lib/python3.11/site-packages/pandas/core/window/ewm.py", line 11, in <module>
    import pandas._libs.window.aggregations as window_aggregations
ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/liaoc2/miniconda3/envs/spt/lib/python3.11/site-packages/pandas/_libs/window/aggregations.cpython-311-x86_64-linux-gnu.so)
jimmymathews commented 4 months ago

Hm this error seems to be triggered by the very first import from pandas: from pandas import DataFrame. Can you confirm that the same thing happens in your environment by doing just this import? E.g. in the Python interactive CLI:

from pandas import DataFrame

If so I guess you'll need to do some environment-specific thing to make pandas available. The pandas occurrence might be a misdirection, however, as there are probably a huge number of functions requiring glibc (I think it is the implementation of the C standard library). So if that is masked or not available somehow in the conda environment, that's not good.

jimmymathews commented 4 months ago

We haven't specified package versions required because these plotting scripts were meant to be dev/testing use only, but perhaps we do need to pin the version because of issues like this.

I think I have been using:

pandas==2.2.1
CarlinLiao commented 4 months ago

Simply updating my pandas and gcc versions in conda weren't enough to fix the issue, so I deleted and remade my SPT condo environment to resolve it, verified that it works with run_all and graph_plugin_plots, and committed its yaml.

The SVG plots appear to have been constructed properly. Here they are for reference:

melanoma_intralesional_il2

urothelial_ici

I also adjusted the tqdm calls to be simpler and complete at 100% instead of being one off. We may consider adding the files created by analysis_replication to .gitignore as well as urothelial.txt.tmp, which is currently version controlled despite its tmp extension.

jimmymathews commented 4 months ago

Noting what we discussed in person:

Probably after refactoring the SubplotGenerator for style conformity/modularity, we should move the main classes into the SPT codebase in order that they may be reused in the live application. It seems that this will also require moving or copying a minimal amount of the "accessors" stuff (and cleaning it up accordingly).

CarlinLiao commented 4 months ago

I thought moving graph_plugin_plots.py into SPT would be simple, but I now see two issues:

  1. it depends on analysis_replication.accessors.DataAccessor, which is not available in SPT
  2. it has two JSON files that describe the specific queries to make that shouldn't be moved into SPT with it because they're specific to a dataset instance

How should we proceed? The simplest option would be to change analysis_replication so that it requires the installation of SPT to run, or at least include the relevant files from SPT the system path.

jimmymathews commented 4 months ago

Let's just postpone promoting this plotting code to SPT codebase. The idea is that eventually we will want to be able to generate these plots server-side in the live application, from something like those PlotSpecification objects (equivalently, those JSONs). But we aren't doing that right away.

The finished refactor looks good to me.

Can you just confirm that the caching thing works for you as well? I deleted my pickles and re-ran graph_plugin_plots.py, and it worked again for me.

CarlinLiao commented 4 months ago

Caching works on my end, so we should be good.

That said, on my WSL machine the plots are coming out a little odd. I checked and I get the same narrow second urothelial plot even if I revert my refactor, so maybe it's just a difference in hardware?

melanoma_intralesional_il2 urothelial_ici

jimmymathews commented 4 months ago

Yeah I wouldn't worry about it, as I remember it matplotlib has a ton of different backend rendering engines that it might use depending on the environment. For the future reproducible workflow we will pin the environment to resolve this.

CarlinLiao commented 4 months ago

Cool. Are we ready to merge?

jimmymathews commented 4 months ago

Yep, tests pass for me.