Replace cggnn distance matrix with KDTree

nadeemlab / SPT

Spatial profiling toolbox for spatial characterization of tumor immune microenvironment in multiplex images

https://oncopathtk.org

Other

21 stars 2 forks source link

Replace cggnn distance matrix with KDTree #256

Closed CarlinLiao closed 12 months ago

CarlinLiao commented 1 year ago

To resolve #255, I've replaced the square distance matrix in cggnn graph generation with a KDTree implementation. Instead of finding the cell with the highest density of cells close by based on the distance matrix, this implementation instead finds the cell with the most cells nearby in a radius equal to that of half the ROI's side length using the KDTree.

Some caveats:

This does not replicate the old method of ROI creation, it just uses a faster approximation method.
The radius search is a circle while the actual ROI created is a square.

All this said, the old implementation was also approximate method, an even more naive one at that, so I don't think we need to sweat these caveats too much.

CarlinLiao commented 1 year ago

All tests pass

jimmymathews commented 1 year ago

Since we were not able to create a sufficiently reproducible quantitative checksum summary of the gnn behavior, it's hard to know if tests passing means that the functionality is equivalent or at least not broken by this change.

It seems that test/test_data/gnn_importances/1.csv and 2.csv etc. are not used... in test/workflow/module_tests/test_cggnn.sh we just check the expected average of the importance scores within some tolerance.

Before merging this change, perhaps we should add a test that checks that the set of top 100 important cells always has, say, at least 50% overlap with one fixed reference run?

CarlinLiao commented 1 year ago

That sounds like a good idea.

CarlinLiao commented 1 year ago

Although given that there are only about 200 cells in the test 1 small dataset, this might end up being an over or underdetermined problem.

CarlinLiao commented 12 months ago

Using the current main branch, I ran the workflow cggnn test three times and observed that they shared about 70/100 of the most important histological structures.

Using this branch, I observed that 60/100 of the most important structures were shared between this run and one of the main branch runs.

CarlinLiao commented 12 months ago

Testing the rest of the package before committing a final edit to the test...

By the way, whenever I have verbose on I see this "error" when readying a container or starting a test, although it doesn't break anything or cause any actual halting errors. Do you know about it?

can't find package Expect
    while executing
"package require Expect"
    (file "/usr/bin/unbuffer" line 6)
can't find package Expect
    while executing
"package require Expect"
    (file "/usr/bin/unbuffer" line 6)
can't find package Expect
    while executing
"package require Expect"
    (file "/usr/bin/unbuffer" line 6)

CarlinLiao commented 12 months ago

All tests pass on my machine.