Closed CarlinLiao closed 11 months ago
I think I misunderstand something here. No matter how small, each specimen should not be "deemed too small" to create a graph. Surely it is easy enough to create 1 graph in that case. (Do we actually have such a case?)
This is intentional: graphs are not created when specimens are too small, because such specimens don't provide enough information to train a good model. This is the same reason we don't run cggnn on datasets with smaller slides, like CyTOF or breast cancer IMC.
None of the slides in the urothelial dataset are too small. The smallest that occurs is 2724 cells.
No, we should still be able to run the pipeline in the presence of the "smaller slides" from those datasets. Performance is a separate matter; it does not justify restricting our implementation to best-case scenarios only.
I set the cells_per_slide_target (actually, this is a misnomer and should be changed to cells_per_ROI_target or something) that we've been using to 5000. If I recall correctly this was a value I arrived at after some observation of which datasets cg-gnn worked well for and which it didn't, after I switched from setting a predetermined ROI size to having it be determined dynamically by average cell density across specimens + a target number of cells I wanted to hit per ROI.
But yes, these are probably two separate issues:
Let's open up a separate issue for the latter, or an email/in-person talking thread if we determine it's unrelated to the SPT codebase.
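For reference, the dynamic ROI sizing described in the comment above could look roughly like the following. This is only a sketch under assumptions that are not stated in the thread: ROIs are taken to be square, density is averaged per specimen, and the function name roi_side_length and its parameters are hypothetical, not the names used in SPT.

```python
from math import sqrt
from typing import Sequence


def roi_side_length(
    cell_counts: Sequence[int],
    areas: Sequence[float],
    cells_per_roi_target: int = 5000,
) -> float:
    """Side length of a square ROI expected to hold the target number of cells.

    cell_counts[i] and areas[i] describe specimen i. The density is averaged
    across specimens, so one ROI size is shared by the whole study.
    All names here are illustrative assumptions.
    """
    if len(cell_counts) == 0 or len(cell_counts) != len(areas):
        raise ValueError("Need one area per specimen and at least one specimen.")
    if any(area <= 0 for area in areas):
        raise ValueError("Specimen areas must be positive.")

    # Cells per unit area for each specimen, then the mean across specimens.
    densities = [count / area for count, area in zip(cell_counts, areas)]
    mean_density = sum(densities) / len(densities)
    if mean_density <= 0:
        raise ValueError("Mean cell density must be positive.")

    # target cells = density * side^2, solved for the side length.
    return sqrt(cells_per_roi_target / mean_density)


# Two specimens of equal area with 2000 and 8000 cells.
side = roi_side_length([2000, 8000], [1_000_000.0, 1_000_000.0])
print(side)  # about 1000.0, in the same length unit as the areas
```

Under this scheme a specimen with fewer cells than the target (2724 is below 5000, for example) would still fit inside a single ROI, which is where the question of whether to emit one graph or none comes in.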
When running on a study with very large and very small specimens (e.g., urothelial), sometimes specimens will be deemed too small to create a graph for. When that happens, spt cggnn create-specimen-graphs will error because DGL will try to save an empty list of graphs.
A simple but insufficient fix is to change create-specimen-graphs so it doesn't try to save graphs when no graphs are created from a specimen, but this results in a Nextflow error, as Nextflow expects a graph data artifact to be created.
Possible solutions:
- Move the decision of whether a specimen yields any graphs from create-specimen-graphs upstream to prepare-graph-creation. Not my preferred strategy, since deciding how many graphs are created happens concurrently with graph creation, but it's possible.
- Allow for the possibility in the Nextflow workflow that create-specimen-graphs may not create any output.
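A minimal sketch of the guard from the "simple but insufficient fix" is below. It is not the actual SPT code: save_specimen_graphs is a hypothetical helper, and save_fn stands in for whatever DGL save call the command uses, passed in as a parameter so the sketch runs without DGL installed.

```python
from typing import Any, Callable, Sequence


def save_specimen_graphs(
    graphs: Sequence[Any],
    output_path: str,
    save_fn: Callable[[str, list], None],
) -> bool:
    """Save graphs for one specimen, skipping the save when there are none.

    Returns True if an output file was written and False if the specimen
    produced no graphs. save_fn is an assumed stand-in for the real
    graph-saving function.
    """
    if len(graphs) == 0:
        # Nothing to save: avoid handing an empty list to the saver.
        return False
    save_fn(output_path, list(graphs))
    return True


# Usage with a fake saver that only records its calls.
calls = []
wrote = save_specimen_graphs([], "specimen_1.bin", lambda p, g: calls.append((p, g)))
print(wrote, calls)  # False []
```

This only addresses the DGL side. The workflow still has to tolerate a missing output file for that specimen, which is what the second option above is about.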