nadeemlab / SPT

Spatial profiling toolbox for spatial characterization of tumor immune microenvironment in multiplex images
https://oncopathtk.org
Other
21 stars 2 forks source link

`DatabaseConnectionMaker` backward compatibility with `database_config_file` #182

Closed CarlinLiao closed 1 year ago

CarlinLiao commented 1 year ago

Newer SPT modules use the DatabaseConnectionMaker to, well, connect to the database, but older modules take in a database_config_file string directly to create a connection themselves. This leads to unintended consequences when an older module is used in the newer module, as the DatabaseConnectionMaker can't be passed to modules asking for a database_config_file.

I ran into this most recently in #176 here as the new module I'm writing takes in a DatabaseConnectionMaker in its init, but needs to call FeatureMatrixExtractor which uses database_config_file instead.

As I see it, there are a couple possible fixes for this:

CarlinLiao commented 1 year ago

Specifically for FeatureMatrixExtractor, refactoring looks nontrivial. It uses database_config_file multiple times as it runs through the extraction process, with each step using its own subclass of DatabaseConnectionMaker to extract the relevant features. We would need to refactor SparseMatrixPuller, etc. to be able to accept an instance of its own parent class, DatabaseConnectionMaker, in its init in lieu of a database_config_file.

Given its use as both in CLI and in setup, it should probably be written to be compatible with both a string database_config_file and a DatabaseConnectionMaker if that's the direction we want to go.

jimmymathews commented 1 year ago

I think it is not quite as simple as that. DatabaseConnectionMaker and database_config_file are not alternative orthogonal ways to do db access, one contains the other. The things that use database_config_file but not DatabaseConnectionMaker are generally doing so because they exist at a higher abstraction level (CLI scripts that are intermediaries between manual user commands and internal library functions). These will often use DatabaseConnectionMaker somewhere down the chain. This is the case in nearly all the source files containing the symbol database_config_file:

spatialprofilingtoolbox/db/credentials.py
spatialprofilingtoolbox/db/schema_infuser.py
spatialprofilingtoolbox/db/stratification_puller.py
spatialprofilingtoolbox/db/scripts/index_expressions_table.py
spatialprofilingtoolbox/db/scripts/create_schema.py
spatialprofilingtoolbox/db/scripts/modify_constraints.py
spatialprofilingtoolbox/db/scripts/do_fractions_tests.py
spatialprofilingtoolbox/db/scripts/list_studies.py
spatialprofilingtoolbox/db/scripts/retrieve_feature_matrices.py
spatialprofilingtoolbox/db/scripts/drop.py
spatialprofilingtoolbox/db/scripts/status.py
spatialprofilingtoolbox/db/database_connection.py
spatialprofilingtoolbox/db/feature_matrix_extractor.py
spatialprofilingtoolbox/ondemand/scripts/cache_expressions_data_array.py
spatialprofilingtoolbox/workflow/common/sparse_matrix_puller.py
spatialprofilingtoolbox/workflow/common/cli_arguments.py
spatialprofilingtoolbox/workflow/common/core.py
spatialprofilingtoolbox/workflow/common/structure_centroids_puller.py
spatialprofilingtoolbox/workflow/common/job_generator.py
spatialprofilingtoolbox/workflow/phenotype_proximity/integrator.py
spatialprofilingtoolbox/workflow/phenotype_proximity/ondemand.py
spatialprofilingtoolbox/workflow/phenotype_proximity/core.py
spatialprofilingtoolbox/workflow/phenotype_proximity/job_generator.py
spatialprofilingtoolbox/workflow/scripts/create_plots_page.py
spatialprofilingtoolbox/workflow/scripts/configure.py
spatialprofilingtoolbox/workflow/scripts/generate_run_information.py
spatialprofilingtoolbox/workflow/scripts/compute_umaps_all.py
spatialprofilingtoolbox/workflow/tabular_import/initializer.py
spatialprofilingtoolbox/workflow/tabular_import/parsing/skimmer.py
spatialprofilingtoolbox/workflow/reduction_visual/integrator.py
spatialprofilingtoolbox/workflow/reduction_visual/core.py
spatialprofilingtoolbox/workflow/reduction_visual/initializer.py
spatialprofilingtoolbox/workflow/reduction_visual/job_generator.py

So I'm claiming that all the files above with script in the path, or workflow/<workflow name> are probably fine to be left as they are, not possessing some "legacy" pattern.

This leaves:

spatialprofilingtoolbox/db/credentials.py
spatialprofilingtoolbox/db/schema_infuser.py
spatialprofilingtoolbox/db/stratification_puller.py
spatialprofilingtoolbox/db/database_connection.py
spatialprofilingtoolbox/db/feature_matrix_extractor.py
spatialprofilingtoolbox/workflow/common/sparse_matrix_puller.py
spatialprofilingtoolbox/workflow/common/cli_arguments.py
spatialprofilingtoolbox/workflow/common/core.py
spatialprofilingtoolbox/workflow/common/structure_centroids_puller.py

The files credentials and database_connection_maker are just the pattern implementation, so there is nothing to fix there either. cli_arguments and core are script-level related, so they are also OK. This leaves:

spatialprofilingtoolbox/db/schema_infuser.py
spatialprofilingtoolbox/db/stratification_puller.py
spatialprofilingtoolbox/db/feature_matrix_extractor.py
spatialprofilingtoolbox/workflow/common/sparse_matrix_puller.py
spatialprofilingtoolbox/workflow/common/structure_centroids_puller.py

I think all of these 5 modules are a prime target for a rewrite, exactly as you have suggested. schema_infuser subclasses DatabaseConnectionMaker (bad) when it should use the object composition pattern instead. Same with all those pullers.

And same with feature_matrix_extractor, though I don't think it will be too difficult! After the other 4 modules are fixed as well, that is. In feature_matrix_extractor I dogmatically strove to use a functional/stateless pattern, but the parameter passing (as mentioned in another issue) is a big distraction. I think part of this refactor should be to make FeatureMatrixExtractor into a proper stateful object.