Closed CarlinLiao closed 1 year ago
Specifically for FeatureMatrixExtractor
, refactoring looks nontrivial. It uses database_config_file
multiple times as it runs through the extraction process, with each step using its own subclass of DatabaseConnectionMaker
to extract the relevant features. We would need to refactor SparseMatrixPuller
, etc. to be able to accept an instance of its own parent class, DatabaseConnectionMaker
, in its init in lieu of a database_config_file
.
Given its use as both in CLI and in setup, it should probably be written to be compatible with both a string database_config_file
and a DatabaseConnectionMaker
if that's the direction we want to go.
I think it is not quite as simple as that. DatabaseConnectionMaker
and database_config_file
are not alternative orthogonal ways to do db access, one contains the other. The things that use database_config_file
but not DatabaseConnectionMaker
are generally doing so because they exist at a higher abstraction level (CLI scripts that are intermediaries between manual user commands and internal library functions). These will often use DatabaseConnectionMaker
somewhere down the chain. This is the case in nearly all the source files containing the symbol database_config_file
:
spatialprofilingtoolbox/db/credentials.py
spatialprofilingtoolbox/db/schema_infuser.py
spatialprofilingtoolbox/db/stratification_puller.py
spatialprofilingtoolbox/db/scripts/index_expressions_table.py
spatialprofilingtoolbox/db/scripts/create_schema.py
spatialprofilingtoolbox/db/scripts/modify_constraints.py
spatialprofilingtoolbox/db/scripts/do_fractions_tests.py
spatialprofilingtoolbox/db/scripts/list_studies.py
spatialprofilingtoolbox/db/scripts/retrieve_feature_matrices.py
spatialprofilingtoolbox/db/scripts/drop.py
spatialprofilingtoolbox/db/scripts/status.py
spatialprofilingtoolbox/db/database_connection.py
spatialprofilingtoolbox/db/feature_matrix_extractor.py
spatialprofilingtoolbox/ondemand/scripts/cache_expressions_data_array.py
spatialprofilingtoolbox/workflow/common/sparse_matrix_puller.py
spatialprofilingtoolbox/workflow/common/cli_arguments.py
spatialprofilingtoolbox/workflow/common/core.py
spatialprofilingtoolbox/workflow/common/structure_centroids_puller.py
spatialprofilingtoolbox/workflow/common/job_generator.py
spatialprofilingtoolbox/workflow/phenotype_proximity/integrator.py
spatialprofilingtoolbox/workflow/phenotype_proximity/ondemand.py
spatialprofilingtoolbox/workflow/phenotype_proximity/core.py
spatialprofilingtoolbox/workflow/phenotype_proximity/job_generator.py
spatialprofilingtoolbox/workflow/scripts/create_plots_page.py
spatialprofilingtoolbox/workflow/scripts/configure.py
spatialprofilingtoolbox/workflow/scripts/generate_run_information.py
spatialprofilingtoolbox/workflow/scripts/compute_umaps_all.py
spatialprofilingtoolbox/workflow/tabular_import/initializer.py
spatialprofilingtoolbox/workflow/tabular_import/parsing/skimmer.py
spatialprofilingtoolbox/workflow/reduction_visual/integrator.py
spatialprofilingtoolbox/workflow/reduction_visual/core.py
spatialprofilingtoolbox/workflow/reduction_visual/initializer.py
spatialprofilingtoolbox/workflow/reduction_visual/job_generator.py
So I'm claiming that all the files above with script
in the path, or workflow/<workflow name>
are probably fine to be left as they are, not possessing some "legacy" pattern.
This leaves:
spatialprofilingtoolbox/db/credentials.py
spatialprofilingtoolbox/db/schema_infuser.py
spatialprofilingtoolbox/db/stratification_puller.py
spatialprofilingtoolbox/db/database_connection.py
spatialprofilingtoolbox/db/feature_matrix_extractor.py
spatialprofilingtoolbox/workflow/common/sparse_matrix_puller.py
spatialprofilingtoolbox/workflow/common/cli_arguments.py
spatialprofilingtoolbox/workflow/common/core.py
spatialprofilingtoolbox/workflow/common/structure_centroids_puller.py
The files credentials
and database_connection_maker
are just the pattern implementation, so there is nothing to fix there either. cli_arguments
and core
are script-level related, so they are also OK. This leaves:
spatialprofilingtoolbox/db/schema_infuser.py
spatialprofilingtoolbox/db/stratification_puller.py
spatialprofilingtoolbox/db/feature_matrix_extractor.py
spatialprofilingtoolbox/workflow/common/sparse_matrix_puller.py
spatialprofilingtoolbox/workflow/common/structure_centroids_puller.py
I think all of these 5 modules are a prime target for a rewrite, exactly as you have suggested. schema_infuser
subclasses DatabaseConnectionMaker
(bad) when it should use the object composition pattern instead. Same with all those puller
s.
And same with feature_matrix_extractor
, though I don't think it will be too difficult! After the other 4 modules are fixed as well, that is. In feature_matrix_extractor
I dogmatically strove to use a functional/stateless pattern, but the parameter passing (as mentioned in another issue) is a big distraction. I think part of this refactor should be to make FeatureMatrixExtractor
into a proper stateful object.
Newer SPT modules use the
DatabaseConnectionMaker
to, well, connect to the database, but older modules take in adatabase_config_file
string directly to create a connection themselves. This leads to unintended consequences when an older module is used in the newer module, as theDatabaseConnectionMaker
can't be passed to modules asking for adatabase_config_file
.I ran into this most recently in #176 here as the new module I'm writing takes in a
DatabaseConnectionMaker
in its init, but needs to callFeatureMatrixExtractor
which usesdatabase_config_file
instead.As I see it, there are a couple possible fixes for this:
DatabaseConnectionMaker
patterndatabase_config_file
onlyDatabaseConnectionMaker
that stores thedatabase_config_file
used to create the maker