saeyslab / napari-sparrow

Refactor functions.py #121

Closed SilverViking closed 10 months ago

SilverViking commented 1 year ago

Before we make the sparrow pipeline public, some refactoring of the code is advisable.

The monolithic functions.py should be split up into a couple of different modules, perhaps organized into a few subpackages.

Some possible modules:

If we want to fit in the scverse ecosystem, it would be wise if we adopted similar naming conventions for the packages. We could perhaps adopt scanpy's and squidpy's naming convention: pp for preprocessing, pl for plotting, tl for tools, etc.

SilverViking commented 1 year ago

We propose to split the functionality in the current monolithic "functions.py" over the following new napari-sparrow Python (sub-)packages:

------------------+--------------+----------------------------------------------------
Full package name | Abbreviation | Package purpose
------------------+--------------+----------------------------------------------------
io                | io           | reading and writing
------------------+--------------+----------------------------------------------------
image             | im           | image related operations
------------------+--------------+----------------------------------------------------
table             | tb           | operations acting mostly on the count matrix
                  |              | Note: "tools" could be another option, which is 
                  |              | what scanpy uses, but tools is also very general as a name
------------------+--------------+----------------------------------------------------
shape             | sh           | operations on "shapes" (segmentation contours etc.)
------------------+--------------+----------------------------------------------------
plot              | pl           | all plotting functionality
------------------+--------------+----------------------------------------------------
utils             | utils        | miscellaneous functions; should ideally be empty;
                  |              | currently used for functions which may be removed
                  |              | altogether or moved to a different package later
------------------+--------------+----------------------------------------------------

These sub-packages reside inside the napari-sparrow package. Functions will then be accessed as follows:

import napari_sparrow as nas     # should we suggest using the "nas" abbreviation, or "ns"?
sdata = nas.io.create_sdata(...)
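
One way the abbreviations from the table above could be exposed is through aliased relative imports in the top-level `__init__.py`. The sketch below is only an illustration: it writes a throwaway package named `sparrow_demo` to a temporary directory and imports it. The package name, the `PURPOSE` attribute and both helper functions are assumptions made for the demo, not part of napari-sparrow.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Full sub-package name -> abbreviation, copied from the proposal table.
ABBREVIATIONS = {
    "io": "io",
    "image": "im",
    "table": "tb",
    "shape": "sh",
    "plot": "pl",
    "utils": "utils",
}


def build_demo_package(root: Path, name: str = "sparrow_demo") -> None:
    """Write a throwaway package whose __init__.py exposes the abbreviations."""
    pkg = root / name
    pkg.mkdir()
    init_lines = []
    for full, short in ABBREVIATIONS.items():
        sub = pkg / full
        sub.mkdir()
        # Each demo sub-package only records its own full name.
        (sub / "__init__.py").write_text(f'PURPOSE = "{full}"\n')
        # The relative import keeps "io" from clashing with the stdlib module.
        init_lines.append(f"from . import {full} as {short}")
    (pkg / "__init__.py").write_text("\n".join(init_lines) + "\n")


def load_demo_package(name: str = "sparrow_demo"):
    """Build the demo package in a temporary directory and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp), name)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            return importlib.import_module(name)
        finally:
            sys.path.remove(tmp)


nas = load_demo_package()
print(nas.im.PURPOSE)  # the "im" alias resolves to the "image" sub-package
```

With this layout both `nas.image` and `nas.im` refer to the same module, so the full names stay available for readers who prefer them.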

Overview of the renamed functions in their new sparrow packages:

---------------------------------+---------------------------------------------+
Old name in functions.py         | New name in package structure               |
---------------------------------+---------------------------------------------+
read_resolve_transcripts         | io.read_resolve_transcripts                 |
read_stereoseq_transcripts       | io.read_stereoseq_transcripts               |
read_vizgen_transcripts          | io.read_vizgen_transcripts                  |
read_transcripts                 | io.read_transcripts                         |
create_sdata                     | io.create_sdata                             |
load_image_to_dask               | io._load_image_to_dask                      |
_add_transcripts_to_sdata        | io._add_transcripts_to_sdata                |
---------------------------------+---------------------------------------------+
tilingCorrection                 | im.tiling_correction                        |
tophat_filtering                 | im.tophat_filtering                         | TODO: rename function, is not a tophat filter anymore (perhaps call it enhance_image or enhance_background or remove_noise)
clahe_processing                 | im.enhance_contrast                         |
segmentation_cellpose            | im.segment(method='cellpose')               |
cellpose                         | im._cellpose                                |
control_transcripts              | im.transcript_density                       |
_get_image_boundary              | im._get_image_boundary                      |
_get_translation                 | im._get_translation                         |
_apply_transform                 | im._apply_transform                         |
_unapply_transform               | im._unapply_transform                       |
---------------------------------+---------------------------------------------+
preprocessAdata                  | tb.preprocess_anndata    (ugly name)        |
allocation                       | tb.allocate                                 |
scoreGenes                       | tb.score_genes                              |
filter_on_size                   | tb.filter_on_size                           |
remove_celltypes                 | tb._remove_celltypes                        |
annotate_maxscore                | tb._annotate_maxscore                       |
enrichment                       | tb.nhood_enrichment                         |
clustering                       | tb.cluster                                  |
clustercleanliness               | tb.cluster_cleanliness                      | TODO: refactor this function?
_annotate_celltype               | tb._annotate_celltype                       |
_back_sdata_table_to_zarr        | tb._back_sdata_table_to_zarr                |
correct_marker_genes             | tb.correct_marker_genes                     | 
---------------------------------+---------------------------------------------+
tilingCorrectionPlot             | pl.tiling_correction                        |
segmentationPlot                 | pl.segment                                  | TODO: unify with pl.plot...() / plotShapes()
scoreGenesPlot                   | pl.score_genes                              |
preprocesAdataPlot               | pl.preprocess_anndata                       |
enrichment_plot                  | pl.nhood_enrichment                         |
clustering_plot                  | pl.cluster                                  |
clustercleanlinessPlot           | pl.cluster_cleanliness                      | TODO: check if this works without first calling _utils.cluster_cleanliness
plot_control_transcripts         | pl.transcript_density                       |
plot_shapes                      | pl.plot_shapes                              | TODO: clean up API / add easier to use functions for dedicated plots ?
plot_image_container             | pl.plot_image                               | TODO: probably remove (merge with plot_shapes), it also does not plot ImageContainer anymore so name is wrong
sanity_plot_transcripts_matrix   | pl.sanity_plot_transcripts_matrix           | TODO: find nicer name!
---------------------------------+---------------------------------------------+
create_voronoi_boundaries        | sh.cell_expansion(method='voronoi')         |
overlapping_region_2D            | sh.intersect_rectangles                     |
delete_overlap                   | sh._delete_overlap                          |
mask_to_polygons_layer_dask      | sh._mask_image_to_polygons                  |
---------------------------------+---------------------------------------------+
analyse_genes_left_out           | utils.analyse_genes_left_out                | TODO: this calculates and plots, split the function up + move to better package
extract                          | utils.extract()                             | TODO: check if still needed, and otherwise remove altogether. If needed, move to im.calculate_image_features (like sq.im.calculate_image_features) or im.extract_features?
border_color                     | utils.border_color                          | TODO: check if can be removed
color                            | utils.color                                 | TODO: check if can be removed
linewidth                        | utils.linewidth                             | TODO: check if can be removed
---------------------------------+---------------------------------------------+
read_in_RESOLVE                  | REMOVE (= read_resolve_transcripts)         |
read_in_Vizgen                   | REMOVE                                      |
read_in_stereoSeq                | REMOVE                                      |
micron_to_pixels                 | REMOVE                                      |
read_in_zarr_from_path           | REMOVE                                      |
write_to_zarr                    | REMOVE                                      |
imageContainerToSData            | REMOVE                                      |
mask_to_polygons_layer           | REMOVE                                      |
---------------------------------+---------------------------------------------+
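
Entries such as `im.segment(method='cellpose')` and `sh.cell_expansion(method='voronoi')` imply a public function that dispatches on a `method` argument to a private implementation. A minimal sketch of that pattern, assuming a simple name-to-function registry, could look like this. The registry, the `_register` decorator and the stand-in `_cellpose` body are hypothetical and do not reflect the actual napari-sparrow code.

```python
from typing import Callable, Dict

# Hypothetical registry: method name -> private implementation.
_SEGMENTERS: Dict[str, Callable[..., str]] = {}


def _register(name: str):
    """Decorator that records an implementation under a method name."""
    def decorator(func):
        _SEGMENTERS[name] = func
        return func
    return decorator


@_register("cellpose")
def _cellpose(image, **kwargs):
    # Stand-in for the real Cellpose call; it only reports what it received.
    return f"cellpose mask for {image}"


def segment(image, method: str = "cellpose", **kwargs):
    """Public entry point: look up the implementation for `method` and run it."""
    try:
        impl = _SEGMENTERS[method]
    except KeyError:
        raise ValueError(
            f"Unknown segmentation method {method!r}; "
            f"available: {sorted(_SEGMENTERS)}"
        ) from None
    return impl(image, **kwargs)


result = segment("img", method="cellpose")
```

A registry like this would let a second segmentation backend be added without changing the signature of `segment`, which may be the reason for folding `segmentation_cellpose` into a generic name.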

Additional TODOs:

References: