saeyslab / napari-sparrow


take into account subset when analyzing genes filtered out #111

Closed lopollar closed 1 year ago

lopollar commented 1 year ago

When performing the analysis of how many genes are filtered out, we need to only compare to the transcripts in the subset. If not, the values will be uninterpretable.

ArneDefauw commented 1 year ago

Could you specify where in the code we compare the number of genes filtered out to the total number of transcripts? Because the transcripts are now saved in the sdata object as a dask dataframe, this should be easy to implement anyway.

lopollar commented 1 year ago

Super easy! I think five minutes at most. Right where we do the plot_filtered genes, we print some text; add this text to it!

ArneDefauw commented 1 year ago

fixed by https://github.com/saeyslab/napari-sparrow/commit/af7845d0b9025504615e4b959c73984f8a608d3b

Changed API slightly. From:

import napari_sparrow as nas
filtered = nas.utils.analyse_genes_left_out(sdata, sdata['transcripts'].compute())

to

filtered = nas.pl.analyse_genes_left_out(sdata)

The updated function now uses dask to do the groupby, which is more efficient for large transcript files (e.g. Vizgen).

If segmentation was run on a crop (so that sdata.table contains counts obtained using the mask layers from this crop), we take this into account by querying the sdata.points['transcripts'] dask dataframe before analysis.