fix the plot_control transcripts for large images

lopollar commented 1 year ago

For large images, too many rows need to be combined.

Maybe we can subsample by pulling locations together and summing over blocks of 25 pixels (5*5). This would considerably reduce the amount of data to plot.
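A minimal sketch of the 5*5 summing idea, assuming the counts are already available as a 2D NumPy array. The function name block_sum and the zero-padding of the edges are illustrative assumptions, not part of napari-sparrow:

```python
import numpy as np


def block_sum(image: np.ndarray, block: int = 5) -> np.ndarray:
    """Sum non-overlapping block x block tiles of a 2D count image (hypothetical helper)."""
    height, width = image.shape
    # Zero-pad so that both dimensions become multiples of `block`.
    pad_h = (-height) % block
    pad_w = (-width) % block
    padded = np.pad(image, ((0, pad_h), (0, pad_w)))
    new_h, new_w = padded.shape[0] // block, padded.shape[1] // block
    # Split every axis into (tile index, offset within tile) and sum over the offsets.
    return padded.reshape(new_h, block, new_w, block).sum(axis=(1, 3))


# Toy example: one transcript per pixel in a 12 x 10 image.
counts = np.ones((12, 10), dtype=int)
pooled = block_sum(counts)
```

Each pooled pixel then holds the transcript count of a 5*5 neighbourhood, so the image to plot has roughly 25 times fewer pixels while the total count is preserved.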

Zoom-ins should be possible. We should add a threshold (or a hidden parameter) that states how big an image can be and still be plotted in full!
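One way such a threshold could work, as a rough sketch only. The names can_plot_fully and MAX_PIXELS_FULL_PLOT, the threshold value, and the crd order [x_min, x_max, y_min, y_max] are assumptions made for illustration:

```python
# Hypothetical hidden parameter: largest number of pixels we still plot in full.
MAX_PIXELS_FULL_PLOT = 20_000 * 20_000


def can_plot_fully(shape, crd=None, max_pixels=MAX_PIXELS_FULL_PLOT):
    """Return True if the requested region is small enough to plot without subsampling.

    shape is (height, width) of the full image.
    crd is assumed to be [x_min, x_max, y_min, y_max], or None for the whole image.
    """
    if crd is not None:
        x_min, x_max, y_min, y_max = crd
        n_pixels = (x_max - x_min) * (y_max - y_min)
    else:
        height, width = shape
        n_pixels = height * width
    return n_pixels <= max_pixels


full_ok = can_plot_fully((50_000, 60_000))
zoom_ok = can_plot_fully((50_000, 60_000), crd=[20000, 30000, 20000, 30000])
```

With this kind of check, a zoom-in is always plotted as requested, and only a too-large full image would need subsampling or an explicit crd.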

Question: do we save this image in the object? I don't think that is necessary at the moment, but maybe there is a good reason to do so (e.g. visualization in napari).

ArneDefauw commented 1 year ago

This is partially fixed by https://github.com/saeyslab/napari-sparrow/commit/bfd595738c95c59e75570d2b481a6a5fd0bb52ed.

Currently solved as follows (e.g. for VIZGEN):

It is now possible to pass a crd when calculating the transcript density, e.g.:

import napari_sparrow as nas
sdata = nas.im.transcript_density(sdata, crd=[20000, 30000, 20000, 30000])

The calculated transcript density for this crd is stored in the sdata object as an image layer (default name 'transcript_density'), and in the .zarr file if sdata is backed by zarr.

This layer can then be plotted with the nas.pl.plot_shapes function, or with nas.pl.plot_image, as one would plot other image layers. Alternatively, one can use nas.pl.transcript_density, e.g.:

nas.pl.transcript_density(sdata, img_layer=['clahe', 'transcript_density'], crd=[20000, 30000, 20000, 30000])

We should decide whether we want to look into 'subsampling by pulling locations together' in the nas.im.transcript_density function, so that transcript density can be calculated for large images, as you suggested, Lotte.

ArneDefauw commented 1 year ago

Fixed a bug (commit https://github.com/saeyslab/napari-sparrow/commit/c82c236f7d780cdb3e00a4d6113ff5414db24336) in:

import napari_sparrow as nas
sdata = nas.im.transcript_density(sdata, ...)

The use of .unstack in:

image = np.array(counts_location_transcript.unstack(fill_value=0))

did not automatically result in an image with the same dimensions as the other image layers, because not all rows or columns are populated with transcripts.
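To illustrate the problem on toy data: the sketch below uses plain pandas, and the reindex step is just one possible way to restore the full grid, not necessarily what the commit does. Column names and the image size are made up for the example:

```python
import numpy as np
import pandas as pd

# Toy transcripts in pixel coordinates; row y=1 and column x=2 contain no transcripts.
transcripts = pd.DataFrame({"y": [0, 0, 2, 3, 3], "x": [0, 1, 1, 3, 3]})
counts = transcripts.groupby(["y", "x"]).size()

# unstack only creates rows/columns for coordinates that occur in the data,
# so the empty row and the empty column are silently dropped.
sparse_image = np.array(counts.unstack(fill_value=0))

# Reindexing onto the full pixel grid restores the expected image shape.
height, width = 4, 4  # assumed size of the other image layers
full_image = (
    counts.unstack(fill_value=0)
    .reindex(index=range(height), columns=range(width), fill_value=0)
    .to_numpy()
)
```

Here sparse_image ends up smaller than the 4 x 4 image it should align with, while full_image keeps empty rows and columns as zeros.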

Rewrote the code to fix this and added an option to sample transcripts before the transcript density is calculated. By default, sampling is now performed if the number of transcripts inside the crd exceeds 15 000 000. Everything is now rewritten in dask/dask dataframe, so even for vizgen it is relatively fast to calculate the density (approx. 3 min if we leave the n_sample parameter at its default value of 15 000 000 and set crd to None):

sdata = nas.im.transcript_density(sdata, crd=None)
nas.pl.plot_image(sdata, img_layer='transcript_density', crd=[0, 20000, 0, 20000])
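The sampling rule described above can be sketched roughly as follows. This uses pandas rather than dask to stay self-contained, and the helper name sample_transcripts and its seed argument are hypothetical; only the n_sample default of 15 000 000 comes from the description above:

```python
import pandas as pd


def sample_transcripts(df, n_sample=15_000_000, seed=0):
    """Randomly keep about n_sample rows if df holds more transcripts than that."""
    n_transcripts = len(df)
    if n_transcripts <= n_sample:
        # Few enough transcripts: use all of them.
        return df
    # Sample by fraction instead of by absolute count.
    frac = n_sample / n_transcripts
    return df.sample(frac=frac, random_state=seed)


# Toy example with a much smaller n_sample than the real default.
transcripts = pd.DataFrame({"x": range(1000), "y": range(1000)})
sampled = sample_transcripts(transcripts, n_sample=100)
unchanged = sample_transcripts(transcripts, n_sample=5000)
```

Sampling by fraction was chosen here because, as far as I know, dask dataframes only support fraction-based sampling, so the same pattern should carry over. Note that the resulting density is then a sampled estimate, not an exact count.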

One can also still provide a crd, and then sampling is not necessary:

sdata = nas.im.transcript_density(sdata, crd=[0, 20000, 0, 20000])
nas.pl.plot_image(sdata, img_layer='transcript_density', crd=[0, 20000, 0, 20000])

saeyslab / napari-sparrow

fix the plot_control transcripts for large images #107