Suggestion to add example choosing k to the vignette

lcolladotor commented 2 years ago

Hi,

This is a great package! As you mention in the vignette, computing dist() can (a) take a long time and (b) use lots of memory. Abby @abspangler13 and I were considering using https://github.com/alexeckert/parallelDist/blob/master/R/parDist.R#L23 to resolve (a), but we would still be limited by (b), in particular if we computed a distance matrix across 100k spots in a Visium dataset (approx. 75 GB of RAM: 1e5 * 1e5 * 8 / 1024^3 = 74.50581).
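As a quick check of the estimate above, here is a minimal sketch in Python, purely for illustration. `dist_matrix_gib` is a hypothetical helper, not part of any package mentioned in this thread. It assumes 8-byte doubles and also reports the lower-triangle size, since R's `dist` objects store only the n(n-1)/2 pairwise values rather than the full square matrix.

```python
def dist_matrix_gib(n, bytes_per_value=8):
    """Return (full, lower_triangle) memory in GiB for n observations.

    Hypothetical helper for illustration; assumes 8-byte doubles by default.
    """
    gib = 1024 ** 3
    full = n * n * bytes_per_value / gib
    # A `dist`-style object keeps only the n * (n - 1) / 2 pairwise values.
    lower_triangle = (n * (n - 1) // 2) * bytes_per_value / gib
    return full, lower_triangle


full_gib, lower_gib = dist_matrix_gib(100_000)
print(f"full matrix:    {full_gib:.5f} GiB")
print(f"lower triangle: {lower_gib:.5f} GiB")
```

Either way, memory grows quadratically with the number of spots, so halving the storage does not change the overall concern at this scale.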

That led us to your work. Something we noticed was missing from the vignette is an example where you choose a given k, like they do in the silhouette plot from https://medium.com/codesmart/r-series-k-means-clustering-silhouette-794774b46586.

We thought that adding such an example might be useful for users like us.
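To make the request concrete, here is a minimal sketch of the kind of "choose k" example we mean, written in plain Python so it runs without extra packages. It uses the average silhouette width from the linked article, not H+ itself, and the toy data are made up. `kmeans_labels`, `mean_silhouette`, and the blob layout are all assumptions for illustration.

```python
import math
import random


def kmeans_labels(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-point start (illustrative)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest chosen centre.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster happens to empty out.
            updated.append(
                tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centers[j]
            )
        if updated == centers:
            break
        centers = updated
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width; touches every pair of points, so O(n^2) work."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            ds = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            if ds:
                mean_dist[c] = sum(ds) / len(ds)
        own = labels[i]
        if own not in mean_dist:
            continue  # singleton cluster: silhouette width counted as 0
        a = mean_dist[own]                                       # cohesion
        b = min(v for c, v in mean_dist.items() if c != own)     # separation
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy data (assumption): three well-separated 2-D blobs of 30 points each.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0)]
points = [
    (cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
    for cx, cy in blob_centers
    for _ in range(30)
]

# Score each candidate k and keep the one with the highest average silhouette.
scores = {k: mean_silhouette(points, kmeans_labels(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 3))
print("chosen k:", best_k)
```

On blobs this well separated, the score should peak at the true number of clusters. The silhouette step needs every pairwise distance, which is exactly the cost that becomes a problem at 100k spots.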

Best, Leo

stephaniehicks commented 2 years ago

Hi @lcolladotor @abspangler13, thanks for your interest in H+! This is a great suggestion, I'll work towards that. In the meantime, we created a similar plot in Figure 5 of our paper (https://doi.org/10.1101/2022.02.03.479015).

The code for it is here: https://github.com/stephaniehicks/fasthpluspaper/blob/96c89b7e304d7846130fefb0068cd777bfc3e2ae/scripts/05_supp02-application_plots.R#L59

lcolladotor commented 2 years ago

Awesome, thanks!

ntdyjack / fasthplus #3