scverse / scvi-tools

Deep probabilistic analysis of single-cell and spatial omics data
http://scvi-tools.org/
BSD 3-Clause "New" or "Revised" License

[CLOSED] clustering #19

Closed jeff-regier closed 6 years ago

jeff-regier commented 6 years ago

Issue by jeff-regier Wednesday Apr 04, 2018 at 03:57 GMT Originally opened as https://github.com/YosefLab/scVI-dev/issues/14


To start, maybe figure out when VaDE (Variational Deep Embedding) does and doesn't work.

https://arxiv.org/pdf/1611.05148.pdf
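VaDE puts a Gaussian mixture prior on the latent space of a VAE and assigns a cell with latent code z to cluster c through the responsibility p(c | z) ∝ π_c · N(z; μ_c, diag(σ_c²)). The following is a minimal pure-Python sketch of that assignment step only, assuming a diagonal-covariance mixture and an already-encoded latent vector. The function names and the toy parameters are hypothetical and not part of scvi-tools or the paper's code.

```python
import math


def log_normal_diag(z, mu, var):
    """Log density of z under a Gaussian N(mu, diag(var))."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (zi - m) ** 2 / v)
        for zi, m, v in zip(z, mu, var)
    )


def vade_responsibilities(z, pis, mus, vars_):
    """Return p(c | z) for every mixture component c (hypothetical helper).

    Works in log space and applies the log-sum-exp trick so that very
    small densities do not underflow to zero.
    """
    logits = [
        math.log(p) + log_normal_diag(z, m, v)
        for p, m, v in zip(pis, mus, vars_)
    ]
    mx = max(logits)
    exps = [math.exp(l - mx) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy 2-D latent space with two equally weighted components.
    pis = [0.5, 0.5]
    mus = [[0.0, 0.0], [4.0, 4.0]]
    vars_ = [[1.0, 1.0], [1.0, 1.0]]
    gamma = vade_responsibilities([0.2, -0.1], pis, mus, vars_)
    print(gamma, "-> cluster", gamma.index(max(gamma)))
```

Taking the argmax of the responsibilities gives a hard cluster label. In VaDE itself the mixture weights, means and variances are learned jointly with the encoder and decoder rather than fixed as in this toy example.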


Comment by jeff-regier Wednesday Apr 18, 2018 at 18:22 GMT


@maxime1310 How about storing your work on this on a branch, e.g. max/vade, and then closing the issue?


Comment by maxime1310 Wednesday Apr 18, 2018 at 19:38 GMT


@jeff-regier I'll clean up the code and make sure it finds the right clusters on the Retina dataset (for now it runs fine on Cortex), then I'll do that so we can close the issue!


Comment by maxime1310 Friday Apr 20, 2018 at 00:03 GMT


@jeff-regier the code I just pushed should:
- reproduce Romain's visual clustering on the Retina dataset
- show whether running VaDE from those pre-trained weights improves the clustering (I haven't included clustering metrics yet, so for now the assessment is only visual)

I launched it for the same number of epochs as Romain did (it takes a while, since the dataset is huge). Once I have satisfying visual clustering results I'll push it and close the issue.


Comment by jeff-regier Friday Apr 20, 2018 at 00:25 GMT


Sounds good, thanks Maxime.


Comment by maxime1310 Tuesday Apr 24, 2018 at 17:50 GMT


VaDE doesn't improve the clustering metrics used in the paper (i.e. ARI, NMI, silhouette score) much on the Retina dataset. On the other hand, and interestingly given how badly it needs pretraining on MNIST, it seems to work here without much pretraining.

[Figure: clustering after pretraining]

[Figure: clustering after VaDE]
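For reference, the three metrics named in the comment above (ARI, NMI, silhouette score) can be computed from scratch as below. This is a dependency-free sketch, not the evaluation code used in this thread or in the paper; in practice one would more likely call scikit-learn's `adjusted_rand_score`, `normalized_mutual_info_score` and `silhouette_score`. It assumes hashable cluster labels, at least two points, at least two clusters for the silhouette, Euclidean distance, and arithmetic-mean normalization for NMI.

```python
import math
from collections import Counter


def _comb2(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) / 2


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting agreement between two labelings, corrected for chance."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))  # contingency table cells
    sum_pairs = sum(_comb2(v) for v in pairs.values())
    sum_a = sum(_comb2(v) for v in Counter(labels_true).values())
    sum_b = sum(_comb2(v) for v in Counter(labels_pred).values())
    expected = sum_a * sum_b / _comb2(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))
    a, b = Counter(labels_true), Counter(labels_pred)
    mi = sum(
        (nij / n) * math.log(n * nij / (a[i] * b[j]))
        for (i, j), nij in pairs.items()
    )
    h_a = -sum((v / n) * math.log(v / n) for v in a.values())
    h_b = -sum((v / n) * math.log(v / n) for v in b.values())
    denom = (h_a + h_b) / 2
    return 1.0 if denom == 0 else mi / denom


def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        # The point's distance to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        m = max(a, b)
        scores.append(0.0 if m == 0 else (b - a) / m)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
    print(normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0]))
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(silhouette(pts, [0, 0, 1, 1]))
```

ARI and NMI are invariant to renaming clusters, which is why they suit comparing predicted clusters to annotated cell types; the silhouette score needs only the embedding and the predicted labels.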


Comment by jeff-regier Tuesday Apr 24, 2018 at 18:03 GMT


Very interesting. Even though there's no obvious improvement from VaDE, it could just be that this dataset is too easy to cluster (and that the few mistakes we see are essentially impossible to correct). It might be good to revisit VaDE if we need to improve visualization in the future.


Comment by maxime1310 Tuesday Apr 24, 2018 at 18:08 GMT


I definitely agree. I may take some time to run it on datasets where scVI clusters less well, to see whether VaDE improves things there.