Closed. MartaBenegas closed this issue 4 years ago.
In general, these methods are not suited to datasets this small. UMAP and Louvain (the community detection method) both rely on the construction of a neighbor graph. This is a k-nearest neighbor (kNN) graph, where k is typically around 20. Louvain tries to find partitions of the graph that maximize the neighbor connections within each partition and minimize the number of connections going outside it. You can imagine, then, that if you have 20 cells and k=20, every cell is a nearest neighbor of every other cell, and the method won't be able to find any meaningful partitions. UMAP has a similar problem: it builds a kNN graph and tries to find a low-dimensional embedding that preserves the high-dimensional distances between cells.
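To make the saturation argument concrete, here is a minimal sketch in plain NumPy. It is not Seurat's `FindNeighbors` implementation; the helper names (`knn_adjacency`, `cross_group_fraction`) and the toy data (two well-separated groups of 10 "cells" each) are assumptions made up for the illustration. It only shows what happens to a kNN graph when k reaches the number of cells.

```python
import numpy as np

def knn_adjacency(points, k):
    """Boolean (n, n) matrix: entry [i, j] is True when j is one of the
    k nearest neighbors of i (the point itself is excluded)."""
    n = len(points)
    k = min(k, n - 1)  # a point has at most n - 1 other points to choose from
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)            # never pick a point as its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = True
    return adj

def cross_group_fraction(adj, group_size=10):
    """Fraction of edges that connect the first group to the second."""
    cross = adj[:group_size, group_size:].sum() + adj[group_size:, :group_size].sum()
    return cross / adj.sum()

# Hypothetical toy data: 20 "cells" in 5 dimensions, two clearly separated groups.
rng = np.random.default_rng(0)
cells = np.vstack([
    rng.normal(0.0, 1.0, size=(10, 5)),
    rng.normal(10.0, 1.0, size=(10, 5)),
])

small_k = knn_adjacency(cells, 5)   # k well below the group size
large_k = knn_adjacency(cells, 20)  # k as large as the dataset

print("cross-group edge fraction, k=5: ", cross_group_fraction(small_k))
print("cross-group edge fraction, k=20:", cross_group_fraction(large_k))
```

With a small k, every edge stays inside its own group, so a partitioning method has structure to work with. With k=20 the graph is complete: each cell is connected to all 19 others, more than half of the edges cross between the two groups, and the graph topology alone no longer distinguishes them.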
Thanks for your answer, it clarifies a lot. But I still find it hard to understand the point I raised in question 4 about the difference between the UMAP and FindNeighbors functions.
Best, Marta.
Hi, first of all, thanks for your great work! I'm new to this kind of analysis, so I have a few questions about interpreting the results. I hope this doesn't bother you too much. I was testing a pipeline with a small SMART-seq2 dataset of 34 cells, which is part of this atlas project. Here is my code:
And I obtain the following UMAP:

As you can see, it is very dispersed, and I really had to force the resolution parameter of FindClusters to get any clusters at all. To test whether this was due to the dataset itself or to its size, I performed the same procedure on a subset of a larger dataset I was also analyzing. This is the UMAP for the entire dataset:

I chose three cells from each cluster (39 cells in total) and reran the analysis:

I checked that the original structure is more or less conserved (taking into account that a lot of information is lost compared with the original dataset), as cells that cluster together in the original dataset are generally grouped in the same cluster in the subset analysis:

And now a few questions arise:
Besides that, I've realized that in the clustering tutorial you perform the clustering first and then the UMAP, but in the integrating datasets tutorial you do it the other way around: first the UMAP and then the clustering:
Thank you in advance, and I'm sorry for the number of questions. If you can refer me to a paper or tutorial that answers my questions, I'll be happy with that too. Marta.