question about the cluster_assignments.txt

jsxlei / SCALE

Single-cell ATAC-seq analysis via Latent feature Extraction

MIT License

97 stars, 17 forks

question about the cluster_assignments.txt #24

Open Chiancc opened 2 years ago

Chiancc commented 2 years ago

Dear Lei, when I ran SCALE, I did not get the output file "cluster_assignments.txt". How can I get the clustering results that go in "cluster_assignments.txt"? Thanks.

jsxlei commented 2 years ago

Hi, the cluster assignments are now stored in the adata.h5ad file, which can be read with scanpy, e.g. adata = scanpy.read('adata.h5ad'), and accessed via adata.obs['leiden'].
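If a plain-text file is still needed, the labels in adata.obs['leiden'] can be written out by hand. The following is a minimal sketch, not part of SCALE: the helper name write_cluster_assignments and the tab-separated "cell, cluster" layout are assumptions, and the layout may differ from the file older SCALE versions produced.

```python
def write_cluster_assignments(adata, key="leiden", path="cluster_assignments.txt"):
    """Write one 'cell<TAB>cluster' line per cell and return the labels.

    Hypothetical helper: `adata` is expected to behave like an AnnData object,
    i.e. `adata.obs` is a pandas DataFrame indexed by cell name.
    """
    labels = adata.obs[key]
    # Series.to_csv writes the index (cell names) followed by the value.
    labels.to_csv(path, sep="\t", header=False)
    return labels


# With real SCALE output (requires scanpy):
#   import scanpy
#   adata = scanpy.read("adata.h5ad")
#   write_cluster_assignments(adata)
```

The helper only touches adata.obs, so it should work the same way for any other label column stored there.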

Chiancc commented 1 year ago

Dear Lei, I'm sorry to bother you again. After I ran SCALE, I automatically got umap.png, adata.h5ad and model.pt. Now I want to get the tsne.txt that you mentioned in the tutorial, so that I can use it to plot a t-SNE figure. Could you tell me how to get tsne.txt? Best wishes.

jsxlei commented 1 year ago

You can get the t-SNE embedding in two ways: 1. Run SCALE from scratch with --embed tSNE; you will then get tsne.png and the t-SNE embedding in adata.obsm. 2. Run import scanpy as sc, then sc.tl.tsne(adata, use_rep='latent'); this gives the same result as option 1 without rerunning SCALE. Option 2 is recommended if you can use scanpy.
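For option 2, scanpy stores the coordinates in adata.obsm['X_tsne'], so a tsne.txt file can be written from there. This is only a sketch: the helper name write_embedding and the tab-separated layout (cell name, then the two coordinates) are assumptions, not the format from the tutorial.

```python
import numpy as np
import pandas as pd


def write_embedding(adata, key="X_tsne", path="tsne.txt"):
    """Write one 'cell<TAB>x<TAB>y' line per cell and return the coordinates.

    Hypothetical helper: `adata` is expected to behave like an AnnData object
    with `obsm[key]` holding the embedding and `obs_names` the cell names.
    """
    coords = np.asarray(adata.obsm[key])
    pd.DataFrame(coords, index=list(adata.obs_names)).to_csv(
        path, sep="\t", header=False
    )
    return coords


# With real SCALE output (requires scanpy):
#   import scanpy as sc
#   adata = sc.read("adata.h5ad")
#   sc.tl.tsne(adata, use_rep="latent")   # fills adata.obsm["X_tsne"]
#   write_embedding(adata)
#   sc.pl.tsne(adata, color="leiden")     # plot coloured by cluster
```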

Chiancc commented 1 year ago

Lei, I have another question. My dataset has 2000 cells and 5 clusters, but when I ran your model the result had 9 clusters. The command I used is "python SCALE.py -d "xxxx.csv" --n_centroids 5 --binary --embed tSNE". How can I specify the number of clusters so that it matches the actual number in the dataset? For example, after running the model, I want my dataset to have 5 clusters, not 9.

jsxlei commented 1 year ago

Currently SCALE has disabled k-means clustering and adopted Leiden clustering, so the number of clusters may differ from the one you asked for. You can run k-means on the SCALE latent representation yourself. Or pip install git+https://github.com/jsxlei/SCALE.git and run "SCALE.py -d "xxxx.csv" --n_centroids 5 --binary --embed tSNE"; you will then get a 'kmeans' cluster column in adata.obs.
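Running k-means on the latent yourself could look roughly like the sketch below. It uses scikit-learn's KMeans rather than anything built into SCALE, and it assumes the latent is stored under adata.obsm['latent'] (the same key passed as use_rep above). The helper name kmeans_on_latent and the fixed random seed are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def kmeans_on_latent(adata, n_clusters=5, key="latent", seed=0):
    """Cluster the latent features with k-means and store labels in adata.obs.

    Hypothetical helper: `adata` is expected to behave like an AnnData object.
    """
    latent = np.asarray(adata.obsm[key])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(latent)
    # Store as strings in a categorical column so plotting treats them as groups.
    adata.obs["kmeans"] = pd.Categorical(labels.astype(str))
    return labels


# With real SCALE output (requires scanpy):
#   import scanpy as sc
#   adata = sc.read("adata.h5ad")
#   kmeans_on_latent(adata, n_clusters=5)
#   sc.pl.tsne(adata, color="kmeans")   # after sc.tl.tsne(adata, use_rep="latent")
```

Unlike Leiden, k-means takes the number of clusters as an input, so n_clusters=5 gives exactly five labels here.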