zakieh-tayyebi / CellSpace

Scalable sequence-informed embedding of single-cell ATAC-seq data with CellSpace
MIT License

Cell types not separated in CellSpace #2

Closed: WhaleGe closed this issue 1 year ago.

WhaleGe commented 2 years ago

Hello, thank you very much for developing this tool. However, I have a problem: why aren't my cell types separated in the embedding? The sample has 7,879 cells.


My command is:

CellSpace -output ./CellSpace_out/P11_T_0_embedding.tsv -cpMat ./CellSpace_out/P11_T_0_cell_by_tile_bin.mtx -peaks ./CellSpace_out/P11_T_0_top50K_variable_tiles.fa -sampleLen 150 -ngrams 3 -exmpPerPeak 20 -epoch 20
zakieh-tayyebi commented 2 years ago

Hi!

Is it possible that the order of the cells is incorrect? The cell types look randomly scattered, which is unlikely: in my experience, even when CellSpace isn't working well, it still recovers some structure.

I've explained here that the cells may appear in a different order in the output matrix than in the input, and the R function that creates the CellSpace object should fix that. Basically, the number in each cell label (for example, the 1 in labelC1) is that cell's index in the input count matrix.
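
For checking the ordering by hand, the label-to-index mapping can be sketched in base R. This is a minimal illustration only, assuming (a) the embedding has been read into a numeric matrix whose row names are the tokens from the output TSV, and (b) cell rows are named `labelC<i>`, where `<i>` is the 1-based index of the cell in the input count matrix; the exact label prefix in the real output file may differ. `order_cell_embeddings` is a hypothetical helper, not part of the CellSpace package, whose own `CellSpace()` constructor is said to handle this already.

```r
# Hypothetical helper (not part of CellSpace): reorder cell embedding rows
# by the numeric index in their labels. Assumes cell rows are named
# "labelC<i>", with <i> the 1-based index of the cell in the input count matrix.
order_cell_embeddings <- function(emb, cell.names, prefix = "labelC") {
  # keep only cell rows; k-mer rows do not start with the prefix
  is.cell <- startsWith(rownames(emb), prefix)
  cell.emb <- emb[is.cell, , drop = FALSE]
  # parse the integer index that follows the prefix
  idx <- as.integer(substring(rownames(cell.emb), nchar(prefix) + 1))
  # the indices must be exactly 1..n, one per cell name
  if (length(idx) != length(cell.names) || anyNA(idx) ||
      !setequal(idx, seq_along(cell.names)))
    stop("cell labels do not map one-to-one onto cell.names")
  # sort rows by index so row i is the i-th cell of the input matrix
  cell.emb <- cell.emb[order(idx), , drop = FALSE]
  rownames(cell.emb) <- cell.names
  cell.emb
}

# toy example: cell rows arrive out of order, mixed with a k-mer row
emb <- matrix(c(3, 3, 1, 1, 9, 9, 2, 2), ncol = 2, byrow = TRUE,
              dimnames = list(c("labelC3", "labelC1", "AAC", "labelC2"), NULL))
ordered <- order_cell_embeddings(emb, c("cellA", "cellB", "cellC"))
```

Here `cell.names` plays the same role as the `cell.names` argument in the `CellSpace()` call: it must list the cells in the same order as the rows of the count matrix that was passed to `-cpMat`.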

WhaleGe commented 2 years ago

I'm sorry to bother you again. I followed your preprocessing steps exactly, starting from the BAM file, and the cell types are separable in ArchR's UMAP. It also seems that the labels are matched automatically when the CellSpace object is constructed (for example, labelC1 corresponds to the first row name of cellColData in ArchR). I checked the label correspondence several times, but that did not solve the problem.

cso <- CellSpace(
  project = id,
  emb.file = paste0(indir,"CellSpace_out/",id,"_embedding.tsv"), # cell and k-mer embeddings
  meta.data = cell.md[, c("Celltype", "Sample", "ArchR.cluster")],
  cell.names = rownames(cell.md)
)
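
One way to tell misaligned labels apart from a genuinely poor embedding is to measure how often a cell's nearest neighbours in the embedding share its `Celltype`, then compare that with the same score after shuffling the labels. If the two scores are close, the labels are effectively random with respect to the embedding. The sketch below assumes the cell embeddings are already a numeric matrix with rows in the same order as `cell.md`; `knn_label_agreement` is a hypothetical helper, not a CellSpace or ArchR function. It builds the full pairwise distance matrix with `dist()`, so for about 7,879 cells it needs several hundred MB of memory; subsampling cells first keeps it light.

```r
# Hypothetical diagnostic (not a CellSpace or ArchR function): the fraction
# of each cell's k nearest neighbours (Euclidean distance in the embedding)
# that carry the same label as the cell itself, averaged over all cells.
knn_label_agreement <- function(emb, labels, k = 10) {
  stopifnot(nrow(emb) == length(labels), k < nrow(emb))
  d <- as.matrix(dist(emb))   # full n x n Euclidean distance matrix
  diag(d) <- Inf              # a cell is not its own neighbour
  # n x k matrix: row i holds the indices of cell i's k nearest neighbours
  nn <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))
  # labels[nn] is flattened column by column, and comparing it with `labels`
  # recycles `labels` per column, so each neighbour is compared with its cell
  mean(labels[nn] == labels)
}

set.seed(1)
# toy data: two well-separated clusters of 20 cells each
emb <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 10), ncol = 2))
labels <- rep(c("A", "B"), each = 20)
aligned  <- knn_label_agreement(emb, labels, k = 5)          # labels match clusters
shuffled <- knn_label_agreement(emb, sample(labels), k = 5)  # labels scrambled
```

On real data, a score for the actual `Celltype` column that is close to the shuffled baseline would point toward a label-order problem rather than a weak embedding.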

zakieh-tayyebi commented 1 year ago

Hi! I know this is a very old issue, and the GitHub repo was not maintained for a while, but I was wondering if there are any updates. Did you ever get a better result, or find out what the issue was?