hansenlab / tricycle


Different results for same cluster #20

Open ayyildizd opened 4 months ago

ayyildizd commented 4 months ago

Hi, thanks for developing this nice tool.

I have two different subsets of data that share the same stem-cell-like cluster as a common population. When I run tricycle on these two subsets, the same stem cell population gets a different tricyclePosition assignment: in one case the cells are in the mitotic phase, and in the other they are in G1/G0. The two subsets differ greatly in cell number; the one that gives the mitotic assignment has 10K fewer cells than the other and is less heterogeneous. I am wondering whether this is normal behaviour, given that their PCAs are different. Could you elaborate on which score to trust in such cases? For info: neither dataset yielded an ellipsoid PCA.

kasperdanielhansen commented 4 months ago

Do you mind posting the two embedding plots (the ones I think you call PCA)?

-- Best, Kasper

ayyildizd commented 4 months ago

Thanks for fast response.

First dataset

image image

Then the second dataset:

image image

As you can see, the common cluster nNSC and two others (NB1 and 2) get different cell cycle phase assignments.

ayyildizd commented 4 months ago

I was looking at the last dataset again, and the TOP2A plot looks really weird:

image

Then I re-ran tricycle with exactly the same code (called from history), and now I see the complete opposite of what I saw before.

image image image

Do you know why this happens?

kasperdanielhansen commented 4 months ago

I have a bit of a hard time following the timeline, so I'll respond to your message starting with "thanks for fast response". These two cell cycle embeddings look weird, but in different ways:

1) In the embedding with 13k cells, I don't see any hint of a hole in the circle. This can happen with extremely low sequencing depth, but, depending on the technology you're using, it may be weird. I might also be misled by overplotting (i.e., there are a few cells in the middle, but they are hard to see).
2) In the embedding with 2.9k cells, I am missing the upper part of the "circle", i.e., the yellow cells are not connected to anything.

Now for some facts about the process.

  1. There is nothing random about this. You should get the same result every time.
  2. This is in principle, with one important exception I will get to, a single-cell prediction algorithm, i.e., your predicted time is not affected by other cells. The exception is that we start by centering the gene expression matrix, subtracting the mean of each gene. This centering is affected by which cells you feed the function. I would take each sample and process all cells in that sample, i.e., not subsample the cells by cell population or other factors apart from QC. I would also not necessarily attempt any data integration; just feed in the essentially raw data.
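The effect of the centering step can be seen in a toy base R sketch. This is not tricycle's actual code: the reference matrix, the simulated expression values, and the helper `toy_position` are all made up for illustration, and the only assumption carried over is that each gene's mean is subtracted before projecting onto a fixed two-column reference and taking the angle.

```r
# Toy illustration only (hypothetical data and helper, not tricycle internals).
set.seed(1)
n_genes <- 50
ref <- matrix(rnorm(n_genes * 2), ncol = 2)  # made-up reference loadings

# 100 "shared" cells that appear in both subsets.
shared <- matrix(rnorm(n_genes * 100, mean = 1), nrow = n_genes)
# 400 other cells per subset, shifted along a different reference axis.
other_a <- matrix(rnorm(n_genes * 400, mean = 1), nrow = n_genes) + 3 * ref[, 1]
other_b <- matrix(rnorm(n_genes * 400, mean = 1), nrow = n_genes) + 3 * ref[, 2]

toy_position <- function(expr, ref) {
  centered <- expr - rowMeans(expr)        # per-gene mean over the cells supplied
  proj <- t(centered) %*% ref              # cells x 2 projection
  atan2(proj[, 2], proj[, 1]) %% (2 * pi)  # angle in [0, 2*pi)
}

# Positions of the same 100 shared cells, computed within each subset.
pos_a <- toy_position(cbind(shared, other_a), ref)[1:100]
pos_b <- toy_position(cbind(shared, other_b), ref)[1:100]

# The shared cells are identical, yet their angles differ between subsets,
# because the gene means depend on which other cells were included.
summary(pos_a)
summary(pos_b)
```

Calling `toy_position` twice on the same input returns identical values, so in this sketch any difference comes from the composition of the input, not from randomness.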

I am wondering if - at least for the 13k cells - you have selected a specific cell population and then run tricycle on that population?

Some more detail on the tech and the experiment might be helpful.

-- Best, Kasper

ayyildizd commented 4 months ago

They are all single-nucleus sequencing data from the 10x platform. Our average sequencing depth is 50K. There are 24 samples, from both healthy and disease (3 different stages) conditions. There are batch effects in these samples, so I used harmony to integrate them. And yes, both datasets I ran tricycle on are subsets of certain cell type populations. Do I really need to run tricycle on each sample one by one? What I don't get is that the harmony UMAPs here are used for visualisation only, just like regular UMAPs, and I thought they would not interfere with tricycle since it uses a PCA embedding. After running tricycle and getting the timing, I just transferred it to my Seurat object to plot it and see which cell type is in which cell cycle phase. Please let me know if I did or interpreted this in the wrong way.

Here is the code I used (with tricycle version 1.12.0):

ref.o <- run_pca_cc_genes(as.SingleCellExperiment(DietSeurat(seurat_obj)), exprs_values = "logcounts", species = "human", gname.type = "SYMBOL") 
cc.ref <- attr(reducedDim(ref.o, "PCA"), "rotation")[, seq_len(2)]
sce <- estimate_cycle_position(as.SingleCellExperiment(DietSeurat(seurat_obj)), ref.m = cc.ref) 

Then I transferred this column to the Seurat object to plot it in the reduction I want.
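One thing that may be worth ruling out in a transfer step like this is a mismatch in cell order between the two objects. The base R sketch below uses made-up barcodes and values; with real objects the names would presumably come from `colnames()` of each object, which is an assumption here and not something shown in the thread.

```r
# Hypothetical per-cell positions, named by cell barcode (a stand-in for the
# tricyclePosition column of the SingleCellExperiment).
positions <- c(cellC = 3.1, cellA = 0.4, cellB = 5.9)

# Hypothetical cell order of the object used for plotting.
plot_cells <- c("cellA", "cellB", "cellC")

# Stop early if the two objects do not contain the same set of cells.
stopifnot(setequal(names(positions), plot_cells))

# Reorder by name so that each cell keeps its own value.
aligned <- positions[plot_cells]
```

Indexing by name makes the transfer independent of the order in which the cells happen to be stored.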

ayyildizd commented 4 months ago

Also, to add: it does not give the same result when I rerun the same code. Please see below. I re-ran the same code for the smaller dataset, and now the clusters on the right side of the UMAP get a different color/assignment.

image

image
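A numeric comparison of two runs can be easier to interpret than comparing colours on a UMAP. The sketch below is base R with made-up values; `circ_diff` is a hypothetical helper, and the vectors stand in for the position columns saved from each run. Because the positions are angles, the distance is measured around the circle.

```r
# Smallest angular distance between two positions on [0, 2*pi).
circ_diff <- function(a, b) {
  d <- abs(a - b) %% (2 * pi)
  pmin(d, 2 * pi - d)
}

# Hypothetical positions from two runs, named by cell and stored in a
# different order to mimic objects whose cells are not aligned.
run1 <- c(cellA = 0.10, cellB = 3.00, cellC = 6.20)
run2 <- c(cellB = 3.00, cellC = 6.20, cellA = 0.10)

# Match by cell name before comparing, then take the largest discrepancy.
max_diff <- max(circ_diff(run1, run2[names(run1)]))
max_diff
```

A value of zero (or close to it) would suggest the positions themselves are reproducible and that any visual difference arises later, for example in the transfer or plotting step.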