hansenlab / tricycle


estimate_cycle_position: Error in scale(t(as.matrix(data.m)) #8

Closed GBeattie closed 2 years ago

GBeattie commented 2 years ago

Hey,

I'm attempting to run estimate_cycle_position on a Seurat object, using the original object as input for a custom reference (I'm aware that this approach may not be ideal and could be the source of the issue). However, I'm getting an error despite converting the object to a SingleCellExperiment. The commands run and their output/error are below. Any assistance greatly appreciated!

> cc.ref <- run_pca_cc_genes(as.SingleCellExperiment(DietSeurat(seu.int)), exprs_values = "logcounts", species = "human", gname.type = "SYMBOL")
No AnnotationDb desginated. org.Hs.eg.db will be used to map Human ENSEMBL id to gene SYMBOL.
No gname input. Rownames of sce.o will be used.
'select()' returned 1:many mapping between keys and columns
1619 out of 2794 Gene Ontology cell cycle genes found in your data.
> cc <- estimate_cycle_position(as.SingleCellExperiment(DietSeurat(seu.int)), ref.m = cc.ref)
The designated dimred do not exist in the SingleCellExperiment or in altexp. project_cycle_space will be run to calculate embedding tricycleEmbedding
The number of projection genes found in the new data is 1619.
Error in scale(t(as.matrix(data.m)), center = TRUE, scale = FALSE) %*%  : 
  requires numeric/complex matrix/vector arguments

sjczheng commented 2 years ago

Hello,

The output of run_pca_cc_genes is not a ref matrix. You could use the code we provide in the manual of run_pca_cc_genes:

gocc_sce.o <- run_pca_cc_genes(neurosphere_example)
new.ref <- attr(reducedDim(gocc_sce.o, "PCA"), "rotation")[, seq_len(2)]

In your case, you can run:

ref.o <- run_pca_cc_genes(as.SingleCellExperiment(DietSeurat(seu.int)), exprs_values = "logcounts", species = "human", gname.type = "SYMBOL")
cc.ref <- attr(reducedDim(ref.o, "PCA"), "rotation")[, seq_len(2)]
cc <- estimate_cycle_position(as.SingleCellExperiment(DietSeurat(seu.int)), ref.m = cc.ref)

Best, Shijie
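
For readers hitting the same error, here is a minimal base R sketch on made-up data of why ref.m has to be a numeric genes-by-2 matrix. It assumes the projection amounts to the matrix product quoted in the error message. The toy matrix and the names bad.ref and res are illustrative only, and prcomp stands in for run_pca_cc_genes; this is not the package's actual implementation.

```r
set.seed(1)

# Toy genes x cells matrix of log-expression values (hypothetical data).
data.m <- matrix(rnorm(50 * 20), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("cell", 1:20)))

# PCA on the cells x genes matrix; the rotation matrix holds per-gene weights.
pca <- prcomp(t(data.m), center = TRUE, scale. = FALSE)

# A usable reference: the first two columns of the rotation (genes x 2, numeric).
ref.m <- pca$rotation[, seq_len(2)]

# Projection: the same kind of product that appears in the error message.
embedding <- scale(t(data.m), center = TRUE, scale = FALSE) %*% ref.m

# Passing a non-matrix object (here a list wrapping the rotation) makes
# %*% fail, because it only accepts numeric/complex matrices or vectors.
bad.ref <- list(rotation = ref.m)
res <- tryCatch(
  scale(t(data.m), center = TRUE, scale = FALSE) %*% bad.ref,
  error = function(e) conditionMessage(e)
)
```

In this sketch, embedding has one row per cell and two columns, while res holds the error message raised when the right-hand side is not a plain numeric matrix.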

GBeattie commented 2 years ago

Thanks for this, working now, my fault for missing the step!

GBeattie commented 2 years ago

Hey, thanks again. I'll not reopen this, as it may arise from using a self-reference as I've done, but I've run this for a couple of datasets (each has 3 timepoints), and each time one of the samples is a clear outlier. I've run it as you outlined above. Is this real, or is it due to missing an appropriate reference? Happy to make a new issue if the reason behind this is more complex. Two examples from the different datasets below:

image image

sjczheng commented 2 years ago

Hi.

I am not sure whether it is real or due to missing an appropriate reference. As you are using your own reference, you need to understand and validate your reference first. What kind of system are you using as a reference? Do you have multiple samples in your reference data? Do you observe batch effects between samples? Are there any other known or unknown factors driving the variation in your reference data? I am sorry, but we cannot draw any conclusions without a full understanding of the background. I advise against drawing conclusions from these two figures alone.

Best, Shijie

kasperdanielhansen commented 2 years ago

With the caveat of not having fully studied the previous messages, I would advise against using your own reference. When we started to do the research that ended up in the paper, we were pretty convinced that we would eventually need a collection of different references. It was pretty surprising to us that we could do as well as we do, with a single "universal" reference. I would at least generate these plots, using the reference we provide.

Best, Kasper

GBeattie commented 2 years ago

Thanks for the input, both of you! @kasperdanielhansen yes, I've now read a bit more of the paper, and it does seem your provided reference works well with human data (3 of the 4 datasets I'm working with are human lines, so my initial concern was that the reference was mouse). I've re-run using this code:

cc <- project_cycle_space(as.SingleCellExperiment(DietSeurat(seu.int)), gname.type = "SYMBOL", species = "human")
cc <- estimate_cycle_position(cc)

The outputs now look a bit more consistent for each of my 4 datasets, although some of the differences seem rather large. It's possible that this is what is expected. I'm a bioinformatician for a core, not part of the lab, so I can't comment on whether this reflects what they expect. The lab does say they see notable cell cycle changes by flow across timepoints, though, so I will see what they think.

image

kasperdanielhansen commented 2 years ago

Do you mind posting the embedding pictures, i.e. the scatterplots where we kind of see a circle?
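
As a rough illustration of what such an embedding scatterplot shows, the base R sketch below simulates a ring-shaped 2-D embedding and computes each cell's polar angle around the origin. It assumes the cycle position corresponds to that angle wrapped to [0, 2*pi). The simulated data and the helper embedding_angle are hypothetical and not part of the package.

```r
set.seed(42)

# Hypothetical 2-D embedding: rows are cells, columns are the two projected PCs.
true.theta <- runif(200, 0, 2 * pi)
emb <- cbind(PC1 = cos(true.theta), PC2 = sin(true.theta)) +
  matrix(rnorm(400, sd = 0.1), ncol = 2)

# Polar angle of each cell around the origin, wrapped to [0, 2*pi).
# (Illustrative helper; assumes the embedding is centred on (0, 0).)
embedding_angle <- function(emb) {
  atan2(emb[, 2], emb[, 1]) %% (2 * pi)
}
theta <- embedding_angle(emb)

# Scatterplot coloured by angle with a cyclic (hue-based) palette.
# Drawn only in interactive sessions so the script also runs headless.
if (interactive()) {
  plot(emb, col = hsv(h = theta / (2 * pi)), pch = 16, asp = 1,
       xlab = "PC1", ylab = "PC2", main = "Simulated embedding")
}
```

A cloud that forms a ring around the origin gives well-separated angles, whereas a blob concentrated near the origin makes the angle sensitive to noise.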

Best, Kasper

GBeattie commented 2 years ago

Sure, are these the correct ones? (I've done it for just two of the samples, showing the density plots on the left.) They do look quite different from the example data. I double-checked the normalisation (since I'm doing a Seurat conversion) using scuttle::logNormCounts, but got the same result.

image