Closed Young-Sook closed 2 years ago
Hi @Young-Sook, currently this is not implemented in VoxHunt unfortunately. I'm unsure in what cases using the scaled data would be preferred, could you maybe explain why using the log-normalized counts is problematic in your case?
Hi @joschif,
I want to use integrated values in this analysis to remove batch effects in the data, and I specifically want to use scale.data as it contains centered Pearson residuals. I solved the problem by extracting the matrix I wanted from the Seurat object and passing that matrix to VoxHunt.
Another thing I am curious about is the scale in the plot generated by the `plot_structure_similarity` command. I believe that command plots a heat map of correlations, but the scale bar has a different range ([-2 to 4] instead of [-1 to 1]). Is this a z-score? Could you walk me through how you get the values that the `plot_structure_similarity` command plots? I can open another issue if you prefer, as this is separate from my original question.
Thanks, Young-Sook
Hi @Young-Sook ,
In the case of integration, I would actually still recommend using unintegrated log-normalized expression values for correlation in VoxHunt, and using integration only for co-clustering and feature selection. This is because integration-corrected transcriptomes are altered to match one another, which may introduce biases into quantitative analyses like correlation or DE analysis.
To your second question: yes, the correlation values are z-scaled by default, but you can change this behaviour through the `scale` argument.
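To illustrate why z-scaled values can fall outside [-1, 1], here is a minimal Python sketch. The correlation values are made up, and it assumes the scaling is applied across the structures of a single cluster; how VoxHunt actually groups the values before scaling is not stated in this thread.

```python
import math

def zscore(values):
    """Center values on their mean and divide by the sample standard
    deviation (n - 1 denominator, as R's scale() uses)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Hypothetical correlations of one cluster to six brain structures.
# Every raw value lies within [-1, 1], as a correlation must.
raw_correlations = [0.62, 0.10, 0.05, 0.08, 0.12, 0.07]

z_scores = zscore(raw_correlations)

for r, z in zip(raw_correlations, z_scores):
    print(f"r = {r:5.2f}  ->  z = {z:5.2f}")
```

The one structure that stands out from the rest gets a z-score well above 1, even though its raw correlation is only 0.62. A z-score measures distance from the mean in standard deviations, so it is not bounded by the range of the underlying correlations.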
Hope this helps :)
Cheers, Jonas
Hi Jonas,
Thanks for the quick response. Could you elaborate on 'integrated values may introduce biases for quantitative analyses like correlation or DE analysis'?
From my understanding, integration in Seurat removes batch effects across multiple datasets by putting all datasets into the same hyperplane. This could be a problem if I want to do DE analysis, as the integration process also removes biological differences between datasets. For example, if one dataset is from patient1 and the other is from patient2, the biological differences between those two patients will be removed in the integrated values.
However, this could be less of a problem if I just want to check which cell type each cell cluster corresponds to. For instance, if I want to check whether my cell cluster #1 is radial glia using VoxHunt, I think it's better to use integrated values, as those values are free of confounding batch effects and biological differences between datasets.
I could be totally wrong, so feel free to correct me! I just want to have a better understanding of what the best practice for using VoxHunt might be.
Thank you so much, Young-Sook
Hi @Young-Sook,
I definitely agree that it's less of a problem to use integrated values for VoxHunt than for DE. The goal of integration is to remove unwanted (technical) variation while retaining the biological information. With VoxHunt, we are quite explicitly assessing the biologically relevant variation by computing similarities to brain structures. So in my view, one would expect that integration does not change the annotations made by VoxHunt. Or to put it differently, if integration drastically changes the mapping, this would indicate an issue with either the integration or the data. Also, one usually cannot inspect how integration changes the dataset, so apart from co-clustering and visualization, I would advise caution in using it for any quantitative analysis.
That said, I haven't really played with it much and have little experience with doing this kind of thing in practice, so please let me know how it works for you and if it improves your results :)
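One way to check whether integration changes the mapping is to compare the best-matching structure per cluster between the two runs. The Python sketch below is only an illustration: the cluster names, structure names, similarity values, and both helper functions are hypothetical and not part of VoxHunt. It assumes the similarities from each run have already been exported as plain dictionaries.

```python
def top_structure(similarity):
    """Return the best-matching structure for each cluster."""
    return {
        cluster: max(scores, key=scores.get)
        for cluster, scores in similarity.items()
    }

def annotation_agreement(sim_a, sim_b):
    """Fraction of shared clusters whose top structure is the same in both runs."""
    top_a = top_structure(sim_a)
    top_b = top_structure(sim_b)
    shared = top_a.keys() & top_b.keys()
    same = sum(top_a[c] == top_b[c] for c in shared)
    return same / len(shared)

# Hypothetical similarities computed from log-normalized expression.
log_normalized = {
    "cluster_1": {"pallium": 0.41, "subpallium": 0.22, "diencephalon": 0.10},
    "cluster_2": {"pallium": 0.15, "subpallium": 0.38, "diencephalon": 0.12},
    "cluster_3": {"pallium": 0.09, "subpallium": 0.14, "diencephalon": 0.35},
}

# Hypothetical similarities computed from integrated values.
integrated = {
    "cluster_1": {"pallium": 0.44, "subpallium": 0.20, "diencephalon": 0.08},
    "cluster_2": {"pallium": 0.31, "subpallium": 0.27, "diencephalon": 0.11},
    "cluster_3": {"pallium": 0.07, "subpallium": 0.16, "diencephalon": 0.39},
}

agreement = annotation_agreement(log_normalized, integrated)
print(f"Top-structure agreement: {agreement:.2f}")
```

In this toy data, cluster_2 switches its top structure after integration, so two of the three clusters agree. Low agreement on real data would be a reason to look more closely at the integration or at the data itself.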
Cheers, Jonas
Hi,
I was very impressed by the VoxHunt paper and I am excited to try that for our organoid scRNA-seq data!
One thing I wonder is whether I could use the 'scale.data' slot from a Seurat object to run `voxel_map`. I can see that the code uses the 'data' slot, but that's not ideal for our dataset, and it would be better to use the 'scale.data' slot. Do you have any plans to add an argument so that a user can choose the specific data slot they want to use?
Thanks, Young-Sook