Closed sheetalgiri closed 3 years ago
Hi!
For epigenomic datasets, such as scATAC-seq and scMethyl-seq, we use cisTopic when preprocessing our data. You can try a range of dimensions, and cisTopic has functionality to choose the one with the highest likelihood.
For PCA, we checked the percentage of variance explained by the chosen number of dimensions and made a decision based on that (e.g. >=75%).
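In case it helps, here is a minimal sketch of that check, assuming a cells-by-features matrix `X` and the 75% cutoff mentioned above. The function name `n_components_for_variance` is made up for illustration and this is not the exact code from the paper; it computes PCA through an SVD of the centered data and returns the smallest number of components whose cumulative explained variance reaches the threshold.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.75):
    """Smallest number of principal components whose cumulative
    explained-variance ratio is >= threshold (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    ratio = s**2 / np.sum(s**2)                  # explained-variance ratio per component
    cumulative = np.cumsum(ratio)
    # First index where cumulative >= threshold, converted to a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))               # guard against float round-off at threshold=1.0

# Example with synthetic data whose features have decreasing variance.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 20)) * np.linspace(5.0, 0.5, 20)
print(n_components_for_variance(X_demo, threshold=0.75))
```

If you use scikit-learn, passing a float between 0 and 1 as `n_components` to `sklearn.decomposition.PCA` performs the same kind of selection for you.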
For both, our process was as follows: data --> cisTopic/PCA --> unit normalization, so we applied unit normalization after dimensionality reduction for the datasets in our paper.
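As a rough illustration of the PCA branch of that order (reduce first, normalize second), here is a sketch that assumes "unit normalization" means scaling each cell's vector to unit L2 norm. That interpretation, and the name `pca_then_unit_normalize`, are assumptions for illustration and not necessarily the exact code used for the paper.

```python
import numpy as np

def pca_then_unit_normalize(X, n_components):
    """Project X (cells x features) onto its top principal components,
    then scale every row to unit L2 norm (assumed meaning of
    'unit normalization'; hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                           # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T                      # PCA scores, shape (cells, n_components)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)  # per-cell L2 norm
    norms[norms == 0] = 1.0                           # leave all-zero rows unchanged
    return Z / norms

# Example: 50 cells, 8 features, reduced to 3 dimensions.
rng = np.random.default_rng(0)
Z_demo = pca_then_unit_normalize(rng.normal(size=(50, 8)), n_components=3)
print(Z_demo.shape)
```

For the cisTopic branch, the same row-wise normalization step would be applied to the cell-by-topic matrix instead of the PCA scores.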
I hope this was helpful. Let me know if you have any questions :)
thanks, that makes it clear :)
As far as I understood, the preprocessing steps for SNARE-seq are:
ATAC-seq dataset -> cisTopic -> unit normalization
RNA-seq dataset -> unit normalization -> PCA (10 components)
Is that correct?
I know this is a general machine learning question, but what did you use to choose the number of components when doing PCA for a different dataset? Which tool/settings do you recommend?