Integrated datasets? - Githubissues

snehamitra / SCARlink

Integrated datasets? #11

Open danieljrichard opened 1 week ago

danieljrichard commented 1 week ago

Hello, I am very interested in using this tool in my work! We have four different biological samples of 10X Multiome data, which I've integrated using the scRNA data and Seurat's integration pipeline. Can SCARlink handle an integrated dataset like this?

Or would the authors instead recommend running SCARlink on the individual (un-integrated) samples? Any insights would be greatly appreciated!

snehamitra commented 1 week ago

We tried both of the approaches on a data set with multiple samples and found the predictions to be similar. There were a few genes that were not included in some samples due to sparsity issues. The threshold for calling gene-linked tiles might also need to be different.

danieljrichard commented 1 week ago

Thanks @snehamitra! If I might ask one additional question: is there any learning that carries over when SCARlink is run on a Multiome dataset? I know the paper shows that SCARlink predicts gene activity from Multiome ATAC more accurately than existing methods. Basically, if I wanted these more accurate gene-activity predictions for a standalone scATAC dataset (from the same tissue), could I leverage my Multiome data to improve them?

snehamitra commented 1 week ago

That's an interesting idea. We haven't tried it. You could train the model on the multiome data and then use the trained model to impute gene expression on your standalone scATAC-seq data. If the data is from matched tissue, then the predictions might be comparable. You could compare the prediction trends grouped by cell type. For example, is the imputed expression of a certain gene higher in cell type A than in cell type B in both the multiome and the standalone scATAC-seq data?
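
The cell-type comparison suggested above could be sketched roughly as follows. This is only an illustration, not part of SCARlink: it assumes you have already exported the imputed expression of one gene per cell, along with cell-type labels, for both datasets. The function names `mean_by_cell_type` and `compare_trends` and the toy values are hypothetical. It averages the imputed expression within each cell type and then checks whether the cell types rank in the same order in both datasets (a Spearman correlation, computed here as a Pearson correlation on ranks).

```python
import pandas as pd


def mean_by_cell_type(expr: pd.Series, cell_types: pd.Series) -> pd.Series:
    """Average imputed expression of one gene within each cell type.

    `expr` and `cell_types` are indexed by cell barcode (hypothetical inputs).
    """
    return expr.groupby(cell_types).mean()


def compare_trends(multiome_means: pd.Series, atac_means: pd.Series) -> float:
    """Spearman correlation of per-cell-type means over shared cell types."""
    shared = multiome_means.index.intersection(atac_means.index)
    # Spearman correlation is the Pearson correlation of the ranks.
    return multiome_means[shared].rank().corr(atac_means[shared].rank())


# Toy values for one gene; replace with your own exported predictions.
multiome_expr = pd.Series([2.0, 3.0, 0.5, 0.7, 1.0, 1.2],
                          index=["m1", "m2", "m3", "m4", "m5", "m6"])
multiome_types = pd.Series(["A", "A", "B", "B", "C", "C"],
                           index=multiome_expr.index)

atac_expr = pd.Series([1.8, 2.2, 0.4, 0.9], index=["a1", "a2", "a3", "a4"])
atac_types = pd.Series(["A", "A", "B", "C"], index=atac_expr.index)

multiome_means = mean_by_cell_type(multiome_expr, multiome_types)
atac_means = mean_by_cell_type(atac_expr, atac_types)
rho = compare_trends(multiome_means, atac_means)
print(multiome_means, atac_means, rho, sep="\n")
```

A correlation near 1 would suggest that the gene follows the same cell-type trend in both datasets. With only a handful of cell types the estimate is coarse, so it is probably worth checking across many genes rather than relying on one.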