shuxiaoc / maxfuse

Performance when integrating COSMX-1Kpanel and COSMX-6Kpanel datasets #12

Closed joaolsf closed 3 months ago

joaolsf commented 3 months ago

Hi, I have been testing the tool over the last few weeks, as I am interested in integrating scRNA-seq data with spatial datasets (proteomics and image-based transcriptomics, such as CosMx). I could run the tutorial data fine, as well as my own dataset, where I tested the integration of a CosMx 1K-panel dataset with a CosMx 6K-panel dataset (both from the same tissue type, pancreas). My aim is to evaluate whether I can transfer the expression values of the 5K genes that are in the 6K panel but not in the 1K panel into the 1K dataset.

I tested different "wt" values (between 0.3 and 0.7), with and without labels in the model (parameters labels1 and labels2), etc. In the end, the UMAP based on the CCA reduction shows good integration of both datasets, regardless of the parameters I used when preparing the model, finding initial pivots, and running propagation.

However, when I plot the expression levels, either on a UMAP (not based on the CCA reduction) or at the cells' spatial locations, there are massive differences in where genes present in both panels (1K and 6K) are expressed. For example, the expression of the insulin gene (INS) measured in the original 1K dataset differs hugely from the INS expression assigned to the same 1K cells from their matched 6K cells. I can think of two possible reasons for this:

1- Could this be due to poor performance of maxfuse on this task (integrating two single-cell spatial transcriptomics datasets)?

2- Or is it possible that I am using the wrong indexes? I ask because the full matching table uses integers for the cell indexes (0, 1, 2, 3...) in the columns mod_indx1 and mod_indx2, but my original objects use a different style of cell index. I am assuming that, for example, cell "0" in the full matching table is the first cell in my original anndata object. Is that correct? Would it be possible to add the original cell IDs as an index to the generated expression array and have this used in the fusor model?
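The index question in point 2 can be illustrated with a minimal sketch. It assumes that mod_indx1 and mod_indx2 are zero-based row positions in the arrays passed to the model, which is the assumption stated in the post and should be verified against the maxfuse documentation. The function name `attach_cell_ids`, the toy cell IDs, and the scores are hypothetical; the ID lists stand in for something like `list(adata.obs_names)` captured at the moment the expression arrays were built.

```python
def attach_cell_ids(matching_rows, ids_mod1, ids_mod2):
    """Translate positional indices in a matching table into cell IDs.

    matching_rows: iterable of (mod_indx1, mod_indx2, score) tuples.
    ids_mod1, ids_mod2: cell IDs in the exact row order of the arrays
    given to the model (assumed to be zero-based positions).
    """
    named = []
    for idx1, idx2, score in matching_rows:
        named.append(
            {
                "cell_id1": ids_mod1[idx1],  # look up by position, not by label
                "cell_id2": ids_mod2[idx2],
                "score": score,
            }
        )
    return named


# Hypothetical toy data: IDs in the row order of each expression array.
ids_1k = ["c_1_101", "c_1_102", "c_1_103"]
ids_6k = ["c_7_9", "c_7_10"]

# Hypothetical matching rows: (mod_indx1, mod_indx2, score).
matching = [(0, 1, 0.92), (2, 0, 0.81)]

named = attach_cell_ids(matching, ids_1k, ids_6k)
for row in named:
    print(row)
```

The ID lists have to be taken from the objects in the same state as when the arrays were extracted. If cells are filtered, subset, or reordered afterwards, position 0 no longer refers to the same cell and the lookup silently returns the wrong IDs.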

Thanks for your help in clarifying this.

Edit: I have fixed this issue. It was related to the matching of the indexes. The integration actually worked very well, as did the transfer of gene expression from the 6K panel to the 1K panel.
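One way to catch an index mismatch like this early is to compare a gene measured in both panels, such as INS, before and after transfer. The sketch below is illustrative only and not taken from the thread: the averaging over matched donor cells is a simple stand-in rather than the method used here, and the function names, toy values, and match pairs are hypothetical.

```python
from math import sqrt


def transfer_expression(matches, donor_values, n_recipient):
    """Give each recipient cell the mean donor value of its matched cells.

    matches: (recipient_idx, donor_idx) pairs of zero-based positions.
    Recipient cells without any match get None.
    """
    sums = [0.0] * n_recipient
    counts = [0] * n_recipient
    for recipient_idx, donor_idx in matches:
        sums[recipient_idx] += donor_values[donor_idx]
        counts[recipient_idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical INS values for four 1K-panel cells and three 6K-panel cells.
ins_1k = [9.0, 0.5, 7.5, 0.0]
ins_6k = [8.0, 0.0, 1.0]

# Hypothetical matches as (1K position, 6K position).
matches = [(0, 0), (1, 1), (2, 0), (3, 2)]

transferred = transfer_expression(matches, ins_6k, len(ins_1k))

# Compare only the cells that received a transferred value.
pairs = [(o, t) for o, t in zip(ins_1k, transferred) if t is not None]
r = pearson([o for o, _ in pairs], [t for _, t in pairs])
print(transferred, round(r, 3))
```

With correctly aligned indexes, a strongly expressed shared gene would be expected to correlate well between the measured and transferred values. A correlation near zero on such a gene is a hint to recheck how the positions were mapped back to cells before blaming the integration.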

BokaiZhu commented 3 months ago

Happy you figured it out! Let us know if there are further questions, and good luck maxfusing. I will close the issue, but feel free to reopen it if needed.

Best, Bokai