welch-lab / liger

R package for integrating and analyzing multiple single-cell datasets
GNU General Public License v3.0

Questions about online learning #216

Closed Ping-lin14 closed 3 years ago

Ping-lin14 commented 3 years ago

Hi, I have a few questions that came up while working through the "Iterative single-cell multi-omic integration using online learning" tutorial.

1. When I run the createLiger function, I receive an error message:

pbmcs = createLiger(list(rna1 = "C:/Users/ping/Desktop/ATAC/liger_test/RNA/2018111910k_pbmc/pbmc_10k_v3_raw_feature_bc_matrix.h5",
                         rna2 = "C:/Users/ping/Desktop/ATAC/liger_test/RNA/pbmc_5k_v1/connect_5k_pbmc_NGSC3_ch1_raw_feature_bc_matrix.h5"))

Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

Sometimes it also causes R to crash. Is there something wrong with my input?

2. As mentioned in the tutorial, the Liger object is created from raw HDF5 files. Can I merge files from different platforms, for example scRNA + scATAC? Is the code the same?

cgao90 commented 3 years ago

The first issue arises because online liger currently requires its HDF5 inputs to share the same set of genes/features, and the two datasets used here likely have different gene sets. In addition, the filtered_feature_bc_matrix files are recommended over the raw ones.
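A quick way to confirm a mismatch is to compare the feature names of the two inputs before calling createLiger. The base-R sketch below assumes the gene names have already been read from each HDF5 file into character vectors (that step is not shown). The example vectors and the helper `compare_features` are made up for illustration and are not part of liger.

```r
# Hypothetical feature names; in practice these would come from each HDF5 file.
genes_rna1 <- c("CD3D", "CD8A", "MS4A1", "NKG7")
genes_rna2 <- c("CD3D", "MS4A1", "NKG7", "LYZ")

# Illustrative helper (not a liger function): report how two feature sets differ.
compare_features <- function(a, b) {
  list(
    identical      = identical(a, b),   # same names in the same order?
    shared         = intersect(a, b),   # features present in both inputs
    only_in_first  = setdiff(a, b),     # features missing from the second input
    only_in_second = setdiff(b, a)      # features missing from the first input
  )
}

res <- compare_features(genes_rna1, genes_rna2)
print(res)
```

If `identical` is FALSE, the two inputs do not share the same feature set, which matches the situation described above.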

As for the second question: yes, online liger is able to align datasets from different platforms. As long as the slots (in HDF5 files) that store the data are correctly specified, the code should work.

Hope it helps. Thank you for your interest in trying out the online liger. Let us know if you run into any other issues.

Ping-lin14 commented 3 years ago

Thank you very much for your answers. The problem with the createLiger function has been solved.

However, I encountered a second problem. When I use ATAC and RNA for online liger, I receive a warning message at the selectGenes step:

Warning in selectGenes(pbmcs, var.thresh = 0.2, do.plot = F) :
  No genes were selected; lower var.thresh values or choose 'union' for combine parameter

This caused scaleNotCenter to fail. Do you have any suggestions? This is my code:

pbmcs = createLiger(list(ATAC = "C:/Users/ping/Desktop/ATAC/liger_test/ATAC/atac_pbmc_10k_nextgem_filtered_peak_bc_matrix.h5",
                         RNA = "C:/Users/ping/Desktop/ATAC/liger_test/RNA/5k_pbmc_v3_nextgem_filtered_feature_bc_matrix.h5"))

pbmcs = normalize(pbmcs)
pbmcs = selectGenes(pbmcs, var.thresh = 0.2, do.plot = F)
stim
  |==========================================================================================================================| 100%
ctrl
  |==========================================================================================================================| 100%
Warning in selectGenes(pbmcs, var.thresh = 0.2, do.plot = F) :
  No genes were selected; lower var.thresh values or choose 'union' for combine parameter

cgao90 commented 3 years ago

Hi @Ping-lin14,

Could you try lowering var.thresh and see whether more variable genes are selected? This parameter has to be tuned manually whenever different datasets are used.

jw156605 commented 3 years ago

Are you trying to select variable genes from an snATAC dataset? This will not work well. Use the datasets.use parameter to select genes from only the RNA dataset(s).

Ping-lin14 commented 3 years ago

Hi @cgao90

I tried reducing var.thresh to 0.01, but var.genes still has 0 genes.

Ping-lin14 commented 3 years ago

Hi @jw156605

Thank you for your comments. I set datasets.use to 2, but it doesn't seem to work. Or is my setting wrong?

cgao90 commented 3 years ago

I just noticed that you are using the ATAC peak-by-cell matrix. However, a gene-by-cell matrix is required as input. Please check out our published LIGER protocol for more details on scATAC-seq data processing and on scRNA-seq and scATAC-seq data integration (link: https://www.nature.com/articles/s41596-020-0391-8).
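To illustrate the difference in shape between the two inputs, here is a toy base-R sketch that collapses a peak-by-cell matrix into a gene-by-cell matrix. It is only an illustration, not the procedure from the protocol: the counts, the peak names, and the peak-to-gene assignment `peak_to_gene` are all made up, and a real analysis should follow the protocol linked above.

```r
# Toy peak-by-cell count matrix (rows = peaks, columns = cells); values are made up.
peak_counts <- matrix(
  c(1, 0, 2,
    0, 3, 1,
    4, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("chr1:100-200", "chr1:300-400", "chr2:50-150"),
    c("cell1", "cell2", "cell3")
  )
)

# Hypothetical assignment of each peak (in row order) to a gene.
peak_to_gene <- c("GENE_A", "GENE_A", "GENE_B")

# Sum the rows that map to the same gene, giving a gene-by-cell matrix.
gene_counts <- rowsum(peak_counts, group = peak_to_gene)
print(gene_counts)
```

The rows of `gene_counts` are now named by gene rather than by peak coordinates, so they can be matched against the gene names of an RNA dataset.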

erzakiev commented 11 months ago

Dumb question, and I don't want to open a separate issue for it, but I'd like to clarify the meaning of the term "online" in "online_iNMF". Is it online as in "connected to the internet and interacting with some online database" or as in "learning features of the locally stored data in real time, in small batches"?