txWang / BERMUDA

BERMUDA (Batch Effect ReMoval Using Deep Autoencoders) is a novel transfer-learning-based method for batch correction in scRNA-seq data.

help in running BERMUDA #1

Open nmalwinka opened 5 years ago

nmalwinka commented 5 years ago

Hi, would you be able to add an example script showing how to connect the pre-processing in R with the autoencoder in Python, please?

txWang commented 5 years ago

Hi,

We used two packages in R and saved the results as .csv files in order to run BERMUDA. You can follow the preprocessing steps in BERMUDA/R/pre_processing.R. First, we used Seurat to find highly variable genes and cluster the cells of each batch (e.g. BERMUDA/pancreas/muraro_seurat.csv). Then, we used MetaNeighbor to generate a similarity matrix between clusters of different batches (e.g. BERMUDA/pancreas/pancreas_metaneighbor.csv). Once you have the required .csv files, you can run BERMUDA directly (e.g. BERMUDA/main_pancreas.py). Hope this helps.
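For concreteness, here is a minimal sketch of those two steps. The Seurat (v3) and MetaNeighbor calls are standard, but the input/output file layouts and parameters such as `nfeatures` and `resolution` are illustrative assumptions, not the exact settings of pre_processing.R:

```r
# Minimal sketch of the two pre-processing steps. Assumes the input .csv
# files are gene-by-cell matrices; see BERMUDA/R/pre_processing.R for the
# authoritative code and the exact .csv layouts BERMUDA expects.
library(Seurat)
library(MetaNeighbor)
library(SummarizedExperiment)

# --- Step 1: per batch, find highly variable genes and cluster cells ---
cluster_batch <- function(csv_file) {
  counts <- as.matrix(read.csv(csv_file, row.names = 1))  # genes x cells
  so <- CreateSeuratObject(counts = counts)
  so <- NormalizeData(so)
  so <- FindVariableFeatures(so, nfeatures = 2000)
  so <- ScaleData(so)
  so <- RunPCA(so, npcs = 20)
  so <- FindNeighbors(so, dims = 1:20)
  FindClusters(so, resolution = 0.8)
}
muraro <- cluster_batch("muraro_human.csv")
baron  <- cluster_batch("baron_human.csv")
# Layout of the real muraro_seurat.csv may differ from this sketch.
write.csv(data.frame(cluster = Idents(muraro)), "muraro_seurat.csv")

# --- Step 2: similarity between clusters of different batches ---
genes <- intersect(rownames(muraro), rownames(baron))
expr  <- cbind(GetAssayData(muraro, slot = "data")[genes, ],
               GetAssayData(baron,  slot = "data")[genes, ])
dat   <- SummarizedExperiment(assays = list(expr))
sim   <- MetaNeighborUS(
  var_genes = intersect(union(VariableFeatures(muraro),
                              VariableFeatures(baron)), genes),
  dat       = dat,
  study_id  = rep(c("muraro", "baron"), c(ncol(muraro), ncol(baron))),
  cell_type = c(as.character(Idents(muraro)), as.character(Idents(baron)))
)
write.csv(sim, "pancreas_metaneighbor.csv")
```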

Best, Tongxin

nmalwinka commented 5 years ago

Hi again, my dataset is quite big and I run out of memory with the following error:

```
Error: cannot allocate vector of size 656.7 Gb
Execution halted
```

The MetaNeighbor package from Maggie Crow has some updated code that avoids vectorising (see MetaNeighborUSLowMem in https://github.com/gillislab/MetaNeighbor/blob/master/R/MetaNeighborUS.R). Have you tried upgrading your code so that bigger datasets can run through BERMUDA?
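For reference, in the Bioconductor release of MetaNeighbor the low-memory path is exposed through the `fast_version` argument of `MetaNeighborUS()`. A sketch with toy data standing in for the real matrices (the exact flag name may differ in the GitHub version linked above):

```r
library(MetaNeighbor)
library(SummarizedExperiment)

# Toy data standing in for the combined expression matrix; the point is
# only the fast_version flag, which avoids allocating the huge dense
# matrix that triggers the allocation error above.
set.seed(1)
expr <- matrix(rpois(200 * 50, lambda = 5), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("cell", 1:50)))
dat <- SummarizedExperiment(assays = list(expr))

sim <- MetaNeighborUS(
  var_genes    = rownames(expr)[1:100],
  dat          = dat,
  study_id     = rep(c("batch1", "batch2"), each = 25),
  cell_type    = sample(c("c0", "c1", "c2"), 50, replace = TRUE),
  fast_version = TRUE   # low-memory / fast implementation
)
```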

nmalwinka commented 5 years ago

I managed to figure it out by myself. I have a problem with the result, though. After loading code_list and producing code, I expected it to be the same array size as data, but it isn't:

```python
>>> code.shape
(51687, 20)
>>> data.shape
(51687, 2583)
```

The number of cells is the same, but I have only 20 genes(?) there instead of the 2583 variable genes.

A further question: how do I transform this back into a Seurat object? Many thanks.

txWang commented 5 years ago

Hi,

Thank you for your question. Like many batch correction methods, BERMUDA removes batch effects by projecting the original data into a low-dimensional space (the dimensionality is 20 here). The low-dimensional code does not suffer from batch effects and can be used for further analysis such as visualization. Currently, we do not support transforming our results back into Seurat objects.
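That said, a generic workaround (not part of BERMUDA) is to attach the code to an existing Seurat v3 object as a custom dimensional reduction via `CreateDimReducObject()`. A sketch, where the file name and the assumption that its rows follow the object's cell order are hypothetical:

```r
# Hypothetical workaround, not part of BERMUDA: 'so' is your existing
# Seurat (v3) object and bermuda_code.csv is the saved cells-by-20 code,
# with rows in the same order as colnames(so).
library(Seurat)

code <- as.matrix(read.csv("bermuda_code.csv", header = FALSE))
rownames(code) <- colnames(so)
colnames(code) <- paste0("BERMUDA_", seq_len(ncol(code)))
so[["bermuda"]] <- CreateDimReducObject(embeddings = code,
                                        key = "BERMUDA_", assay = "RNA")

# Downstream analysis can then use the corrected embedding directly:
so <- RunUMAP(so, reduction = "bermuda", dims = 1:20)
DimPlot(so, reduction = "umap")
```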

Best, Tongxin

yzcv commented 5 years ago

Hi,

I am very interested in your BERMUDA work, but I have a problem running pre_processing.R in the BERMUDA/R folder. I was wondering if you could provide the two datasets, namely "muraro_human.csv" and "baron_human.csv", which are required by pre_processing.R. Thank you in advance.