Open · Accompany0313 opened this issue 2 months ago
Hi Ruochi, nice work!

When I use Higashi to impute the data at 10K resolution, the program always crashes due to memory limits. Can I instead impute subsets of the cells using a model trained on the complete dataset? For example, I train on all 4238 cells of the Lee2019 dataset, and then impute those same 4238 cells separately, in batches of 1000 cells at a time. Since I reload only the current 1000 cells when imputing each batch, I wonder whether this has any impact on the results.
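Concretely, the batching I have in mind looks like this (just a sketch of the indexing, not Higashi code; the batch size of 1000 is the one mentioned above):

```python
# Sketch: split the 4238 cells into consecutive batches of 1000.
# Each batch would then be fed to Higashi as a separately re-created input.
n_cells, batch_size = 4238, 1000
batches = [list(range(start, min(start + batch_size, n_cells)))
           for start in range(0, n_cells, batch_size)]
# -> cells 0-999, 1000-1999, 2000-2999, 3000-3999, 4000-4237
```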
Here is the code I used to train my model:
```python
from higashi.Higashi_wrapper import *
import numpy as np

config = "/home/zzl/ygc/Higashi/tan2021/10K/100K_128/config.JSON"
higashi_model = Higashi(config)

# Process the input data for the full dataset and set up the model
higashi_model.generate_chrom_start_end()
higashi_model.extract_table()
higashi_model.create_matrix()
higashi_model.prep_model()

# Train: embeddings first, then the two imputation stages
higashi_model.train_for_embeddings()
higashi_model.train_for_imputation_nbr_0()
higashi_model.train_for_imputation_with_nbr()
```
And here is the code where I impute 1000 of these cells:
```python
from higashi.Higashi_wrapper import *
import numpy as np

config = "/home/zzl/ygc/Higashi/tan2021/10K/100K_128/config.JSON"
higashi_model = Higashi(config)

# Re-create the input data for the current batch of 1000 cells,
# then reload the trained model
higashi_model.generate_chrom_start_end()
higashi_model.extract_table()
higashi_model.create_matrix()
higashi_model.prep_model()

# Impute with the already-trained model
higashi_model.impute_no_nbr()
higashi_model.impute_with_nbr()
```

Ruochi replied:

Hmm, the only potential error would be if the cells you input are not the first x cells in the original dataset; in that case the cell embeddings would be offset a little bit. You can still hack the system by replacing the cell embeddings .npy file with the embeddings of that subset of cells, in the same order.
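A minimal sketch of that workaround, assuming the full run saved the embeddings as an (n_cells × dim) NumPy array; the file name cell_embeddings.npy and the batch indices below are hypothetical and depend on your Higashi config:

```python
import numpy as np

# Load the embeddings produced by the full 4238-cell run
# (hypothetical file name; use the path your config actually writes to).
embeddings = np.load("cell_embeddings.npy")  # shape: (4238, dim)

# Example: the third batch of 1000 cells. The rows kept here must be in the
# same order as the cells appear in the subset input.
batch_indices = np.arange(2000, 3000)
np.save("cell_embeddings.npy", embeddings[batch_indices])
```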