ma-compbio / Higashi

single-cell Hi-C, scHi-C, Hi-C, 3D genome, nuclear organization, hypergraph
MIT License

impute some of these cells with the trained model #57

Open Accompany0313 opened 2 months ago

Accompany0313 commented 2 months ago

Hi Ruochi, Nice work!

When I use Higashi to impute data at 10 Kb resolution, the program always crashes because it runs out of memory. Can I impute a subset of the cells using a model trained on the complete dataset? For example, I train on all 4238 cells from the Lee2019 dataset, then impute those 4238 cells separately, in batches of 1000 cells at a time. Since I reload only the 1000 cells of the current batch each time I impute, I wonder whether this has any impact on the results.

Here is the code I used to train my model:

from higashi.Higashi_wrapper import *
import numpy as np

config = "/home/zzl/ygc/Higashi/tan2021/10K/100K_128/config.JSON"
higashi_model = Higashi(config)
higashi_model.generate_chrom_start_end()
higashi_model.extract_table()
higashi_model.create_matrix()
higashi_model.prep_model()

higashi_model.train_for_embeddings()
higashi_model.train_for_imputation_nbr_0()
higashi_model.train_for_imputation_with_nbr()

Here is my code where I impute 1000 of these cells:

from higashi.Higashi_wrapper import *
import numpy as np

config = "/home/zzl/ygc/Higashi/tan2021/10K/100K_128/config.JSON"
higashi_model = Higashi(config)
higashi_model.generate_chrom_start_end()
higashi_model.extract_table()
higashi_model.create_matrix()
higashi_model.prep_model()

higashi_model.impute_no_nbr()
higashi_model.impute_with_nbr()

ruochiz commented 2 months ago

Hmm. The only potential problem is this: if the cells you input are not the first x cells of the original dataset, the cell embeddings will be offset, so each cell gets paired with the wrong embedding row. You can still hack around it by replacing the cell embeddings .npy file with the embeddings of that subset of cells, in the same order as the cells you load.
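
A rough sketch of that workaround, using only NumPy. It assumes the saved embeddings are a 2-D array of shape (n_cells, dim) whose row i belongs to cell i of the full dataset; that layout is an assumption and is worth checking against your own files. The helper names (`batch_indices`, `subset_embeddings`, `write_subset_embeddings`) and any paths you pass in are hypothetical and not part of Higashi.

```python
import numpy as np


def batch_indices(n_cells, batch_size):
    """Yield consecutive index arrays that cover cells 0..n_cells-1 in order."""
    for start in range(0, n_cells, batch_size):
        yield np.arange(start, min(start + batch_size, n_cells))


def subset_embeddings(full_embeddings, cell_indices):
    """Return the embedding rows for cell_indices, in exactly that order.

    Assumption: row i of full_embeddings is the embedding of cell i
    in the original (full) dataset.
    """
    full_embeddings = np.asarray(full_embeddings)
    cell_indices = np.asarray(cell_indices, dtype=int)
    n_cells = full_embeddings.shape[0]
    if cell_indices.size and (cell_indices.min() < 0 or cell_indices.max() >= n_cells):
        raise IndexError("cell index outside the range of the full embedding matrix")
    return full_embeddings[cell_indices]


def write_subset_embeddings(full_path, cell_indices, out_path):
    """Load the full embeddings, keep the selected rows, and save them to out_path."""
    full = np.load(full_path)
    subset = subset_embeddings(full, cell_indices)
    np.save(out_path, subset)
    return subset
```

For 4238 cells in batches of 1000, `batch_indices(4238, 1000)` gives five index arrays (four of 1000 cells and one of 238). For each batch, the matching rows can be written out with `write_subset_embeddings` and swapped in for the embeddings file before imputing that batch. Keep a backup of the original full-dataset embeddings so they can be restored between batches.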