ma-compbio / Higashi

single-cell Hi-C, scHi-C, Hi-C, 3D genome, nuclear organization, hypergraph
MIT License

resuming interrupted training and imputation using Higashi+FastHigashi protocol #56

Open chooliu opened 1 week ago

chooliu commented 1 week ago

Hi Ruochi, thanks so much for developing Higashi & FastHigashi.

I've been trying to obtain the cell-level matrices imputed with neighbors, following the newer Fast-Higashi tutorials/Ramani et al.ipynb workflow, on a larger dataset. It's difficult to request enough compute time on our cluster to complete the training and imputation in one go.

I see in the Higashi API notes that temp_dir should store intermediate outputs in case of interruption, but I have not been able to get a run to resume. Are there suggested commands to make sure these outputs load into the right object structure, or commands to skip, especially in the Higashi+FastHigashi case? Or should the intermediate results load automatically given the same config file?

Specifically, I can usually get through higashi_model.prep_model() and fh_model.run_model(), but after this, is there a proper way to load .higashi_model in a new session?

>>> higashi_model.impute_with_nbr()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lib/python3.10/site-packages/higashi-0.1.0a0-py3.10.egg/higashi/Higashi_wrapper.py", line 1565, in impute_with_nbr
    del self.higashi_model
AttributeError: higashi_model

Cheers!
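
For readers following along: the traceback above ends at `del self.higashi_model`, and in Python a `del` on an attribute that the instance does not have raises AttributeError with the attribute name as the message. The sketch below reproduces that behavior with a hypothetical `Wrapper` class. It is not Higashi's code, and the method names are assumptions chosen only to mirror the pattern in the traceback.

```python
class Wrapper:
    """Hypothetical stand-in for a wrapper class; not Higashi's actual code."""

    def train(self):
        # The attribute only exists once training has run on this instance.
        self.model = "trained-weights"

    def impute(self):
        # `del` on an attribute that was never set raises AttributeError.
        del self.model
        return "imputed"


fresh = Wrapper()
try:
    fresh.impute()  # train() was never called on this instance
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

trained = Wrapper()
trained.train()
result = trained.impute()  # works because train() set the attribute first
```

If the real wrapper follows a similar pattern, the error would mean that the training step which sets the attribute has not run on the object in the new session.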

ruochiz commented 1 week ago

Hi, thank you for your interest. That's a good question. I overlooked implementing a loading function. For now, I think the most straightforward approach is to use pickle to dump and load trained Higashi_model instances.
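
A minimal sketch of the dump-and-load round trip with the standard-library pickle module follows. A `types.SimpleNamespace` stands in for the trained object, and the attribute names and temporary file path are assumptions for illustration; whether a real Higashi instance survives this round trip intact is not confirmed here.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

# Hypothetical stand-in for a trained model object (not a Higashi instance).
model = SimpleNamespace(config_path="higashi_config/config.json",
                        embeddings=[0.1, 0.2, 0.3])

# Write the checkpoint to a temporary directory for this example.
checkpoint = os.path.join(tempfile.mkdtemp(), "model.pickle")

# Job 1: dump the object once the expensive step has finished.
with open(checkpoint, "wb") as f:
    pickle.dump(model, f)

# Job 2: load it back; the object's class must be importable at load time.
with open(checkpoint, "rb") as f:
    restored = pickle.load(f)
```

By default pickle stores only what is set on the instance at dump time, so anything the object deleted or never created before the dump will be absent after loading.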

chooliu commented 1 week ago

Thanks for the thoughts, Ruochi! I got the chance to try this today, but I think my naive attempt at pickling results in an error.

The following code runs without any clear errors and will begin the imputation via higashi_model.train_for_imputation_nbr_0() if I run it all in one session. (It just times out on that step because of the time limits on my cluster.)

import pickle

from higashi.Higashi_wrapper import *
from fasthigashi.FastHigashi_Wrapper import *

config = "higashi_config/config.json"
higashi_model = Higashi(config)

higashi_model.process_data()

fh_model = FastHigashi(config_path = config,
                       path2input_cache = "higashi_cache",
                       path2result_dir = "higashi_output",
                       off_diag = 100, filter = False, do_conv = False,
                       do_rwr = False, do_col = False, no_col = False) 

fh_model.prep_dataset()
fh_model.run_model(dim1 = 0.6, rank = 256, n_iter_parafac = 1, extra = "")

higashi_model.prep_model()
higashi_model.train_for_embeddings()

# added pickle dump section --------------------------------------
with open("higashi_output/fh_model.pickle", "wb") as f:
    pickle.dump(fh_model, f)
with open("higashi_output/higashi_model.pickle", "wb") as f:
    pickle.dump(higashi_model, f)
# ------------------------------------------------------------

higashi_model.train_for_imputation_nbr_0() 
higashi_model.impute_no_nbr()

higashi_model.train_for_imputation_with_nbr()
higashi_model.impute_with_nbr()

However, splitting it into two jobs and loading higashi_model with pickle.load in the second job results in the following error.

with open("higashi_output/higashi_model.pickle", "rb") as f:
    higashi_model = pickle.load(f)
higashi_model.train_for_imputation_nbr_0() 

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "higashi-0.1.0a0-py3.10.egg/higashi/Higashi_wrapper.py", line 1374, in train_for_imputation_nbr_0
    self.train_for_imputation_no_nbr()
  File "higashi-0.1.0a0-py3.10.egg/higashi/Higashi_wrapper.py", line 1379, in train_for_imputation_no_nbr
    del self.higashi_model, self.node_embedding_init
AttributeError: higashi_model

Please let me know if I'm missing something obvious! In the meantime, I will try to make a reproducible example on a smaller dataset with fewer cells.
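
One way to narrow this down before calling the training method is to check which attributes the loaded object actually carries. The sketch below uses a hypothetical helper, `missing_attributes`, and a `types.SimpleNamespace` as a stand-in for the unpickled object. The two attribute names come from the `del` statement in the traceback; everything else is an assumption.

```python
from types import SimpleNamespace


def missing_attributes(obj, names):
    """Return the names in `names` that are not set on `obj`."""
    return [name for name in names if not hasattr(obj, name)]


# Hypothetical stand-in for an object restored in a new session.
restored = SimpleNamespace(config_path="higashi_config/config.json",
                           node_embedding_init=[0.1, 0.2])

# Names taken from the `del` statement shown in the traceback.
needed = ["higashi_model", "node_embedding_init"]
absent = missing_attributes(restored, needed)
print("missing before training:", absent)
```

Running the same check on the object right before pickle.dump in the first job and right after pickle.load in the second would show whether the attribute was lost in the round trip or was already gone when the object was dumped.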