ma-compbio / Fast-Higashi

single-cell Hi-C, scHi-C, Hi-C, 3D genome, nuclear organization, tensor decomposition
MIT License

EOFError: Ran out of input #5

Closed: khsjh closed this issue 1 year ago

khsjh commented 1 year ago

I'm trying to run the tutorial with Fast-Higashi on the Lee et al. dataset (sn-m3C-seq on PFC). I downloaded all of the files in the Google Drive folder (https://drive.google.com/drive/folders/1SuzqQ_9dliAmTb-fGprFnN3aZrfWS-Fg?usp=sharing).

The code I used is below:

from fasthigashi.FastHigashi_Wrapper import *
config = "/work/magroup/ruochiz/fast_higashi_git/config_pfc.JSON"
model = FastHigashi(config_path=config,
                 path2input_cache="/work/magroup/ruochiz/fast_higashi_git/pfc_500k",
                 path2result_dir="/work/magroup/ruochiz/fast_higashi_git/pfc_500k",
                 off_diag=100,
                 filter=False,
                 do_conv=False,
                 do_rwr=False,
                 do_col=False,
                 no_col=False)
model.prep_dataset(batch_norm=True)

It raised the following error:

total number of cells that pass qc check 4145 bad 93 total: 4238
cache file = /das2/younso/Hic/schic/F_higashi/pfc_500k/cache_intra_500000_offdiag_100_.pkl
loading cached input from /das2/younso/Hic/schic/F_higashi/pfc_500k/cache_intra_500000_offdiag_100_.pkl
---------------------------------------------------------------------------
EOFError                                  Traceback (most recent call last)
Cell In[4], line 1
----> 1 model.prep_dataset(batch_norm=True)

File ~/bin/Fast-Higashi/fasthigashi/FastHigashi_Wrapper.py:484, in FastHigashi.prep_dataset(self, meta_only, batch_norm)
    481         cache_extra = ""
    482 path2input_cache_intra = os.path.join(self.path2input_cache, 'cache_intra_%d_offdiag_%d_%s.pkl' % (
    483         res, self.off_diag, cache_extra))
--> 484 all_matrix += self.preprocess_contact_map(
    485         self.config, reorder=reorder, path2input_cache=path2input_cache_intra,
    486         batch_norm=batch_norm,
    487         is_sym=True,
    488         off_diag=self.off_diag,
    489         fac_size=1,
    490         merge_fac_row=int(res / self.config['resolution']), merge_fac_col=int(res / self.config['resolution']),
    491         filename_pattern='%s_sparse_adj.npy',
    492         force_shift=False,
    493 )
    495 size_list = [m.shape[0] for m in all_matrix]
    496 num_cell = all_matrix[-1].shape[-1]

File ~/bin/Fast-Higashi/fasthigashi/FastHigashi_Wrapper.py:377, in FastHigashi.preprocess_contact_map(self, config, reorder, path2input_cache, batch_norm, key_fn, **kwargs)
    375 with open(path2input_cache, 'rb') as f:
    376         for chrom in self.chrom_list:
--> 377                 all_matrix.append(pickle.load(f))
    378 sys.stdout.flush()
    379 return all_matrix

EOFError: Ran out of input
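Why the traceback ends this way: the loop at the bottom of the traceback opens one cache file and calls pickle.load once per chromosome, and pickle.load raises EOFError ("Ran out of input") as soon as it is asked for another object after the stream is exhausted. Below is a minimal sketch reproducing that mechanism. The three-chromosome list is hypothetical, an in-memory buffer stands in for the .pkl cache, and it assumes the file holds a single pickled object.

```python
import io
import pickle

chrom_list = ["chr1", "chr2", "chr3"]  # hypothetical chromosome list

# Stand-in for a cache file that holds only ONE pickled object
# (e.g. a single chunk containing every chromosome).
buf = io.BytesIO()
pickle.dump({c: None for c in chrom_list}, buf)
buf.seek(0)

loaded = []
error = None
try:
    # Same pattern as the loader in the traceback: one load per chromosome.
    for chrom in chrom_list:
        loaded.append(pickle.load(buf))  # the second call finds no more data
except EOFError as exc:
    error = exc

print(len(loaded), type(error).__name__)  # 1 EOFError
```

The first call succeeds and returns the whole chunk; the second call hits the end of the stream, which is exactly the error reported above.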

The tutorial description contains the following sentence: "To run Fast-Higashi for a new dataset, please prepare the same input files for the Higashi software. Use the Higashi software higashi.process_data() to transform contact pair files to sparse contact maps."

Does this mean that I should run Higashi first? But then I can't use the FastHigashi(config_path=config, ...) function, because it would be replaced by Higashi(config).

Why does this error occur?

ruochiz commented 1 year ago

Oh, sorry for the confusion: I have changed the Fast-Higashi API over time, and the data in that folder no longer runs with the current version. The main cause is that previously I stored all chromosomes as one chunk in a single .pkl file, whereas now they are chunked by chromosome to improve I/O and memory usage. As for the PFC data, you can get it from the original m3C paper.
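One way to check which layout a cache file uses is to count how many objects were pickled into it back to back and compare that with the number of chromosomes. The sketch below uses a hypothetical helper, count_pickled_objects (not part of Fast-Higashi). It assumes the old "one chunk" layout is a single pickle.dump of all chromosomes and the current layout is one pickle.dump per chromosome, as the loader in the traceback implies. The chromosome list and per-chromosome data are placeholders, not the real cache contents.

```python
import os
import pickle
import tempfile


def count_pickled_objects(path):
    """Count top-level objects pickled back to back into `path`.

    Hypothetical diagnostic helper. Note that it fully deserializes
    every object, so on a large cache it costs as much as loading it.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            try:
                pickle.load(f)
            except EOFError:  # no more objects in the file
                return count
            count += 1


chrom_list = ["chr1", "chr2", "chr3"]  # hypothetical chromosome list
matrices = [[[0]], [[1]], [[2]]]       # placeholder per-chromosome data

counts = {}
with tempfile.TemporaryDirectory() as tmpdir:
    # Assumed old layout: all chromosomes dumped as one object.
    legacy_path = os.path.join(tmpdir, "legacy.pkl")
    with open(legacy_path, "wb") as f:
        pickle.dump(matrices, f)

    # Assumed current layout: one dump per chromosome in the same file.
    chunked_path = os.path.join(tmpdir, "chunked.pkl")
    with open(chunked_path, "wb") as f:
        for m in matrices:
            pickle.dump(m, f)

    for path in (legacy_path, chunked_path):
        n = count_pickled_objects(path)
        counts[os.path.basename(path)] = n
        status = "matches" if n == len(chrom_list) else "does NOT match"
        print(os.path.basename(path), n, status, "the chromosome count")
```

A cache that reports a single object for a multi-chromosome genome was most likely written in the old format and will fail with the per-chromosome loader.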

All the tutorials and notebooks here are kept up to date: https://github.com/ma-compbio/Higashi/wiki/Fast-Higashi-Gallery