Closed JohnGenome closed 3 years ago
Hi,
Thanks for your interest in our method. I have just updated the wiki page with some runtime information.
https://github.com/ma-compbio/Higashi/wiki/Higashi-Usage#runtime-of-higashi
A more detailed discussion of the runtime can be found in the published version of our manuscript. For your machine, I think the bottleneck could be memory. Memory consumption depends on the number of cells, the sequencing depth of the dataset, and the resolution. 32GB should be enough for most datasets at 1Mb resolution.
I'm new to deep learning and I only have access to a CPU node (about 20 cores and 256 GB of RAM). Is training with CPUs only feasible? Thanks!!
Sorry for asking so many questions in one issue. I don't understand the runtime measurements on the wiki page.
A scHi-C dataset (e.g. the Nagano et al. dataset) with 1,171 single cells and a median of 56,800 contacts per cell has about 7e7 observed positive triplets. When training with cd-GNN (k=4, fast mode) as the wiki page shows, it takes 7e7 / (192 * 1000) epochs * 109.6 s/epoch ≈ 40,000 s ≈ 11 h to go over the whole training dataset once. Does the formula make sense?
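The arithmetic in the question can be double-checked with a short script. Note that the batch size of 192 and the 1,000 batches per reported epoch are assumptions read off the formula above, not values verified against Higashi's code:

```python
# Back-of-the-envelope check of the runtime formula from the question.
# Assumed numbers (from the thread, not verified against Higashi itself):
#   - 7e7 observed positive triplets in the dataset
#   - batch size 192, 1000 batches per reported "epoch"
#   - 109.6 s per reported epoch (wiki timing)
triplets = 7e7
batch_size = 192
batches_per_epoch = 1000
sec_per_epoch = 109.6

epochs_per_pass = triplets / (batch_size * batches_per_epoch)
hours_per_pass = epochs_per_pass * sec_per_epoch / 3600
print(f"{epochs_per_pass:.0f} reported epochs per full pass")  # -> 365
print(f"{hours_per_pass:.1f} hours per full pass")             # -> 11.1
```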
Yes, training with CPU is feasible.
The formula makes sense, but you do not need to go over the whole training dataset even once. Our tests showed that about 45-60 epochs are enough for most datasets.
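Combining the reply's guidance with the per-epoch timing gives a much smaller practical estimate. This is a sketch that assumes the 109.6 s/epoch figure from the wiki applies to the machine in question:

```python
# Practical wall-clock estimate under the reply's guidance
# (45-60 epochs suffice; 109.6 s/epoch is an assumed wiki timing).
sec_per_epoch = 109.6
for epochs in (45, 60):
    hours = epochs * sec_per_epoch / 3600
    print(f"{epochs} epochs -> {hours:.1f} h")  # 45 -> 1.4 h, 60 -> 1.8 h
```

So under these assumptions, total training time is on the order of 1.5-2 hours rather than the 11 hours a full pass would take.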
Hi @ruochiz, I would like to try Higashi on public data. Is there any information about the time consumption of model training? Does an entry-level PC (31 GB RAM, i7-4770, GTX 750 Ti) meet the hardware requirements?