ma-compbio / Higashi

single-cell Hi-C, scHi-C, Hi-C, 3D genome, nuclear organization, hypergraph
MIT License

The main_cell.py is so slow #39

Open yangfeizZZ opened 1 year ago

yangfeizZZ commented 1 year ago

Hello, when I run main_cell.py it is very slow. I started running main_cell.py on Monday this week, but the output is still as follows:

[ Epoch 29 of 80 ]

So I want to know how to make it faster. Thank you very much.

ruochiz commented 1 year ago

Hey, did you train the model on a GPU or a CPU device, and what is the CPU / GPU utilization?

yangfeizZZ commented 1 year ago

I used a GPU, but I get an error:

[ Epoch 38 of 60 ]

ruochiz commented 1 year ago

Could you try running nvidia-smi -q -d Memory |grep -A4 GPU|grep Free and nvidia-smi -q -d Memory |grep -A4 GPU on your command line and see what they return? Higashi uses a hacky way to figure out how many GPUs you have, and that may not be compatible with some CUDA versions.
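
For reference, here is a minimal Python sketch of the kind of check those commands perform. It is not Higashi's actual detection code. It assumes the nvidia-smi report contains lines of the form "Free : N MiB", and the function names parse_free_memory and query_free_memory are hypothetical. If a driver version adds an extra row to the memory report, the Free line may fall outside the four lines kept by grep -A4, and the list comes back empty.

```python
import re
import subprocess


def parse_free_memory(text):
    """Return the free memory in MiB for every 'Free : N MiB' line in text."""
    return [int(value) for value in re.findall(r"Free\s*:\s*(\d+)\s*MiB", text)]


def query_free_memory():
    """Run the same pipeline as in the comment above and parse its output.

    Returns an empty list when nvidia-smi is missing or prints no 'Free' line.
    """
    cmd = "nvidia-smi -q -d Memory | grep -A4 GPU | grep Free"
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return parse_free_memory(result.stdout)
```

Calling query_free_memory() on the training machine and checking the length of the returned list shows how many entries this parsing approach would see.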

Also, what did you put in the 'gpu_num' parameter in the config.JSON file, and how many GPU cards do you have on that machine?
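
A quick way to compare the two numbers is sketched below. It assumes 'gpu_num' is a top-level key in the config JSON; the helper check_gpu_num is hypothetical and not part of Higashi.

```python
import json


def check_gpu_num(config_text, detected_gpus):
    """Compare the 'gpu_num' value in a JSON config with the detected GPU count."""
    config = json.loads(config_text)
    # Assumption: 'gpu_num' is a top-level key; treat a missing key as 0.
    requested = int(config.get("gpu_num", 0))
    if requested > detected_gpus:
        return f"gpu_num={requested} but only {detected_gpus} GPU(s) detected"
    return "ok"
```

For example, check_gpu_num('{"gpu_num": 2}', 1) reports a mismatch, while check_gpu_num('{"gpu_num": 2}', 2) returns "ok".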

yangfeizZZ commented 1 year ago

I don't know what "no improvement early stopping" means. Does it mean the training is fine, so it stops?
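
For context, "early stopping" on no improvement usually means that training ends once a monitored loss has failed to improve for a set number of epochs. The sketch below shows that general idea only. It is not Higashi's code, and the function name and the patience parameter are assumptions for illustration.

```python
def epochs_until_early_stop(losses, patience=3):
    """Return the epoch (1-based) at which training would stop.

    Stops once the loss has not improved for `patience` consecutive epochs;
    otherwise runs through all epochs.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without a new best loss
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(losses)
```

Under this scheme, stopping before the last epoch is a normal end of training, not a failure by itself.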

yangfeizZZ commented 1 year ago

I set "gpu_num": 2, but I get the same error. So I don't know how to tell when training has finished and the results can be visualized.