The main_cell.py is so slow

ma-compbio / Higashi

single-cell Hi-C, scHi-C, Hi-C, 3D genome, nuclear organization, hypergraph

MIT License

The main_cell.py is so slow #39

Open yangfeizZZ opened 1 year ago

yangfeizZZ commented 1 year ago

Hello, when I run main_cell.py it is extremely slow. I started running main_cell.py on Monday this week, and this is the output so far:

[ Epoch 29 of 80 ]

(Training) BCE: 0.348 MSE: 0.719 Loss: 0.349 norm_ratio: 0.00: 32%|▎| 321/1000 [44:21<1:30:05, 7.96s/it]

So I would like to know how to make it faster. Thank you very much.
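For scale, the progress bar above gives enough to estimate the total run time. This is a back-of-the-envelope sketch that assumes the rate stays at the 7.96 s/it shown in the log and that every epoch has 1000 iterations; `estimate_hours` is a helper made up for this illustration, not part of Higashi.

```python
def estimate_hours(sec_per_it, its_per_epoch, epochs):
    """Projected wall-clock time in hours at a constant iteration rate."""
    return sec_per_it * its_per_epoch * epochs / 3600.0

# Numbers taken from the progress bar: 7.96 s/it, 1000 it/epoch, 80 epochs.
per_epoch = estimate_hours(7.96, 1000, 1)
total = estimate_hours(7.96, 1000, 80)
print(f"{per_epoch:.2f} h per epoch, {total:.1f} h for 80 epochs")
# prints "2.21 h per epoch, 176.9 h for 80 epochs"
```

At that rate the full 80 epochs would take a little over seven days, which matches the report of a run started on Monday still being at epoch 29.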

ruochiz commented 1 year ago

Hey, did you train the model on a GPU or on a CPU, and what is the CPU / GPU utilization?

yangfeizZZ commented 1 year ago

I used a GPU, but it ends with an error:

[ Epoch 38 of 60 ]

(Training) bce: 0.1953, mse: 0.0000, acc: 98.688 %, pearson: 0.943, spearman: 0.643, elapse: 152.854 s
(Validation-hyper) bce: 0.1811, acc: 99.596 %,pearson: 0.968, spearman: 0.646,elapse: 0.101 s
no improve 4
[ Epoch 39 of 60 ]
(Training) bce: 0.1946, mse: 0.0000, acc: 98.729 %, pearson: 0.944, spearman: 0.643, elapse: 148.983 s
(Validation-hyper) bce: 0.1793, acc: 99.619 %,pearson: 0.971, spearman: 0.648,elapse: 0.122 s
no improvement early stopping
(Validation-hyper) bce: 0.1806, acc: 99.606 %, auc: 0.966, aupr: 0.647,elapse: 0.564 s
Traceback (most recent call last):
  File "/home/yangfei/Higashi/higashi/main_cell.py", line 1472, in
    select_gpus[i])
TypeError: 'NoneType' object is not subscriptable

ruochiz commented 1 year ago

Could you try running nvidia-smi -q -d Memory |grep -A4 GPU|grep Free and nvidia-smi -q -d Memory |grep -A4 GPU on your command line and see what they return? Higashi uses a hacky way to figure out how many GPUs you have, and that may not be compatible with some CUDA versions.

Also, what did you put in the 'gpu_num' parameter in the config JSON file, and how many GPU cards do you have on that machine?
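To illustrate why the output of those commands matters: if the parsing step finds no "Free" lines, a GPU list can end up empty or None, and indexing it would produce the kind of TypeError shown in the traceback. The following is only a sketch of that idea, not Higashi's actual code. `parse_free_mib`, `pick_gpus`, and the sample text are hypothetical and assume the input is the filtered output of the first command above.

```python
import re

def parse_free_mib(smi_text):
    """Return the free-memory values (MiB) found in nvidia-smi style text."""
    return [int(m) for m in re.findall(r"Free\s*:\s*(\d+)\s*MiB", smi_text)]

def pick_gpus(free_mib, gpu_num):
    """Return indices of the gpu_num GPUs with the most free memory."""
    if not free_mib:
        # Fail with a clear message instead of passing None downstream.
        raise RuntimeError("no GPU detected in the nvidia-smi output")
    order = sorted(range(len(free_mib)), key=lambda i: free_mib[i], reverse=True)
    return order[:gpu_num]

# Made-up sample text for illustration only.
sample = """
        Free                              : 11263 MiB
        Free                              : 2048 MiB
"""
free = parse_free_mib(sample)
chosen = pick_gpus(free, 1)
```

With the sample text, `free` holds two values and `chosen` is the index of the card with more free memory. If the command returns nothing on a given driver version, `parse_free_mib` yields an empty list and `pick_gpus` raises a descriptive error instead.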

yangfeizZZ commented 1 year ago

I don't understand what "no improvement early stopping" means. Does it mean that training went fine, so it stopped?
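For context, the "no improve 4" counter followed by "no improvement early stopping" in the log is consistent with patience-based early stopping: training halts once a validation metric has failed to improve for a set number of consecutive epochs. The sketch below shows that general technique only. It assumes a higher-is-better validation score and an arbitrary patience value; `early_stop_epoch` is a hypothetical helper, and the criterion and threshold Higashi actually uses are not stated in this thread.

```python
def early_stop_epoch(val_scores, patience):
    """Return the epoch index where training would stop, or None if it never does."""
    best = float("-inf")
    no_improve = 0
    for epoch, score in enumerate(val_scores):
        if score > best:
            best = score        # new best score: reset the counter
            no_improve = 0
        else:
            no_improve += 1     # another epoch without improvement
            if no_improve >= patience:
                return epoch
    return None

stop_at = early_stop_epoch([0.50, 0.60, 0.59, 0.58, 0.57], patience=3)
```

Here the score peaks at the second epoch and then fails to improve three times in a row, so `stop_at` is 4 (zero-based). Under this scheme, stopping early means the model has plateaued, not that the run failed.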

yangfeizZZ commented 1 year ago

I set "gpu_num": 2, but I get the same error. So I don't know how to tell when training has finished and the results can be visualized.