yangfeizZZ opened 1 year ago
Hey, did you train the model on a GPU or a CPU, and what was the CPU / GPU utilization?
I used a GPU, but I got this error:
[ Epoch 38 of 60 ]
Could you try running
nvidia-smi -q -d Memory | grep -A4 GPU | grep Free
and
nvidia-smi -q -d Memory | grep -A4 GPU
on your command line and see what they return? Higashi uses a hacky way to figure out how many GPUs you have, and that may not be compatible with some CUDA versions.
Also, what did you put in the 'gpu_num' parameter in the config.JSON file, and how many GPU cards do you have on that machine?
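If the grep-based commands above return nothing, a less fragile way to check the same thing is nvidia-smi's CSV query mode, which does not depend on line offsets in the human-readable report. The snippet below is only a minimal sketch for checking your machine by hand, not the code Higashi actually uses; the function names parse_free_memory and query_free_memory are made up for illustration, and it assumes nvidia-smi is on your PATH and reports a plain number for every card.

```python
import subprocess


def parse_free_memory(csv_text):
    """Turn nvidia-smi CSV output (one number per line, in MiB) into a list of ints."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_free_memory():
    """Ask nvidia-smi for the free memory of each GPU, one value per card.

    Assumes nvidia-smi is installed; raises CalledProcessError if the command fails.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_free_memory(result.stdout)
```

Calling query_free_memory() on the training machine gives one entry per visible GPU, so the length of the list is the number of cards you can compare against 'gpu_num'.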
I don't know what "no improvement early stopping" means. Does it mean the training is fine, so it stopped?
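For reference, "early stopping" in general means that training ends before the last scheduled epoch because the monitored loss has stopped improving. The sketch below shows the usual patience-based version of that idea; it is an assumption about what the message refers to, not Higashi's actual implementation, and the name stop_epoch and the patience and min_delta parameters are hypothetical.

```python
def stop_epoch(losses, patience=3, min_delta=0.0):
    """Return (epoch, best_loss) at which patience-based early stopping would trigger.

    losses: per-epoch loss values, in order.
    patience: number of consecutive epochs without improvement that are tolerated.
    min_delta: how much lower than the best loss a value must be to count as improvement.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch, best  # stopped early
    return len(losses), best  # ran through every epoch
```

Under this reading, a "no improvement" stop is a normal end of training rather than an error: the loss simply did not get better for several epochs in a row.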
I set "gpu_num": 2, but I get the same error. So I don't know how to tell when training has finished and the results can be visualized.
Hello, when I run main_cell.py, it is very, very slow. I started running main_cell.py on Monday of this week, and this is as far as it has gotten:
[ Epoch 29 of 80 ]
So I would like to know how to make it run faster. Thank you very much.