Today I tried training faster-rcnn on VOC2007 for object detection on my Dell server. The server information is:
uname -a
Linux sem-PowerEdge-T630 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
And my GPU is:
04:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
The VOC2007 dataset contains 9966 pictures, each about 300*500 pixels.
When I trained on this data with the faster-rcnn framework, the server rebooted after 200~400 iterations.
I recorded the nvidia-smi output every 0.01 seconds, and the last record before the reboot was:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970     Off  | 0000:04:00.0     Off |                  N/A |
| 43%   53C    P2    94W / 151W |   2042MiB /  4036MiB |     11%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     15911    C   python                                        2040MiB |
+-----------------------------------------------------------------------------+
I tried several times and the result was always the same.
Can anyone help?
Could the problem be the large dataset combined with the small GPU memory? The last record before the reboot shows only 2042MiB of 4036MiB in use, so I am confused.
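In case it helps anyone reproduce this, here is a minimal sketch of how the nvidia-smi logging could be done so that the last record survives a sudden reboot. It is not the exact script I used. The function names (read_gpu_sample, log_samples), the chosen query fields, and the log file name are assumptions made for illustration. Each sample is flushed and fsynced, since otherwise the final lines may still sit in a buffer when the machine goes down.

```python
import os
import subprocess
import time

# Fields passed to nvidia-smi's --query-gpu option (an illustrative selection).
QUERY = "timestamp,temperature.gpu,power.draw,memory.used,utilization.gpu"


def read_gpu_sample():
    """Return one CSV line of GPU readings by calling nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader"]
    )
    return out.decode().strip()


def log_samples(path, sample_fn=read_gpu_sample, count=None, interval=0.01):
    """Append samples to `path`, forcing each one to disk before continuing.

    `count=None` means run until interrupted; `sample_fn` can be swapped
    for a stub when no GPU is available.
    """
    written = 0
    with open(path, "a") as log:
        while count is None or written < count:
            log.write(sample_fn() + "\n")
            log.flush()             # push Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to write it to disk
            written += 1
            time.sleep(interval)
    return written
```

Running log_samples("gpu_log.csv") in a second terminal while training would keep appending readings until the machine goes down. A single nvidia-smi call likely takes longer than 0.01 seconds, so the real sampling interval is probably larger than the requested one.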