liuheng92 / tensorflow_PSENet

This is a TensorFlow re-implementation of PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network. My blog:
https://blog.csdn.net/liuxiaoheng1992/article/details/87646951
MIT License

Training PSENet on a single GPU: CPU usage near 100%, GPU memory usage under 2 GB, no progress after 4 days of training, and no error reported #56

Closed: SimonWang00 closed this issue 4 years ago

SimonWang00 commented 4 years ago

Training command: nohup python3.6 train.py --gpu_list=0 --input_size=512 --batch_size_per_gpu=16 --checkpoint_path=./model/ --training_data_path=./data/ &
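This command runs under nohup on a server without an X display. In that setting matplotlib first tries its Qt5Agg backend and fails, which produces the "Unable to init server" and "Gdk-CRITICAL" lines in the training log. The log then shows matplotlib loading the agg backend. These messages are therefore most likely noise rather than the cause of the stall.

You can still avoid them by choosing a non-interactive backend up front, because matplotlib reads the MPLBACKEND environment variable when it picks its default backend. The following is a minimal sketch that assumes a small Python launcher is acceptable. headless_env, TRAIN_CMD and launch_headless are hypothetical helpers and are not part of tensorflow_PSENet.

```python
import os
import shlex
import subprocess

# The training command from the issue, flags copied verbatim.
TRAIN_CMD = shlex.split(
    "python3.6 train.py --gpu_list=0 --input_size=512 --batch_size_per_gpu=16 "
    "--checkpoint_path=./model/ --training_data_path=./data/"
)


def headless_env(base_env=None):
    """Return a copy of the environment with matplotlib forced to Agg.

    matplotlib consults MPLBACKEND when choosing its default backend, and
    Agg renders to files only, so no X display (and no Qt5 attempt) is needed.
    The input mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MPLBACKEND"] = "Agg"
    return env


def launch_headless(log_path="nohup.out"):
    """Hypothetical stand-in for `nohup ... &`: start training detached.

    start_new_session=True puts the child in its own session, so closing the
    terminal does not deliver SIGHUP to it. stdout and stderr are appended to
    log_path, like nohup.out. Returns the subprocess.Popen handle.
    """
    log = open(log_path, "a")  # left open for the child's lifetime
    return subprocess.Popen(
        TRAIN_CMD,
        env=headless_env(),
        stdout=log,
        stderr=subprocess.STDOUT,
        start_new_session=True,
    )
```

Calling launch_headless() from the repository root is meant to behave like the original nohup line. The shorter shell equivalent is to prefix the original command with MPLBACKEND=Agg.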

The full training log is as follows:

(base) root@147:/home/test/deeplearn/tensorflow_PSENet# cat nohup.out
DEBUG:matplotlib:CACHEDIR=/root/.cache/matplotlib
DEBUG:matplotlib.font_manager:Using fontManager instance from /root/.cache/matplotlib/fontlist-v300.json
DEBUG:matplotlib.pyplot:Loaded backend qt5agg version unknown.
Unable to init server: Could not connect: Connection refused
Unable to init server: Could not connect: Connection refused

(train.py:2842): Gdk-CRITICAL **: 16:03:54.581: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed

(train.py:2842): Gdk-CRITICAL **: 16:03:54.582: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
DEBUG:matplotlib.pyplot:Loaded backend agg version unknown.
DEBUG:matplotlib.pyplot:Loaded backend agg version unknown.
2019-11-04 16:04:01.858346: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-04 16:04:01.935106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-04 16:04:01.935415: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 7.73GiB
2019-11-04 16:04:01.935427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2019-11-04 16:04:02.561361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-04 16:04:02.561396: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2019-11-04 16:04:02.561402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2019-11-04 16:04:02.561543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7456 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
resnet_v1_50/block1 (?, ?, ?, 256)
resnet_v1_50/block2 (?, ?, ?, 512)
resnet_v1_50/block3 (?, ?, ?, 1024)
resnet_v1_50/block4 (?, ?, ?, 2048)
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
INFO:root:1000 training images in ./data/
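The log ends after the "1000 training images in ./data/" lines and never reports a training step. That is consistent with the input pipeline never delivering a batch to the GPU, though it does not prove it. One generic way to test this, independent of the model, is to pull a single batch from the data generator with a timeout.

The following is a minimal sketch. It assumes the loader can be wrapped as a zero-argument callable that returns a Python generator. first_item_or_timeout, fast_loader and stuck_loader are hypothetical names used only for illustration and are not part of tensorflow_PSENet. This check is also not the fix the maintainer linked.

```python
import queue
import threading
import time


def first_item_or_timeout(make_generator, timeout_s=60.0):
    """Fetch the first item of a fresh generator in a background thread.

    Returns (True, item) if it arrives within timeout_s seconds and
    (False, None) otherwise. Exceptions raised by the loader (including
    StopIteration for an empty generator) are re-raised in the caller.
    The worker is a daemon thread, so a stuck loader does not block exit.
    """
    result = queue.Queue(maxsize=1)

    def worker():
        try:
            result.put(("ok", next(make_generator())))
        except Exception as exc:  # hand loader errors back to the caller
            result.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    try:
        status, value = result.get(timeout=timeout_s)
    except queue.Empty:
        return False, None
    if status == "error":
        raise value
    return True, value


def fast_loader():
    # Stand-in for a loader that yields batches promptly (16 placeholder items).
    while True:
        yield [0] * 16


def stuck_loader():
    # Stand-in for a loader that never produces a batch.
    while True:
        time.sleep(0.1)
    yield  # unreachable; only makes this a generator function
```

With the real data provider, a (False, None) result points at data loading rather than at the model or the GPU. An exception raised inside the loader is re-raised in the caller, so it is not lost in a background thread.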

liuheng92 commented 4 years ago

https://github.com/liuheng92/tensorflow_PSENet/issues/14

SimonWang00 commented 4 years ago

Problem solved, thanks!