ilovin / lstm_ctc_ocr

Use CTC + tensorflow to OCR
https://ilovin.github.io/2017-04-06/tensorflow-lstm-ctc-ocr/

Problems setting up the environment; author, which exact versions are in your environment? #29

Closed: hookover closed this issue 6 years ago

hookover commented 6 years ago

My current environment:

Ubuntu 16.04, CUDA 7.5, cuDNN 5, TensorFlow 1.0.1, GTX 1060, 16 GB RAM

Regarding the environment: can this code run on TensorFlow 1.4.0 and CUDA 8.0 or above?

The error is as follows:

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Called with args:
Namespace(cfg_file='./lstm/lstm.yml', gpu_id=0, max_iters=700000, network_name='LSTM_train', pre_train=None, randomize=False, restore=0, set_cfgs=None)
CUDA_VISIBLE_DEVICES: 0 CFG.GPU_ID: 0
Using config:
{'CHARSET': '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
 'EXP_DIR': 'lstm_ctc',
 'FONT': 'fonts/Ubuntu-M.ttf',
 'GPU_ID': 0,
 'IMG_SHAPE': [180, 60],
 'LOG_DIR': 'lstm_ctc',
 'MAX_CHAR_LEN': 6,
 'MAX_LEN': 6,
 'MIN_LEN': 4,
 'NCHANNELS': 1,
 'NCLASSES': 64,
 'NET_NAME': 'LSTM',
 'NUM_FEATURES': 60,
 'POOL_SCALE': 2,
 'RNG_SEED': 3,
 'ROOT_DIR': '/srv/python/lstm_ctc_ocr_with_tf_1.0.1',
 'SPACE_INDEX': 0,
 'SPACE_TOKEN': '',
 'TEST': {},
 'TIME_STEP': 90,
 'TRAIN': {'BATCH_SIZE': 32,
           'DISPLAY': 100,
           'GAMMA': 1.0,
           'LEARNING_RATE': 0.001,
           'LOG_IMAGE_ITERS': 100,
           'MOMENTUM': 0.9,
           'NUM_EPOCHS': 2000,
           'NUM_HID': 128,
           'NUM_LAYERS': 2,
           'SNAPSHOT_INFIX': '',
           'SNAPSHOT_ITERS': 2000,
           'SNAPSHOT_PREFIX': 'lstm',
           'SOLVER': 'RMS',
           'STEPSIZE': 2000,
           'WEIGHT_DECAY': 1e-05},
 'VAL': {'BATCH_SIZE': 128,
         'NUM_EPOCHS': 1000,
         'PRINT_NUM': 5,
         'VAL_STEP': 500}}
Output will be saved to `/srv/python/lstm_ctc_ocr_with_tf_1.0.1/output/lstm_ctc`
Logs will be saved to `/srv/python/lstm_ctc_ocr_with_tf_1.0.1/logs/lstm_ctc/lstm_train/2017-11-11-12-25-00`
/gpu:0
Tensor("data:0", shape=(?, ?, 60), dtype=float32)
Tensor("conv4/BiasAdd:0", shape=(?, ?, 30, 1), dtype=float32)
Tensor("time_step_len:0", shape=(?,), dtype=int32)
Use network `LSTM_train` in training
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.759
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 24.38MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.34G (5729727232 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.80G (5156754432 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 4.32G (4641078784 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.89G (4176970752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.50G (3759273472 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.15G (3383345920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 2.83G (3045011200 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 2.55G (2740509952 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 2.30G (2466458880 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 2.07G (2219812864 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 1.86G (1997831680 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 1.67G (1798048512 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 1.51G (1618243584 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 1.36G (1456419328 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 1.22G (1310777344 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 1.10G (1179699712 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 1012.54M (1061729792 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 911.29M (955556864 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 820.16M (860001280 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 738.14M (774001152 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 664.33M (696601088 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 597.90M (626941184 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 538.11M (564247040 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 484.30M (507822336 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 435.87M (457040128 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 392.28M (411336192 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 353.05M (370202624 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 317.75M (333182464 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 285.97M (299864320 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 257.38M (269878016 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 231.64M (242890240 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 208.47M (218601216 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 187.63M (196741120 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 168.86M (177067008 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 151.98M (159360512 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 136.78M (143424512 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 123.10M (129082112 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 110.79M (116174080 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 99.71M (104556800 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 89.74M (94101248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 80.77M (84691200 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 72.69M (76222208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 65.42M (68600064 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 58.88M (61740288 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 52.99M (55566336 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 47.69M (50009856 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 42.92M (45008896 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 38.63M (40508160 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 34.77M (36457472 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 31.29M (32811776 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 28.16M (29530624 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 25.35M (26577664 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 22.81M (23920128 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 20.53M (21528320 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 18.48M (19375616 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 16.63M (17438208 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
done
Solving...
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.32G (5715601920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.32G (5715601920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.32G (5715601920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 5.32G (5715601920 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384):     Total Chunks: 1, Chunks in use: 0 26.2KiB allocated for chunks. 4B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768):     Total Chunks: 1, Chunks in use: 0 36.5KiB allocated for chunks. 512.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144):    Total Chunks: 1, Chunks in use: 0 324.0KiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288):    Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304):   Total Chunks: 1, Chunks in use: 0 5.84MiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608):   Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456):     Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 37.50MiB was 32.00MiB, Chunk State: 
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208400000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208400500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208400600 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208400700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208400800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208400900 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208400a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208400b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208400c00 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208401000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208401100 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208401500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208401600 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208401e00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208401f00 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208402700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208402800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208402900 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208402a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208402b00 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208403000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208403500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208403600 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208403700 of size 73728
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208415700 of size 73728
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208427700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208427800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208427900 of size 2304
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208428200 of size 2304
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208428b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208428c00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208428d00 of size 96256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208440500 of size 96256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208457d00 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208458100 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208458500 of size 524288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102084d8500 of size 524288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208558500 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208558d00 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208559500 of size 32768
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208561500 of size 32768
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208569500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208569600 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208569700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208569800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208569900 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208569a00 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020856a200 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020856aa00 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020856b200 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020856ba00 of size 32768
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208573a00 of size 32768
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020857ba00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020857bb00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020857bc00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020857bd00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020857be00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020857bf00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020857c000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020857c500 of size 2304
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020857ce00 of size 32768
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020858e000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020858e100 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020858e200 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020858e300 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020858e400 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020858e500 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020858ea00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020858eb00 of size 96256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102085a6300 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102085a6700 of size 96256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102085bdf00 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102085be300 of size 96256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102085d5b00 of size 96256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020863e300 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020863eb00 of size 524288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086beb00 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086bf300 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086bfb00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086bfc00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086bfd00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086bfe00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086bff00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086c0000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086c0100 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086c0200 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086c0300 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086c0400 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086c0500 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086c7300 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086c7400 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086c7500 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086c7a00 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086c7f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086c8000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086c8100 of size 73728
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086da100 of size 73728
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086ec100 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086ec200 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086ec300 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086ec400 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086ec500 of size 2304
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086ece00 of size 2304
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086ed700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086ed800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102086ed900 of size 96256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208705100 of size 96256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020871c900 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020871cd00 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020871d100 of size 96256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208734900 of size 96256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020874c100 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020874c500 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020874c900 of size 524288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102087cc900 of size 524288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020884c900 of size 524288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102088cc900 of size 524288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1020894c900 of size 695552
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102089f6600 of size 524288
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10208a76600 of size 1228800
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x10208584e00 of size 37376
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x102085ed300 of size 331776
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x102086c0a00 of size 26880
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x10208ba2600 of size 6120192
I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size: 
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 57 Chunks of size 256 totalling 14.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 10 Chunks of size 1024 totalling 10.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 8 Chunks of size 1280 totalling 10.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 11 Chunks of size 2048 totalling 22.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 2304 totalling 11.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 32768 totalling 160.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 73728 totalling 288.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 10 Chunks of size 96256 totalling 940.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 8 Chunks of size 524288 totalling 4.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 695552 totalling 679.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1228800 totalling 1.17MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 7.26MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: 
Limit:                  5729727283
InUse:                     7609088
MaxInUse:                  7609600
NumAllocs:                     146
MaxAllocSize:              1228800

W tensorflow/core/common_runtime/bfc_allocator.cc:274] ***************_*****************************************___________________________________________
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 37.50MiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Resource exhausted: OOM when allocating tensor with shape[32,160,60,32]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,160,60,32]
     [[Node: conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](ExpandDims, conv1/weights/read)]]
     [[Node: logits/bidirectional_rnn/bw/bw/All/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_615_logits/bidirectional_rnn/bw/bw/All", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./lstm/train_net.py", line 89, in <module>
    restore=bool(int(args.restore)))
  File "./lstm/../lib/lstm/train.py", line 190, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./lstm/../lib/lstm/train.py", line 148, in train_model
    ctc_loss,summary_str, _ =  sess.run(fetches=fetch_list, feed_dict=feed_dict)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,160,60,32]
     [[Node: conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](ExpandDims, conv1/weights/read)]]
     [[Node: logits/bidirectional_rnn/bw/bw/All/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_615_logits/bidirectional_rnn/bw/bw/All", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'conv1/Conv2D', defined at:
  File "./lstm/train_net.py", line 81, in <module>
    network = get_network(args.network_name)
  File "./lstm/../lib/networks/factory.py", line 17, in get_network
    return LSTM_train()
  File "./lstm/../lib/networks/LSTM_train.py", line 20, in __init__
    self.setup()
  File "./lstm/../lib/networks/LSTM_train.py", line 24, in setup
    .conv_single(3, 3, 32 ,1, 1, name='conv1',c_i=cfg.NCHANNELS)
  File "./lstm/../lib/networks/network.py", line 31, in layer_decorated
    layer_output = op(self, layer_input, *args, **kwargs)
  File "./lstm/../lib/networks/network.py", line 173, in conv_single
    conv = convolve(input, kernel)
  File "./lstm/../lib/networks/network.py", line 165, in <lambda>
    convolve = lambda i, k: tf.nn.conv2d(i, k, [1,s_h, s_w, 1], padding=padding)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 396, in conv2d
    data_format=data_format, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,160,60,32]
     [[Node: conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](ExpandDims, conv1/weights/read)]]
     [[Node: logits/bidirectional_rnn/bw/bw/All/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_615_logits/bidirectional_rnn/bw/bw/All", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
ilovin commented 6 years ago

Not enough GPU memory; reduce the batch size.
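
For reference, a minimal sketch of the two usual knobs (these are assumptions about how the repo is run, not part of the original reply): lower TRAIN.BATCH_SIZE in ./lstm/lstm.yml, and/or cap how much GPU memory TensorFlow 1.x grabs when the session is created:

# Sketch only: standard TF 1.x GPU-memory options, not code from this repository.
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all at once
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # or hard-cap the fraction used

sess = tf.Session(config=config)

Lowering the batch size reduces the activations' footprint; allow_growth only stops TensorFlow from pre-allocating nearly the whole card up front.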

hookover commented 6 years ago

@ilovin It shouldn't be a lack of GPU memory; the card has 6 GB and only a bit over 100 MB was in use. I ran it again today and the error is different now. It is probably an environment-setup problem, but I have basically tried every fix I found online and still haven't found a solution:

W tensorflow/core/framework/op_kernel.cc:993] Internal: warp_ctc error in compute_ctc_loss: execution failed
     [[Node: WarpCTC = WarpCTC[blank_label=0, _device="/job:localhost/replica:0/task:0/gpu:0"](logits/transpose_1, _recv_labels_0/_41, _recv_labels_len_0/_43, _recv_time_step_len_0/_9)]]
W tensorflow/core/framework/op_kernel.cc:993] Internal: warp_ctc error in compute_ctc_loss: execution failed
     [[Node: WarpCTC = WarpCTC[blank_label=0, _device="/job:localhost/replica:0/task:0/gpu:0"](logits/transpose_1, _recv_labels_0/_41, _recv_labels_len_0/_43, _recv_time_step_len_0/_9)]]
W tensorflow/core/framework/op_kernel.cc:993] Internal: warp_ctc error in compute_ctc_loss: execution failed
     [[Node: WarpCTC = WarpCTC[blank_label=0, _device="/job:localhost/replica:0/task:0/gpu:0"](logits/transpose_1, _recv_labels_0/_41, _recv_labels_len_0/_43, _recv_time_step_len_0/_9)]]
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 6396 get requests, put_count=3511 evicted_count=1000 eviction_rate=0.284819 and unsatisfied allocation rate=0.623046
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: warp_ctc error in compute_ctc_loss: execution failed
     [[Node: WarpCTC = WarpCTC[blank_label=0, _device="/job:localhost/replica:0/task:0/gpu:0"](logits/transpose_1, _recv_labels_0/_41, _recv_labels_len_0/_43, _recv_time_step_len_0/_9)]]
     [[Node: RMSProp/update/_54 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1819_RMSProp/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./lstm/train_net.py", line 89, in <module>
    restore=bool(int(args.restore)))
  File "./lstm/../lib/lstm/train.py", line 190, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./lstm/../lib/lstm/train.py", line 148, in train_model
    ctc_loss,summary_str, _ =  sess.run(fetches=fetch_list, feed_dict=feed_dict)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: warp_ctc error in compute_ctc_loss: execution failed
     [[Node: WarpCTC = WarpCTC[blank_label=0, _device="/job:localhost/replica:0/task:0/gpu:0"](logits/transpose_1, _recv_labels_0/_41, _recv_labels_len_0/_43, _recv_time_step_len_0/_9)]]
     [[Node: RMSProp/update/_54 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1819_RMSProp/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'WarpCTC', defined at:
  File "./lstm/train_net.py", line 89, in <module>
    restore=bool(int(args.restore)))
  File "./lstm/../lib/lstm/train.py", line 190, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./lstm/../lib/lstm/train.py", line 79, in train_model
    loss, dense_decoded = self.net.build_loss()
  File "./lstm/../lib/networks/network.py", line 637, in build_loss
    label_lengths=label_len,input_lengths=time_step_batch)
  File "/usr/local/lib/python3.5/dist-packages/warpctc_tensorflow-0.1-py3.5-linux-x86_64.egg/warpctc_tensorflow/__init__.py", line 43, in ctc
    input_lengths, blank_label)
  File "<string>", line 45, in warp_ctc
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): warp_ctc error in compute_ctc_loss: execution failed
     [[Node: WarpCTC = WarpCTC[blank_label=0, _device="/job:localhost/replica:0/task:0/gpu:0"](logits/transpose_1, _recv_labels_0/_41, _recv_labels_len_0/_43, _recv_time_step_len_0/_9)]]
     [[Node: RMSProp/update/_54 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1819_RMSProp/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
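
One way to narrow this down (a standalone sketch, not from this thread; all shapes and values below are illustrative) is to run the warpctc_tensorflow op on its own, outside the repository, and see whether compute_ctc_loss fails there as well:

# Minimal standalone warp-ctc check; assumes warpctc_tensorflow is installed and importable.
import numpy as np
import tensorflow as tf
from warpctc_tensorflow import ctc

time_steps, batch_size, num_classes = 50, 2, 64      # num_classes includes the blank label
activations = tf.constant(
    np.random.rand(time_steps, batch_size, num_classes).astype(np.float32))
flat_labels = tf.constant([1, 2, 3, 4, 5, 6, 7], dtype=tf.int32)  # labels of both samples, concatenated
label_lengths = tf.constant([4, 3], dtype=tf.int32)   # per-sample label lengths (4 + 3 = 7)
input_lengths = tf.constant([time_steps, time_steps], dtype=tf.int32)

loss = ctc(activations, flat_labels, label_lengths, input_lengths, blank_label=0)

with tf.Session() as sess:
    print(sess.run(loss))   # if this also raises "execution failed", the warp-ctc/CUDA build is at fault

If this minimal case fails too, rebuilding warp-ctc against the installed CUDA/cuDNN/TensorFlow combination is the usual fix; if it passes, the problem is more likely in the labels and lengths this repository feeds into the op.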
huiyang865 commented 6 years ago

@hookover Did you solve your problem? I am running into a similar error.

hookover commented 6 years ago

@huiyang865 Not solved; I have stopped testing this project.

hookover commented 6 years ago

@huiyang865 If you do solve it, please let me know.

ilovin commented 6 years ago

My environment: Ubuntu 14.04, CUDA 8.0, cuDNN 6.0, TensorFlow 1.3.0, Python 3.5, warp-ctc (master).
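
If it helps to compare against that list, a quick sketch for printing the local versions (the commands and paths in the comments are common defaults, not guaranteed for every install):

# Sketch: print the local Python/TensorFlow versions for comparison.
import sys
import tensorflow as tf

print("python    :", sys.version.split()[0])
print("tensorflow:", tf.__version__)
# CUDA / cuDNN versions are usually visible from the shell, e.g.:
#   nvcc --version
#   cat /usr/local/cuda/version.txt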

ilovin commented 6 years ago

I'm closing this issue because it has been inactive for more than one month.