Here is the output of my currently running program.
➜ tensorflow-seq2seq-prediction git:(master) ✗ python3 seq2seq.py
Dimensions of the dataset for 3 X and 3 Y training examples :
(40, 3, 2)
(40, 3, 2)
(seq_length, batch_size, output_dim)
TensorFlow's version : 0.12
OpenCL platform: AMD Accelerated Parallel Processing
OpenCL device: Hawaii
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties:
name: Hawaii
major: -1 minor: -1 memoryClockRate (GHz) 1040
pciBusID 0000.0000
Total memory: 3.09GiB
Free memory: 2.26GiB
W tensorflow/stream_executor/cl/cl_driver.cc:587] creating context when one is currently active; existing: [unprintable handle]
OpenCL platform: AMD Accelerated Parallel Processing
OpenCL device: Hawaii
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 1 with properties:
name: Hawaii
major: -1 minor: -1 memoryClockRate (GHz) 1040
pciBusID 0000.0000
Total memory: 3.09GiB
Free memory: 2.26GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 1 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0: N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 1: N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Hawaii, pci bus id: 0000.0000)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Hawaii, pci bus id: 0000.0000)
cl_driver DeviceAllocate 2110746624
cl_driver DeviceAllocate 2110746624
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 6655 get requests, put_count=1357 evicted_count=1000 eviction_rate=0.73692 and unsatisfied allocation rate=0.961382
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:257] Raising pool_size_limit_ from 100 to 110
Step 0/100, train loss: 38526.41796875, TEST loss: 2405405.75
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=4011 evicted_count=4000 eviction_rate=0.997258 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=2013 evicted_count=2000 eviction_rate=0.993542 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=5014 evicted_count=5000 eviction_rate=0.997208 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=3017 evicted_count=3000 eviction_rate=0.994365 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=1021 evicted_count=1000 eviction_rate=0.979432 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=4023 evicted_count=4000 eviction_rate=0.994283 and unsatisfied allocation rate=0
Step 10/100, train loss: 17055.38671875, TEST loss: 7748.56689453125
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=2028 evicted_count=2000 eviction_rate=0.986193 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 6655 get requests, put_count=6626 evicted_count=6000 eviction_rate=0.905524 and unsatisfied allocation rate=0.910443
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:257] Raising pool_size_limit_ from 339 to 372
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=4037 evicted_count=4000 eviction_rate=0.990835 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=2044 evicted_count=2000 eviction_rate=0.978474 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 6655 get requests, put_count=6848 evicted_count=6000 eviction_rate=0.876168 and unsatisfied allocation rate=0.87994
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:257] Raising pool_size_limit_ from 542 to 596
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=4059 evicted_count=4000 eviction_rate=0.985464 and unsatisfied allocation rate=0
Step 20/100, train loss: 15100.982421875, TEST loss: 17128.3828125
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=3072 evicted_count=3000 eviction_rate=0.976562 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=2087 evicted_count=2000 eviction_rate=0.958313 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=1105 evicted_count=1000 eviction_rate=0.904977 and unsatisfied allocation rate=0
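
For reference, the shapes printed at the top of the log are time-major arrays of (seq_length, batch_size, output_dim) = (40, 3, 2). Here is a minimal numpy sketch of data in that layout; the signal generator below is purely illustrative and not the repo's own:

import numpy as np

seq_length, batch_size, output_dim = 40, 3, 2

# Hypothetical stand-in for the dataset: noisy sine/cosine pairs,
# laid out time-major as (seq_length, batch_size, output_dim).
t = np.linspace(0, 4 * np.pi, seq_length)
phases = np.random.uniform(0, 2 * np.pi, batch_size)
X = np.stack([np.stack([np.sin(t + p), np.cos(t + p)], axis=-1) for p in phases], axis=1)
Y = X + np.random.normal(scale=0.1, size=X.shape)  # illustrative targets

print(X.shape)  # (40, 3, 2)
print(Y.shape)  # (40, 3, 2)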
Great work btw.
Cool. Looks like it's running, and loss is decreasing?
Loss is not decreasing, and there are huge spikes compared to the CPU run.
Ok. Can you confirm that it works ok-ish if you use a single GPU?
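
(If it helps, here is a minimal sketch of pinning the whole graph to one device in TF 0.12, assuming the usual tf.device / ConfigProto route also applies to the OpenCL build, where the devices show up as /gpu:0 and /gpu:1 in the log above:)

import tensorflow as tf

# Sketch only: build everything on the first GPU so the run is not
# split across both Hawaii devices; ops without a GPU kernel fall
# back to the CPU thanks to allow_soft_placement.
with tf.device('/gpu:0'):
    pass  # ... build the seq2seq graph here (placeholders, cell, loss, optimizer) ...

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # ... run the training loop and compare losses against the CPU run ...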
(If the code is publicly available, or based on publicly available code, do you mind posting a link so I can try it sometime? Not right now, since I'm trying to get split working, but soon-ish.)
The code is based on a seq2seq model:
https://github.com/guillaume-chevalier/seq2seq-signal-prediction
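
(For context, a rough sketch of a model in that style using TF 0.12's tf.nn.seq2seq.basic_rnn_seq2seq; all hyper-parameters and variable names below are illustrative and not taken from the repo:)

import tensorflow as tf

seq_length, batch_size, output_dim, hidden_dim = 40, 3, 2, 12

# The old seq2seq API takes Python lists of [batch_size, dim] tensors,
# one tensor per time step.
enc_inp = [tf.placeholder(tf.float32, [None, output_dim], name="enc_%i" % t)
           for t in range(seq_length)]
expected = [tf.placeholder(tf.float32, [None, output_dim], name="exp_%i" % t)
            for t in range(seq_length)]
dec_inp = [tf.zeros_like(enc_inp[0])] + enc_inp[:-1]  # "GO" step, then shifted inputs

cell = tf.nn.rnn_cell.GRUCell(hidden_dim)
dec_outputs, dec_state = tf.nn.seq2seq.basic_rnn_seq2seq(enc_inp, dec_inp, cell)

# Project each decoder output to output_dim and use an L2 loss,
# which is the kind of train/TEST loss printed in the log above.
w_out = tf.Variable(tf.random_normal([hidden_dim, output_dim]))
b_out = tf.Variable(tf.zeros([output_dim]))
outputs = [tf.matmul(o, w_out) + b_out for o in dec_outputs]
step_losses = [tf.reduce_mean(tf.square(o - y)) for o, y in zip(outputs, expected)]
loss = tf.add_n(step_losses) / float(seq_length)
train_op = tf.train.RMSPropOptimizer(0.007).minimize(loss)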
@hughperkins, looks like it's working. Congrats!
Awesome! :-)
I don't know :-) I haven't tested. I suspect it theoretically should, but it might have one or two blocking bugs...