hughperkins / tf-coriander

OpenCL 1.2 implementation for Tensorflow
Apache License 2.0
791 stars 90 forks

Does it support a multi-GPU setup? #43

Closed 0b01 closed 7 years ago

hughperkins commented 7 years ago

I don't know :-) I haven't tested it. I suspect it theoretically should, but there might be one or two blocking bugs...
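
For reference, a minimal two-GPU smoke test in the plain TF 0.12-era API (the shapes and ops here are illustrative, not taken from any real workload); whether both OpenCL devices actually execute their ops is exactly the untested part:

import tensorflow as tf

# Pin one small matmul to each of the two devices TF reports.
results = []
for i in range(2):
    with tf.device('/gpu:%d' % i):
        a = tf.random_normal([256, 256])
        b = tf.random_normal([256, 256])
        results.append(tf.matmul(a, b))

# allow_soft_placement lets TF fall back to CPU if a kernel is missing
# on the OpenCL device; log_device_placement prints where each op ran.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print([r.shape for r in sess.run(results)])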

0b01 commented 7 years ago

Here is the output of my currently running program.

➜  tensorflow-seq2seq-prediction git:(master) ✗ python3 seq2seq.py
Dimensions of the dataset for 3 X and 3 Y training examples : 
(40, 3, 2)
(40, 3, 2)
(seq_length, batch_size, output_dim)
TensorFlow's version : 0.12
OpenCL platform: AMD Accelerated Parallel Processing
OpenCL device: Hawaii
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties: 
name: Hawaii
major: -1 minor: -1 memoryClockRate (GHz) 1040
pciBusID 0000.0000
Total memory: 3.09GiB
Free memory: 2.26GiB
W tensorflow/stream_executor/cl/cl_driver.cc:587] creating context when one is currently active; existing:
OpenCL platform: AMD Accelerated Parallel Processing
OpenCL device: Hawaii
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 1 with properties: 
name: Hawaii
major: -1 minor: -1 memoryClockRate (GHz) 1040
pciBusID 0000.0000
Total memory: 3.09GiB
Free memory: 2.26GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 1 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0:   N N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 1:   N N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Hawaii, pci bus id: 0000.0000)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Hawaii, pci bus id: 0000.0000)
cl_driver DeviceAllocate 2110746624
cl_driver DeviceAllocate 2110746624
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 6655 get requests, put_count=1357 evicted_count=1000 eviction_rate=0.73692 and unsatisfied allocation rate=0.961382
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:257] Raising pool_size_limit_ from 100 to 110
Step 0/100, train loss: 38526.41796875,     TEST loss: 2405405.75
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=4011 evicted_count=4000 eviction_rate=0.997258 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=2013 evicted_count=2000 eviction_rate=0.993542 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=5014 evicted_count=5000 eviction_rate=0.997208 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=3017 evicted_count=3000 eviction_rate=0.994365 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=1021 evicted_count=1000 eviction_rate=0.979432 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=4023 evicted_count=4000 eviction_rate=0.994283 and unsatisfied allocation rate=0
Step 10/100, train loss: 17055.38671875,    TEST loss: 7748.56689453125
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=2028 evicted_count=2000 eviction_rate=0.986193 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 6655 get requests, put_count=6626 evicted_count=6000 eviction_rate=0.905524 and unsatisfied allocation rate=0.910443
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:257] Raising pool_size_limit_ from 339 to 372
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=4037 evicted_count=4000 eviction_rate=0.990835 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=2044 evicted_count=2000 eviction_rate=0.978474 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 6655 get requests, put_count=6848 evicted_count=6000 eviction_rate=0.876168 and unsatisfied allocation rate=0.87994
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:257] Raising pool_size_limit_ from 542 to 596
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=4059 evicted_count=4000 eviction_rate=0.985464 and unsatisfied allocation rate=0
Step 20/100, train loss: 15100.982421875,   TEST loss: 17128.3828125
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=3072 evicted_count=3000 eviction_rate=0.976562 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=2087 evicted_count=2000 eviction_rate=0.958313 and unsatisfied allocation rate=0
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 0 get requests, put_count=1105 evicted_count=1000 eviction_rate=0.904977 and unsatisfied allocation rate=0

Great work btw.
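
As a quick check independent of the seq2seq script, you can also ask TF which devices it registered at startup. A sketch only, assuming the stock device_lib helper behaves the same in this tf-coriander build as in upstream TF 0.12:

from tensorflow.python.client import device_lib

# Lists every device TensorFlow registered; with two Hawaii cards this
# should print /cpu:0, /gpu:0 and /gpu:1 plus their memory limits.
for d in device_lib.list_local_devices():
    print(d.name, d.device_type, d.memory_limit)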

hughperkins commented 7 years ago

Cool. Looks like it's running, and loss is decreasing?

0b01 commented 7 years ago

Loss is not decreasing, and there are huge spikes compared to the CPU run.

https://m.imgur.com/uOpGRH4?r
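
A generic sanity check for wrong numeric results, as opposed to wrong device placement, is to run the same op on /cpu:0 and /gpu:0 and diff the outputs. A sketch only: matmul is just one candidate kernel, and the real culprit could equally be an LSTM or reduction kernel in the seq2seq graph:

import numpy as np
import tensorflow as tf

x = np.random.randn(64, 64).astype(np.float32)

with tf.device('/cpu:0'):
    cpu_out = tf.matmul(tf.constant(x), tf.constant(x))
with tf.device('/gpu:0'):
    gpu_out = tf.matmul(tf.constant(x), tf.constant(x))

# log_device_placement verifies the second matmul really ran on the GPU
# and was not silently soft-placed back onto the CPU.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    c, g = sess.run([cpu_out, gpu_out])
print('max abs diff:', np.abs(c - g).max())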

hughperkins commented 7 years ago

OK. Can you confirm that it works OK-ish if you use a single GPU?
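
One way to pin that down, assuming tf-coriander honours ConfigProto.device_count the same way stock TF does, is to cap the session to a single GPU device and rerun:

import tensorflow as tf

# The driver still sees both Hawaii cards, but TensorFlow only creates
# /gpu:0, so the whole graph runs on a single GPU.
config = tf.ConfigProto(device_count={'GPU': 1})
sess = tf.Session(config=config)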

hughperkins commented 7 years ago

(If the code is publicly available, or based on publicly available code, do you mind posting a link to it, so I can try it sometime? Not right now, since I'm trying to get split working, but soon-ish.)

0b01 commented 7 years ago

The code is based on this seq2seq model:

https://github.com/guillaume-chevalier/seq2seq-signal-prediction

0b01 commented 7 years ago

@hughperkins, looks like it's working. Congrats!

hughperkins commented 7 years ago

Awesome! :-)