yaroslavvb / tensorflow-community-wheels

Place to upload links to TensorFlow wheels
453 stars · 35 forks

AMD Ryzen Threadripper? #40

Open madneon opened 7 years ago

madneon commented 7 years ago

Can anyone please share how to build an optimized TF for Ryzen Threadrippers?

yaroslavvb commented 7 years ago

AMD GPUs are not supported; for neural nets in general you need cuDNN, which is NVIDIA-only

madneon commented 7 years ago

@yaroslavvb I mean compilation for AMD CPUs.

yaroslavvb commented 7 years ago

Oops, my bad... I guess the standard `-march=native` flag should work
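For reference, a source build that picks up the host's instruction sets would look roughly like this (a sketch following TensorFlow's usual bazel workflow of that era; the exact targets and flags vary by TF version, so check the build docs for yours):

```shell
# Answer the configure prompts (CUDA, paths, etc.) for your setup
./configure

# Build the pip package with host-specific optimizations;
# -march=native enables whatever the CPU supports (AVX2/FMA on Zen)
bazel build --config=opt \
    --copt=-march=native \
    //tensorflow/tools/pip_package:build_pip_package

# Assemble the wheel and install it
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl
```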

madneon commented 7 years ago

OK, thank you! I will try the build in the next few days.

PiotrCzapla commented 7 years ago

@madneon have you managed to compile TensorFlow for Threadripper? If so, have you run any benchmarks? I'm considering buying a Threadripper as the CPU for a TensorFlow workstation and I'm wondering how fast it can be. Could you run `python tensorflow/examples/tutorials/layers/cnn_mnist.py` on your optimized build?

I've done some tests on my iMac 2014 (Core i5-4690, 3.5 GHz): cnn_mnist takes 40 min on an optimized TensorFlow build; the stock build is ~30% slower, and training takes 1 h.

If you don't want to wait an hour, run it for 10 min and look for the `global_step/sec` lines:

I'm on holiday next week, but I would be really obliged if you could share your findings.

madneon commented 7 years ago

@PiotrCzapla I'm still waiting for my 1950X to arrive...

tmorgan4 commented 6 years ago

Anyone having luck running AMD processors on CPU-heavy workloads yet? I'm hesitant to switch from Intel (due to the loss of MKL, AVX, etc.), but so far running any of the Intel-optimized builds has done nothing but reduce performance relative to the non-optimized versions. My networks are primarily reinforcement learning with many workers/environments running in parallel, so a high core count is extremely important.

yaroslavvb commented 6 years ago

If you run things in parallel, then AVX2 could reduce your performance, since each process will use more cores and the parallel processes will compete with each other
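One common way to keep many parallel workers from oversubscribing the CPU is to cap each worker's thread pools before the heavy libraries initialize. A minimal sketch (the environment variables are the standard OpenMP/TensorFlow ones; the per-worker count of 4 is just an illustrative choice):

```python
import os

def limit_worker_threads(threads_per_worker: int) -> None:
    """Cap thread pools so N parallel workers don't oversubscribe the CPU.

    Must run before TensorFlow/NumPy/MKL are imported in the worker
    process, since they read these variables at startup.
    """
    n = str(threads_per_worker)
    os.environ["OMP_NUM_THREADS"] = n            # OpenMP (MKL/Eigen) threads
    os.environ["TF_NUM_INTRAOP_THREADS"] = n     # TF per-op parallelism
    os.environ["TF_NUM_INTEROP_THREADS"] = "1"   # TF op-level parallelism

# e.g. 4 workers on a 16-thread part: give each worker 4 threads
limit_worker_threads(4)
print(os.environ["OMP_NUM_THREADS"])
```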

tmorgan4 commented 6 years ago

@yaroslavvb Thanks for the info. This is the first time I've heard of that so I will do more research into AVX2.

ghost commented 6 years ago

I found this thread while searching for TensorFlow AMD Ryzen benchmarks. I may be able to help if you want to build/compile TensorFlow from source on Ubuntu yourself. Please note I'm using Python 3 (3.6) and Ubuntu 17.10.

PiotrCzapla commented 6 years ago

In the end I went for Intel-based hardware, and it seems that was a mistake, as my 40% more expensive i7-6850K CPU is about the same speed in this test:

> CUDA_VISIBLE_DEVICES='' python tensorflow/examples/tutorials/layers/cnn_mnist.py 2>&1 | grep global_step 
INFO:tensorflow:global_step/sec: 11.1653
INFO:tensorflow:global_step/sec: 11.1838
INFO:tensorflow:global_step/sec: 11.1302
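For anyone comparing runs, the throughput can be averaged straight from the log. A small sketch (the log text below is just the sample output above):

```python
import re

def mean_steps_per_sec(log_text: str) -> float:
    """Average the global_step/sec readings printed by TF's Estimator logs."""
    rates = [float(m) for m in
             re.findall(r"global_step/sec: ([\d.]+)", log_text)]
    return sum(rates) / len(rates) if rates else 0.0

log = """\
INFO:tensorflow:global_step/sec: 11.1653
INFO:tensorflow:global_step/sec: 11.1838
INFO:tensorflow:global_step/sec: 11.1302
"""
print(round(mean_steps_per_sec(log), 4))  # -> 11.1598
```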
mohnkhan commented 6 years ago

Buying a GPU would speed it up further...
If the Threadripper is water-cooled, it works fine. When possible I will post a few quick benchmarks. The 1950X gives roughly 2x the performance of the R7 1800X, and this architecture is highly sensitive to RAM speed: RAM around 29xx speed offers the best performance.

Daniel451 commented 6 years ago

Just some short benchmarks:

Ryzen 7 2700X (16 Threads): INFO:tensorflow:global_step/sec: 12.7574 (averaged between 12-13) unoptimized(!) TF CPU-version.

Ryzen 7 2700X (16 Threads): INFO:tensorflow:global_step/sec: 15.1131 (averaged between 12-13) -march=native TF custom CPU build.

NVIDIA GeForce GTX 1080: INFO:tensorflow:global_step/sec: 255.1756 (something between 230 to 270)

NVIDIA TITAN X (Pascal): INFO:tensorflow:global_step/sec: 230.6915 (something between 210 to 250)

You can easily see that high-end GPUs deliver approx. 20x the performance of a 2700X, which is quite fast itself with 16 Threads hitting ~12.5 steps/sec.

However, this test is not appropriate for general performance comparisons, since cnn_mnist.py is a very tiny network compared to "real" deep learning models. This means that, for example, the GPU utilization of a TITAN X is <= 50% most of the time. My 1080 did not go above 55%. Even the Ryzen 7 2700X barely went over 90% usage across its 16 threads.

This is also the reason why the 1080 is faster than the TITAN X here: not all CUDA cores can be used efficiently with such tiny networks and the 1080 simply has higher clock speeds (up to 2 GHz for the 1080; about 1.65 GHz for the TITAN X).

riaz commented 5 years ago

> I found this thread while searching for TensorFlow AMD Ryzen benchmarks. I may be able to help if you want to build/compile TensorFlow from source on Ubuntu yourself. Please note I'm using Python 3 (3.6) and Ubuntu 17.10.

@ghost can you help with setting up tensorflow from source on AMD TR 1950x

harishprabhala commented 5 years ago

Hi,

I recently bought a 1950X and I am trying to train a dataset on just the CPU with TensorFlow. Compared with my MacBook (8th-gen hexa-core Core i7), training is 10x slower on the 1950X. I can't figure out what the issue is. Please help.

Thanks, H

mohnkhan commented 5 years ago

@harishprabhala A little more info about your rig (hardware, software, OS, etc.) would help to start with. Without that I'd be guessing blindly; did you check what the bottleneck is?

Th3OnlyN00b commented 4 years ago

Wait, so how exactly are you supposed to build TensorFlow for an AMD Threadripper with an NVIDIA GPU? I don't really understand why a special build is required, as the Threadripper 1950X should support AVX correctly, no?