microsoft / tensorflow-directml

Fork of TensorFlow accelerated by DirectML
Apache License 2.0

LSTM training is super slow on GPU #34

Open phgilde opened 4 years ago

phgilde commented 4 years ago

This training loop takes more than a second per epoch with tensorflow-directml, but only a fraction of a second with standard TensorFlow. It actually doesn't work at all (the error becomes NaN after a couple of iterations), but I have already opened a separate issue for that.

Code:

import tensorflow as tf
import numpy as np
from tensorflow import keras
import matplotlib.pyplot as plt
import time
from datetime import timedelta

def fn(x):
    return tf.sin(x)

seq_length = 200
x = tf.linspace(tf.constant(0, dtype=tf.float32), 50, seq_length)
y = fn(x)

n_outputs = 50
model = keras.layers.LSTM(n_outputs, return_sequences=True)
optimizer = keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = keras.losses.MSE

loss_history = []
epochs = 2_000
out_epochs = 10
start = time.time()
for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y_pred = model(tf.zeros(shape=(1, seq_length, 1)))
        y_pred_data = y_pred[0, :, 0]
        loss = loss_fn(y, y_pred_data)
    loss_history.append(loss.numpy())
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    if epoch % out_epochs == 0:
        print(f"Epoch {epoch}: Loss = {loss} ({timedelta(seconds=time.time()-start)})")

System: Intel i5-7200U with Intel HD graphics 620
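To compare the two backends more precisely than with the cumulative timestamps printed above, each step can be timed on its own after a few warm-up iterations. The following is only a minimal sketch: `time_steps` and `summarize` are hypothetical helper names, and `step_fn` stands for any zero-argument callable that runs one training step (for example, the body of the loop above wrapped in a function).

```python
import statistics
import time


def time_steps(step_fn, n_steps=50, warmup=5):
    """Return per-step wall-clock durations in seconds.

    step_fn is a hypothetical zero-argument callable that performs one
    training step. The first `warmup` calls are executed but not timed,
    so one-off setup costs do not distort the measurement.
    """
    for _ in range(warmup):
        step_fn()

    durations = []
    for _ in range(n_steps):
        t0 = time.perf_counter()
        step_fn()
        durations.append(time.perf_counter() - t0)
    return durations


def summarize(durations):
    """Reduce a list of durations to a few robust statistics."""
    return {
        "min_s": min(durations),
        "median_s": statistics.median(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real training step.
    stats = summarize(time_steps(lambda: sum(range(10_000)), n_steps=20))
    print(stats)
```

The step should end with something that forces the result to be computed, such as the `loss.numpy()` call in the original loop. Otherwise, if the backend runs work asynchronously, the timer may stop before the work has finished.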

PatriceVignola commented 4 years ago

Thank you for reporting this, @phgilde. Are you running this script on Windows or WSL?

phgilde commented 4 years ago

@PatriceVignola I'm running this on Windows.

jstoecker commented 4 years ago

We've implemented the single-step/block-based LSTM/GRU/RNN ops, but these are really better suited to CPU architectures. Models typically use the multi-step cuDNN ops when executing on a GPU device. It's not surprising that there's more work to do here to make DML perform better with recurrent networks.

wchao1115 commented 4 years ago

@phgilde What GPU are you running this with? You mentioned standard TensorFlow and that your configuration uses Intel HD Graphics. Is this training script running on the CPU?

ghostlypi commented 3 years ago

I've had the same issue on an RX 560. In Task Manager, neither the GPU nor the CPU seems to take on any load.

onurberkay commented 2 years ago

I have the same problem with an AMD 4750U APU. The GPU load doesn't even reach 1-2%.

PatriceVignola commented 2 years ago

@onurberkay What does tf.config.list_physical_devices() give you?
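The result of that call can be condensed into a quick check of whether any accelerator is visible at all. This is a rough sketch, not part of the thread: `summarize_devices` and `has_accelerator` are hypothetical helpers, and they assume each entry exposes a `device_type` attribute, as TensorFlow's `PhysicalDevice` entries do. Treating `"DML"` as the device type reported by tensorflow-directml is also an assumption. The `FakeDevice` tuples only stand in for real devices so that the snippet runs without TensorFlow installed.

```python
from collections import Counter, namedtuple


def summarize_devices(devices):
    """Count visible devices by their device_type attribute."""
    return dict(Counter(d.device_type for d in devices))


def has_accelerator(devices, accelerator_types=("GPU", "DML")):
    """Return True if any device has one of the given types.

    "DML" as the type label for DirectML devices is an assumption.
    """
    return any(d.device_type in accelerator_types for d in devices)


# Stand-in for PhysicalDevice entries (assumption: same two fields).
FakeDevice = namedtuple("FakeDevice", ["name", "device_type"])

if __name__ == "__main__":
    devices = [
        FakeDevice("/physical_device:CPU:0", "CPU"),
        FakeDevice("/physical_device:DML:0", "DML"),
    ]
    print(summarize_devices(devices))
    print(has_accelerator(devices))
```

With TensorFlow available, the list returned by `tf.config.list_physical_devices()` would be passed in place of the stand-in list. If only a CPU entry appears, the low GPU load reported above would be expected.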