Greetings! I took the Vector Addition tutorial and added a couple of other benchmarks (for numpy and tensorflow), and the benchmark results don't seem accurate anymore - the graph and the dataframe show that tensorflow is ~1k times faster than the torch and triton implementations, even though timing the same operations with %%timeit gives nearly identical results for tensorflow and torch:
%%timeit
with tf.device('/GPU:0'):
    x = tf.random.uniform(shape=(268435456,), dtype=tf.float32)
    y = tf.random.uniform(shape=(268435456,), dtype=tf.float32)
    tf.add(x, y)
# 22.8 ms ± 9.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
x = torch.rand(268435456, device='cuda', dtype=torch.float32)
y = torch.rand(268435456, device='cuda', dtype=torch.float32)
torch.add(x, y)
# 22.9 ms ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
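To put the two %%timeit numbers on the same scale as the benchmark plot, here is a minimal sketch that converts a time in milliseconds into effective GB/s. It assumes the plot uses the usual elementwise-add accounting (two input reads plus one output write); the helper name `effective_gbps` is made up for illustration. Since the cells above also time the creation of the random tensors, the result is only a lower bound on the throughput of the add itself.

```python
def effective_gbps(n_elements: int, bytes_per_element: int, ms: float) -> float:
    """Effective bandwidth in GB/s, assuming two reads and one write per element."""
    total_bytes = 3 * n_elements * bytes_per_element
    return total_bytes * 1e-9 / (ms * 1e-3)


N = 268_435_456  # vector length used in the %%timeit cells above

# float32 -> 4 bytes per element; times are the %%timeit means reported above
tf_gbps = effective_gbps(N, 4, 22.8)
torch_gbps = effective_gbps(N, 4, 22.9)

print(f"tensorflow: {tf_gbps:.1f} GB/s, torch: {torch_gbps:.1f} GB/s")
print(f"ratio: {tf_gbps / torch_gbps:.3f}")
```

Both timings come out at about 141 GB/s under this accounting, with a ratio close to 1, nowhere near the ~1k gap shown by benchmark.run.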
Results of benchmark.run: [image not included]
Complete code is here - https://colab.research.google.com/drive/16HF6k5wGoqfnDg0uqHKe_B0vXTpzPDfn?usp=sharing