tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0
18.35k stars 1.92k forks

100% performance improvement MatMul #3315

Open TheNewSound opened 4 years ago

TheNewSound commented 4 years ago

Describe the problem or feature request

When executing const result = tf.matMul(a, b, transposeA, transposeB); and the following holds: a == b && (transposeA ^ transposeB) (where ^ is the XOR operator), the result is a symmetric matrix. In that case tf.matMul only has to calculate the upper triangle and can mirror those values into the lower triangle of the resulting matrix, or vice versa.

const a = tf.tensor2d([1, 2, 3, 4, 5, 6, 7, 8, 9], [3, 3]);

a.matMul(a, true).print();        // transposeA = true: computes aᵀ·a
a.matMul(a, false, true).print(); // transposeB = true: computes a·aᵀ

As far as I can see, the WebAssembly back-end currently does not take advantage of this property? Maybe this applies to other back-ends as well? For example, CUDA has cublas<t>syrk(); I don't know whether TensorFlow takes advantage of this in CUDA...
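To make the idea concrete, here is a minimal sketch in plain JavaScript. It is not tfjs code and does not reflect how any tfjs back-end is implemented; the helper name gramUpperMirror and the flat row-major layout are assumptions made for illustration. It computes aᵀ·a by evaluating only the entries on or above the diagonal and copying each one to its mirrored position.

```javascript
// Hypothetical helper (not part of tfjs): computes C = Aᵀ·A for a
// rows x cols matrix stored row-major in a flat array.
// Only entries with j >= i are computed; each is mirrored to (j, i).
function gramUpperMirror(a, rows, cols) {
  const c = new Float32Array(cols * cols);
  for (let i = 0; i < cols; i++) {
    for (let j = i; j < cols; j++) {
      // Dot product of column i and column j of A.
      let sum = 0;
      for (let k = 0; k < rows; k++) {
        sum += a[k * cols + i] * a[k * cols + j];
      }
      c[i * cols + j] = sum; // upper triangle (and diagonal)
      c[j * cols + i] = sum; // mirrored into the lower triangle
    }
  }
  return c;
}

// Same 3x3 input as the example above.
const a = [1, 2, 3, 4, 5, 6, 7, 8, 9];
console.log(Array.from(gramUpperMirror(a, 3, 3)));
```

For an n-column input this evaluates n(n+1)/2 dot products instead of n², so roughly half the arithmetic for large n. Whether that translates into a comparable speed-up in a real back-end would depend on memory access patterns and vectorization.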

gaikwadrahul8 commented 1 year ago

Hi, @TheNewSound

Apologies for the delayed response. I tried to replicate this on my end with the wasm backend, and I'm getting the output below with the latest version, @tensorflow/tfjs@4.10.0. May I ask whether this issue has already been taken care of?

[image: screenshot of the output]

Thank you for opening this issue. Since this issue has been open for a long time, the code/debug information for it may no longer be relevant to the current state of the code base.

The TFJs team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TFJs version with the latest compatible hardware configuration, which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings and all the debugging information that could help us investigate.

Please follow the release notes to stay up to date with the latest developments in the TensorFlow.js space.

Thank you for your support and cooperation.

TheNewSound commented 12 months ago

What I described was a performance optimization suggestion/question.