raskr / rust-autograd

Tensors and differentiable operations (like TensorFlow) in Rust

Gradient error for tensor of different dimensions #44

Closed laocaoshilaocao closed 3 years ago

laocaoshilaocao commented 3 years ago

Hi, during the development of my neural network algorithm, I found that the gradient method always errors for calculations between tensors of different dimensions. The error is thread 'main' panicked at 'called Result::unwrap() on an Err value: ...', and thread 'main' panicked at 'called Option::unwrap() on a None value' if I try to print the grad result. One example:

let t1 = g.constant(array![[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]); // shape [2, 4]
let t2 = g.constant(array![1.0, 2.0, 3.0, 4.0]); // shape [4]
let cal = t1 / t2;
let grads = g.grad(&[cal], &[t2]);

That affects many of the expressions used during development. After testing, I found that expanding t2's dimensions solves the problem, like this:

...
let t3 = g.tile(g.expand_dims(t2, &[0]), 0, 2); // [4] -> [1, 4] -> [2, 4]
let cal = t1 / t3;
...
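For reference, the full workaround as a self-contained program could look roughly like this (a sketch assuming the crate's ag::with entry point and Tensor::eval for evaluation; untested):

use autograd as ag;
use ag::ndarray::array;

ag::with(|g: &mut ag::Graph<f64>| {
    let t1 = g.constant(array![[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]); // shape [2, 4]
    let t2 = g.constant(array![1.0, 2.0, 3.0, 4.0]); // shape [4]
    // Make the shapes match explicitly: [4] -> [1, 4] -> [2, 4]
    let t3 = g.tile(g.expand_dims(t2, &[0]), 0, 2);
    let cal = t1 / t3;
    // The gradient flows back through tile/expand_dims to t2
    let grads = g.grad(&[cal], &[t2]);
    println!("{:?}", grads[0].eval(&[]));
});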

However, that approach clearly requires much more development effort, since methods like reduce_sum are very commonly used. Do you have any other ideas for solving this problem?

raskr commented 3 years ago

expand_dims() is required because autograd (or rather, ndarray) doesn't perform implicit broadcasting. Is tile() required?

laocaoshilaocao commented 3 years ago

You are right that tile() is not required between 2D and 1D. But I think it is required when the calculation is between 3D and 2D (which is my situation). My case looks like:

g.expand_dims([4x5], &[1]) - [2x5]

So the first part needs to be tiled along axis 1 from 4x1x5 to 4x2x5, and the second part needs to be tiled along axis 0 from 1x2x5 to 4x2x5.
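In code, that double tiling might look something like the following (a sketch using zero-filled placeholder arrays for the 4x5 and 2x5 operands; untested):

use autograd as ag;
use ag::ndarray::Array2;

ag::with(|g: &mut ag::Graph<f64>| {
    let a = g.constant(Array2::<f64>::zeros((4, 5)));
    let b = g.constant(Array2::<f64>::zeros((2, 5)));
    // first part: [4, 5] -> [4, 1, 5] -> [4, 2, 5]
    let a3 = g.tile(g.expand_dims(a, &[1]), 1, 2);
    // second part: [2, 5] -> [1, 2, 5] -> [4, 2, 5]
    let b3 = g.tile(g.expand_dims(b, &[0]), 0, 4);
    let diff = a3 - b3; // both sides now have shape [4, 2, 5]
    let grads = g.grad(&[diff], &[b]);
    println!("{:?}", grads[0].eval(&[]));
});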

Btw, the forward computation works fine; it's only the gradient computation that fails without the dimension expansion.

raskr commented 3 years ago

> But I think it is required when the calculation is between 3D and 2D

Do you expect 2D -> 3D broadcast? Does TF support it??

laocaoshilaocao commented 3 years ago

> Do you expect 2D -> 3D broadcast? Does TF support it??

Yes, I think TF supports that. For example:

import tensorflow as tf
from tensorflow.keras import backend as K

hidden = tf.constant([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
clusters = tf.constant([[1.0, 1.0]])
dist1 = K.expand_dims(hidden, axis=1) - clusters

raskr commented 3 years ago

Hmm, in the above example an implicit broadcast (1,2) -> (4,1,2) does seem to occur, but autograd and ndarray require one more empty dim. (tile() is not needed)

// (1,2) -> (1, 1, 2)
expand_dims(clusters, axis=0)
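Applied to the TF example above, that suggestion would translate to something like this in rust-autograd (a sketch; untested):

use autograd as ag;
use ag::ndarray::array;

ag::with(|g: &mut ag::Graph<f64>| {
    let hidden = g.constant(array![[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]); // [4, 2]
    let clusters = g.constant(array![[1.0, 1.0]]); // [1, 2]
    // [4, 2] -> [4, 1, 2] and [1, 2] -> [1, 1, 2]; ndarray can then broadcast
    // [1, 1, 2] against [4, 1, 2] because every mismatched axis has size 1.
    let dist1 = g.expand_dims(hidden, &[1]) - g.expand_dims(clusters, &[0]);
    println!("{:?}", dist1.eval(&[]));
});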

I think implicit broadcasting is evil, and ndarray's spec is reasonable :thinking:

laocaoshilaocao commented 3 years ago

> Hmm, in the above example an implicit broadcast (1,2) -> (4,1,2) does seem to occur, but autograd and ndarray require one more empty dim. (tile() is not needed)

Haha okay, I got it. You are right, implicit broadcasting makes the whole code quite messy to read. :)

laocaoshilaocao commented 3 years ago

> (tile() is not needed)

But when I ran into this situation, tile() is still needed, I believe?

let points: Array2<f64> = array![[1.0, 2.0], [1.0, 4.0], [1.0, 4.0], [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]];
let points_t: ag::Tensor<f64> = g.constant(points); // [6, 2]

let centroids: Array2<f64> = array![[0.0, 0.0], [1.0, 1.0]];
let centroids_t: ag::Tensor<f64> = g.constant(centroids); // [2, 2]

let points_expanded = g.expand_dims(points_t, &[0]); // [1, 6, 2]
let centroids_expanded = g.expand_dims(centroids_t, &[1]); // [2, 1, 2]

let t = points_expanded - centroids_expanded;

If I run this, I get an error like: thread 'main' panicked at 'ndarray: could not broadcast array from shape: [2, 1, 2] to: [1, 6, 2]'
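Following the earlier pattern, tiling both operands to a common [2, 6, 2] shape should presumably fix it (a sketch; untested):

use autograd as ag;
use ag::ndarray::array;

ag::with(|g: &mut ag::Graph<f64>| {
    let points_t = g.constant(array![[1.0, 2.0], [1.0, 4.0], [1.0, 4.0], [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]]);
    let centroids_t = g.constant(array![[0.0, 0.0], [1.0, 1.0]]);
    // [6, 2] -> [1, 6, 2] -> [2, 6, 2]
    let points_expanded = g.tile(g.expand_dims(points_t, &[0]), 0, 2);
    // [2, 2] -> [2, 1, 2] -> [2, 6, 2]
    let centroids_expanded = g.tile(g.expand_dims(centroids_t, &[1]), 1, 6);
    let t = points_expanded - centroids_expanded; // shapes now match exactly
    println!("{:?}", t.eval(&[]));
});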