joey00072 / Tinytorch

A really tiny autograd engine
MIT License

Backward pass for matmul gives an error #3

Open RS2007 opened 4 months ago

RS2007 commented 4 months ago

I was toying around with the tiny xor example and I changed the training loop from the current loop:

for idx in range(ITER):
    pred = model(x)
    loss = tt.mse_loss(pred, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(loss.item())

to:

for idx in range(ITER):
    loss = Tensor([0.0])
    for x1, y1 in zip(x, y):
        pred = model(x1)
        loss += tt.mse_loss(pred, y1)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(loss.item())

Semantically it's pretty much the same code, but running the backward pass gives the following error:

Traceback (most recent call last):
  File "/Users/hedwig/Tinytorch/tiny_xor_net.py", line 52, in <module>
    loss.backward()
  File "/Users/hedwig/Tinytorch/tinytorch.py", line 262, in backward
    grads = node._ctx.op.backward(node._ctx, node.grad)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hedwig/Tinytorch/tinytorch.py", line 382, in backward
    grad_y = transpose_last_axis(x.data) @ grad.data
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 1 is different from 2)

The error, I can hazard a guess, is due to the grad and the parent data arrays not being broadcast before the backward pass of the MatMul Function. The shapes obtained in this case (right before the matmul fails) are as follows:

Shape of grad.data: (1,)
Shape of x.data.T: (2,)
Shape of y.data.T: (1, 2)
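
For reference, here is a minimal numpy-only reproduction of the shape mismatch (assuming the weight in the failing layer is (2, 1), which matches the shapes above; x, w and grad are just placeholders, not Tinytorch internals):

import numpy as np

x = np.zeros(2)        # input, shape (2,)
w = np.zeros((2, 1))   # weight, shape (2, 1)
grad = np.zeros(1)     # upstream grad, shape (1,)

out = x @ w            # forward is fine: numpy treats the 1-D x as a row vector, out is (1,)

# x.T is a no-op on a 1-D array, so the backward effectively does (2,) @ (1,)
# and raises the same "mismatch in its core dimension 0" ValueError.
grad_w = x.T @ grad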
joey00072 commented 4 months ago

Can you give me x1.shape, y1.shape and pred.shape? I'll debug it tonight.

(This is a nicely detailed, well-raised issue, the kind I haven't seen in a while. Nice one!)

RS2007 commented 4 months ago

Sure, these are the shapes of x1,y1 and pred:

x1 shape: (2,)
y1 shape: (1,)
pred shape: (4, 1)
davidgonmar commented 4 months ago

Probably happens because the matmul backward is not handling 1-D operands correctly. In this example the forward pass does (2) @ (2, 1) -> (1) in shapes. Here x behaves like a row vector, so the op is effectively (1, 2) @ (2, 1) -> (1, 1), then reduced to (1). What you want for the y grad is x.T @ grad, which should come out as (2, 1) @ (1) -> (2, 1), i.e. the shape of y. But transposing a 1-D array is a no-op, (2).T -> (2) (because of numpy semantics), so you end up trying (2) @ (1) instead, which fails. Basically, be careful with vector/matrix multiplication and how it behaves (since there is no concept of 'row' or 'column' vector in 1-D hehe). So the solution is to handle those cases manually, or generalize it somehow.
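
Rough numpy sketch of what I mean (not Tinytorch's actual backward; x, w and grad stand in for the matmul inputs and the upstream grad): promote 1-D operands to 2-D, do the usual matrix backward, then squeeze the extra axes back out.

import numpy as np

x = np.random.randn(2)       # vector input, shape (2,)
w = np.random.randn(2, 1)    # weight, shape (2, 1)
grad = np.ones(1)            # upstream grad, shape (1,) like the forward output

x2 = np.atleast_2d(x)        # (1, 2): make the row-vector interpretation explicit
g2 = np.atleast_2d(grad)     # (1, 1)

grad_x = (g2 @ w.T).reshape(x.shape)   # (1, 1) @ (1, 2) -> (1, 2) -> (2,)
grad_w = x2.T @ g2                     # (2, 1) @ (1, 1) -> (2, 1), matches w.shape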

davidgonmar commented 4 months ago

If you don't want to be expanding arrays and stuff, just compute grad_y as the outer product of x and grad in the vector @ matrix case.
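
Something like this (numpy sketch only, using np.outer for illustration; the Tensor-level method in Tinytorch may be named differently):

import numpy as np

x = np.random.randn(2)      # (2,)
w = np.random.randn(2, 1)   # (2, 1)
grad = np.ones(1)           # (1,)

grad_w = np.outer(x, grad)            # (2,) outer (1,) -> (2, 1), matches w.shape
grad_x = (w @ grad).reshape(x.shape)  # (2, 1) @ (1,) -> (2,), grad w.r.t. x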

RS2007 commented 4 months ago

Actually @davidgonmar, this makes total sense. Just tried the outer product and it works.

joey00072 commented 4 months ago

Hey @RS2007, if the change is minimal can you raise a PR?

Thanks @davidgonmar @RS2007 (sorry, I'm kind of busy these few days)

RS2007 commented 4 months ago

Yup sure