janestreet / torch

MIT License
100 stars 9 forks source link

Tensor.argmax behaves differently than its Python counterpart #6

Closed darioteixeira closed 9 months ago

darioteixeira commented 12 months ago

I've compiled this repo's current master (93ed82c) using OCaml 4.14.1 on an AMD64 system running (K)ubuntu 23.10. I've used what the README refers to as "Option 3" for installation, ie, using the system libraries. The corresponding distro packages are libtorch-dev (version 2.0.1+dfsg-4) and libtorch2.0:amd64 (version 2.0.1+dfsg-4). Note that therefore the OPAM package libtorch is not installed.

Consider the following Python REPL session (using the distro package python3-torch version 2.0.1+dfsg-4):

>>> import torch
>>> t = torch.tensor([ [ 10, 20, 30 ], [ 60, 50, 40 ] ])
>>> torch.argmax(t, dim=1)
tensor([2, 0])

And a similar session using Utop; note the discrepancy with the invocation of Tensor.argmax:

utop # open Torch;;
utop # let t = Tensor.of_float2 [| [| 10.; 20.; 30. |]; [| 60.; 50.; 40. |] |];;
val t : Tensor.t = 
 10  20  30
 60  50  40
[ CPUFloatType{2,3} ]
utop # Tensor.argmax ~dim:1 t;;
- : Tensor.t = 
3
[ CPULongType{} ]
mwlon commented 11 months ago

Thanks for reporting this. I've released a fix internally, and it will make its way to the public repo at some point.

This was a pretty major bug that had been around for a long time: any optional arguments such as dim passed to libtorch would have the wrong indicator value for null. Reduction on dim x would be interpreted as reduction on all axes, and reduction on no dim would be interpreted as reduction on dim 0.

darioteixeira commented 11 months ago

Thanks, @mwlon! Note that I ran into this problem just by trying out some of the examples in the repo, which use the Dataset_helper.batch_accuracy function, which in turn makes use of Tensor.argmax. May be worth turning some of those examples into unit tests! ;-)

mwlon commented 11 months ago

I did indeed add unit tests :)

Would be nice if we had an automated doctest system too, to verify all the examples work, but I think that's unlikely any time soon.

darioteixeira commented 9 months ago

Is a public release with this bug fixed planned for the near future?

mwlon commented 9 months ago

The current timeline I've heard is early February. Apologies for the delay.

darioteixeira commented 9 months ago

Cool, thanks. Looking forward to testing it out!