gorgonia / tensor

package tensor provides efficient and generic n-dimensional arrays in Go that are useful for machine learning and deep learning purposes
Apache License 2.0
362 stars 49 forks source link

Weird behavior when Iterating over tensors #124

Open StenSipma opened 2 years ago

StenSipma commented 2 years ago

Whilst working with the tensor.Iterator functionality I found some strange behavior, when using iterator in a for loop like follows:

it = t.Iterator()  // t is a tensor.Dense
for i, err := it.Start(); err == nil; i, err = it.Next() {
    fmt.Printf("i = %v, coord = %v\n", i, it.Coord())
}
  1. When using the iterator as stated above, the 'first' element of the tensor is always last in the iteration. This is not really a big issue if you want to iterate over all elements and order does not matter, but it is weird nonetheless. As an example consider the tensor [1, 2, 3, 4] with shape (2, 2). It will visit the elements in order:
    i = 0, coord = [0 1]
    i = 1, coord = [1 0]
    i = 2, coord = [1 1]
    i = 3, coord = [0 0]  <------------ should be first
  2. When iterating over a vector or a scalar, the indices are off by one. Again same tensor, but as a vector (shape (4,)):
    i = 0, coord = [1]
    i = 1, coord = [2]
    i = 2, coord = [3]
    i = 3, coord = [4]  <----------- gives index error when used

    The same thing happens with 'scalar-like' tensors, like: [1] with shape (1, 1, 1):

    i = 0, coord = [1 0 0]  <--------- also index error, should be [0 0 0] always for scalars

    Interestingly, when the tensor is a vector/scalar View of a tensor that is not a vector/scalar (i.e. obtained by slicing), the issue does not happen.

What I found to work properly is to use the following for loop instead, but I don't think this is the intended way of iterating.

it = t.Iterator()
for it.Reset(); !it.Done(); it.Next() {
    ...
}
chewxy commented 2 years ago

So, the .Coord() method is meant to be read before use. The docs do mention that .Coord() returns the next coordinate. This is because an Iterator is a data structure that stores the state. So when you call Next or Start you are changing the state.

I know this is really not ideal. We would love some ideas on how to make this more user friendly. One idea that I had earlier in the design was to not only return the i but also the coords. But it turns out for most activities you only need the index.

StenSipma commented 2 years ago

So, the .Coord() method is meant to be read before use. The docs do mention that .Coord() returns the next coordinate. This is because an Iterator is a data structure that stores the state. So when you call Next or Start you are changing the state.

Ah okay, that is interesting. I guess I must have missed this part of the documentation. Maybe it would be good to then change the example of how to use Iterator (example) to something that is intended?

I know this is really not ideal. We would love some ideas on how to make this more user friendly. One idea that I had earlier in the design was to not only return the i but also the coords. But it turns out for most activities you only need the index.

Perhaps an easy solution would be to add a method which converts the index i to a coordinate instead? Then it would be more an opt-in feature and doesn't hurt those that do not need it. Of course, it is possible to compute this yourself using the Shape but this would be a lot more convenient.

ddkang commented 1 year ago

Having some fix for this would be incredibly useful!