numpy.take gives corrupted result with inplace operation (Trac #2122)

numpy-gitbot commented 12 years ago

Original ticket http://projects.scipy.org/numpy/ticket/2122 on 2012-04-29 by atmention:fengy-research, assigned to unknown.

Take doesn't work as expected when the output is inplace.

When mode='clip' or mode='wrap', the operation is truly in-place, and the simple take algorithm currently implemented will overwrite the array elements before they are memmoved to the correct reindexed positions.

A simple test case is listed below,

In [37]: a = array([0, 1, 2]) In [38]: ind = array([2, 1, 0]) In [39]: b = a.copy() In [40]: a.take(ind, out=a) Out[40]: array([2, 1, 0]) In [41]: a = b.copy() In [42]: a.take(ind, out=a, mode='clip') Out[42]: array([2, 1, 2])

Out[40] is correct, for mode='raise' secretly creates a copy of the output before copying. But Out[42] is in-correct. No error or warning is given.

A fix is either to create a copy for the other two modes, or to actually use a permuting algorithm. The latter calls for a full rewrite of PyArray_Take but will save memory usage significantly. A proper implementation is used in GSL: gsl_permute.

numpy-gitbot commented 12 years ago

atmention:dwf wrote on 2012-04-30

I don't think this is a bug. AFAIK numpy.take isn't really intended to be used in this way. Furthermore, because arrays may not be contiguous in memory (e.g. smallest stride > 1 element), determining whether there is any overlap between input and output arrays is quite complicated, and I'm not sure it's practical to do so.

There are lots of places in NumPy where inplace operations will unavoidably fail if you're not careful, e.g. you might think that a = arange(50); a[:] += a[::-1] will return an array of 49s, but it doesn't.

I do agree that take() should at least document that it's not meant to be used when a and out alias each other, and to prefer fancy indexing in this case; ideally, an error would be raised, but I'm not sure that's practical for the reasons above.

numpy-gitbot commented 12 years ago

atmention:dwf wrote on 2012-04-30

N.B.: I think there's room for a library function for the special case of permuting the exact same array in-place; it's definitely a common enough need.

numpy-gitbot commented 12 years ago

atmention:fengy-research wrote on 2012-05-01

I do agree with you. The only reasonable inplace numpy.take appears to be an inplace permute.

The following is gsl_permute ported to numpy. It is what I am using but also too limited
to be in the library as it can only handle 1D continuous arrays. 

static PyObject * permute(PyObject * self,
    PyObject * args, PyObject * kwds) {
    static char * kwlist[] = {
        "array", "index", NULL
    };
    PyArrayObject * array, * index, *out;
    if(!PyArg_ParseTupleAndKeywords(args, kwds,
        "O!O!", kwlist,
        &PyArray_Type, &array,
        &PyArray_Type, &index)) return NULL;

  size_t n = PyArray_SIZE(index);
  size_t * p = PyArray_DATA(index);
  char * data = PyArray_DATA(array);
  int itemsize = PyArray_ITEMSIZE(array);

  size_t i, k, pk;

  for (i = 0; i < n; i++)
    {
      k = p[i];

      while (k > i)
        k = p[k];

      if (k < i)
        continue ;

      /* Now have k == i, i.e the least in its cycle */

      pk = p[k];

      if (pk == i)
        continue ;

      /* shuffle the elements of the cycle */

      {
        unsigned int a;

        char t[itemsize];

        memmove(t, & data[i * itemsize], itemsize);

        while (pk != i)
          {
            memmove(&data[k * itemsize], & data[pk * itemsize], itemsize);
            k = pk;
            pk = p[k];
          };

        memmove(&data[k * itemsize], t, itemsize);
      }
    }
    Py_INCREF(array);
    return (PyObject*) array;
}

thouis / numpy-trac-migration

numpy.take gives corrupted result with inplace operation (Trac #2122) #5918