nengo / nengo-ocl

OpenCL-based simulator for Nengo neural models
Other
23 stars 14 forks source link

-1 in A stride1s in GEMV #118

Closed hunse closed 7 years ago

hunse commented 8 years ago

This is a strange bug that only seems to happen in Python 2, and only on some machines. It's been happening for a while, at least since the last release. I just never noticed it since it doesn't happen on my work computer.

Example error:

===================================================== FAILURES =====================================================
____________________________ test.nengo.networks.tests.test_actionselection.test_basic _____________________________

Simulator = <class 'nengo_ocl.tests.conftest.OclSimulator'>
plt = <nengo.utils.testing.Mock object at 0x7f57cd263f90>, seed = 1675708018

    def test_basic(Simulator, plt, seed):
        bg = nengo.networks.BasalGanglia(
            dimensions=5, net=nengo.Network(seed=seed))
        with bg:
            input = nengo.Node([0.8, 0.4, 0.4, 0.4, 0.4], label="input")
            nengo.Connection(input, bg.input, synapse=None)
            p = nengo.Probe(bg.output, synapse=0.01)

>       with Simulator(bg) as sim:

../nengo/nengo/networks/tests/test_actionselection.py:14: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
nengo_ocl/tests/conftest.py:13: in __init__
    super(OclSimulator, self).__init__(*args, context=ctx, **kwargs)
nengo_ocl/simulator.py:395: in __init__
    self._plan.extend(self.plan_op_group(op_type, op_list))
nengo_ocl/simulator.py:416: in plan_op_group
    return getattr(self, 'plan_' + op_type.__name__)(ops)
nengo_ocl/simulator.py:439: in plan_MultiDotInc
    tag='c-beta-%d' % len(constant_bs))
nengo_ocl/simulator.py:464: in _sig_gemv
    Y_in=Y_in, gamma=gamma, tag=tag)
nengo_ocl/clra_gemv.py:119: in __init__
    self.plans = self.choose_plans()
nengo_ocl/clra_gemv.py:1146: in choose_plans
    return block_impl(self, list(range(len(self.Y))))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

p = <nengo_ocl.clra_gemv.plan_block_gemv object at 0x7f57c208c550>, items = [0, 1, 2, 3, 4, 5, ...]

    def block_impl(p, items):

        if p.clra_alpha is not None:
            raise NotImplementedError()
        if p.clra_gamma is not None:
            raise NotImplementedError()
        if p.clra_beta is not None:
            raise NotImplementedError()
        if p.cl_alpha is not None:
            raise NotImplementedError()
        if p.cl_beta is not None:
            raise NotImplementedError()
        if p.cl_gamma is not None:
            raise NotImplementedError()
        if not all(s == 1 for s in p.A.stride1s):
>           raise NotImplementedError()
E           NotImplementedError

nengo_ocl/clra_gemv.py:847: NotImplementedError
============================================== pytest-warning summary ==============================================
WC1 None [pytest] section in setup.cfg files is deprecated, use [tool:pytest] instead.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=================================== 1 failed, 1 pytest-warnings in 2.14 seconds ====================================
hunse commented 8 years ago

This originates from the fact that in Python 2, np.zeros(1).strides is some crazy huge number, whereas in Python 3, it's what you'd expect (i.e. (8,), the length of one element, the same thing you'd get if you did np.zeros(2).strides). This crazy huge number becomes -1 when the strides are converted to int32, I think.

jgosmann commented 8 years ago

in Python 2, np.zeros(1).strides is some crazy huge number

In my Python 2.7.11 with NumPy 1.11.1 it's (8,) as expected.

hunse commented 8 years ago

Ooh, yeah, I had the dev version of Numpy installed. I just installed 1.11.1, and now I get (8,). Strange. Maybe that's a bug that needs to be filed with Numpy, if it's still there.

@tbekolay, you had this error, right? What Numpy version are you on?

tbekolay commented 8 years ago

I remember having a similar issue and finding out that NumPy does that specifically in dev versions so that people don't rely on this behavior? That might have been a different crazy huge number though. In any case, I had thought I was on 1.11 but apparently I was on a dev version also; checking out the v1.11.1 tag, the tests now all pass for me on Python 2!

hunse commented 8 years ago

So do you think this is something we should worry about? That seems quite weird if they intentionally do that in dev versions. But if that's true, it might be something worth addressing.

tbekolay commented 8 years ago

Not sure, I'll have to try to find the GH issue that explained it to refresh my memory... will report back when I find it!

xchoo commented 8 years ago

Hmm. Yeah, I've only encountered this problem with dev versions of Numpy. Can we make it a requirement for nengo_ocl to use a non-dev version of Numpy?

hunse commented 8 years ago

Can we make it a requirement for nengo_ocl to use a non-dev version of Numpy

Well it seems like right now that's an unofficial requirement. I'm not sure if it's possible to check for that in setup.py. But if this is something that Numpy regularly does in dev versions so that people don't rely on this feature, then I'd rather fix our code to not rely on that feature, if it's not too much work.

tbekolay commented 8 years ago

OK, found it. I haven't read these issues close enough to know the exact issue, but the thing to google around about is NPY_RELAXED_STRIDES_CHECKING; see https://github.com/numpy/numpy/issues/5085, https://github.com/numpy/numpy/issues/1673. In https://github.com/numpy/numpy/issues/5085, they make the change from the big integer to the -1 that we're seeing now.

hunse commented 8 years ago

So relaxed stride checking is set to become the default in 1.12: http://docs.scipy.org/doc/numpy/release.html. That means we should accommodate this, and not rely on what I thought should be the obvious behaviour.

seberg commented 8 years ago

Just stumbled on this by randomness. There are two things here:

  1. You get passed in a random array: Please make sure to not rely on nice strides, just check if the array is contiguous and if it is you only need the itemsize (you can infer the strides from shape+itemsize if you really want)
  2. If you create a new array, we won't set it to insane strides intentionally in release versions.

However, there I do not think there is any reason at all to rely on 2. If you use strides, you use strides together with the shape, if you do things like inferring a stride of a second dimension from the first, you should just use the itemsize + shape and the contiguity requirement.

hunse commented 8 years ago

This is fixed in PyOpenCL as of pyopencl/pyopencl#140.

To fix things here, we'll need to make sure we don't rely on the strides in any unitary dimensions (this should just require a few minor changes). We should probably check somewhere that if users are using Numpy with relaxed strides, they also have the newest PyOpenCL.