lisa-lab / pylearn2

Warning: This project does not have any current developer. See below.
BSD 3-Clause "New" or "Revised" License
2.76k stars 1.09k forks source link

Maxout for CPU gives Error, local_c01b.py giving localdot.py wrong arguments #465

Closed rasmuspjohansson closed 10 years ago

rasmuspjohansson commented 11 years ago

When I replace "MaxoutConvC01B" with "MaxoutLocalC01B", I'm assuming that my YAML files will still work. I do, however, get the following problems.

  1. MaxoutLocalC01B assumes a "max_filter_norm" argument while the MaxoutConvC01B class wants a "max_kernel_norm" argument

  2. Changing max_kernel_norm to max_filter_norm lets me get a bit further, but I then get: got an unexpected keyword argument 'kernel_stride' in: /linear/local_c01b.py", line 54, in __init__

It seems that local_c01b.py assumes LocalDot.__init__ can take kernel_stride as an argument, but this isn't implemented in LocalDot.

- When called from local_c01b.py: LocalDot.__init__(self, filters=filters, irows=image_shape[0], icols=image_shape[1], kernel_stride=kernel_stride, padding_start=pad, message='')

- Definition in localdot.py: def __init__(self, filters, irows, icols=None, subsample=(1, 1), padding_start=None, filters_shape=None, message="")
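
The mismatch between the call and the definition quoted above can be reproduced in isolation. The following is a minimal sketch, not pylearn2 code: `LocalDotStandIn` and `build` are hypothetical names, and only the parameter names are copied from the signature in this comment. Whether `subsample` has the same meaning as `kernel_stride` is an assumption that the thread does not confirm.

```python
class LocalDotStandIn:
    """Hypothetical stand-in that copies only the parameter names quoted above."""

    def __init__(self, filters, irows, icols=None, subsample=(1, 1),
                 padding_start=None, filters_shape=None, message=""):
        self.filters = filters
        self.irows = irows
        # Treating a missing column count as "same as rows" is an assumption.
        self.icols = irows if icols is None else icols
        self.subsample = subsample
        self.padding_start = padding_start
        self.message = message


def build(image_shape, kernel_stride, pad, keyword):
    """Call the constructor, passing the stride under the given keyword name."""
    kwargs = {keyword: kernel_stride}
    return LocalDotStandIn(filters=None, irows=image_shape[0],
                           icols=image_shape[1], padding_start=pad,
                           message="", **kwargs)


# The keyword used by the caller is not in the signature, so Python rejects it.
try:
    build((28, 28), (1, 1), 0, keyword="kernel_stride")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing the same value under the name the signature declares is accepted.
layer = build((28, 28), (1, 1), 0, keyword="subsample")
print(error_message)
```

Simply dropping the keyword, as tried later in the thread, also avoids the TypeError, but the stride then silently falls back to the default of (1, 1).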

nouiz commented 11 years ago

@goodfeli do you have an idea?

dwf commented 11 years ago

Sounds like these have gotten a bit out of sync, but I wouldn't expect those ConvC01B classes to work at all on the CPU, as they have no CPU implementations.

rasmuspjohansson commented 11 years ago

In the "MaxoutLocalC01B" class in maxout.py it says "Unlike MaxoutConvC01B, this class supports operation on CPU, thanks to James Bergstra's TheanoLinear library, which pylearn2 has forked. The GPU code is still based on Alex Krizvhevsky's cuda_convnet library."

So it seems like the idea is that the MaxoutLocalC01B class should support cpu.

I sidestepped the problems above by simply removing the "kernel_stride" argument (is this OK?). That, however, gives me this error: "File "/usr/local/lib/python2.7/dist-packages/pylearn2-0.1dev-py2.7.egg/pylearn2/config/yaml_parse.py", line 236, in try_to_import raise ImportError(base_msg + '. Original exception: '+str(e)) ImportError: Could not import pylearn2.models.maxout but could import pylearn2.models. Original exception: No module named pthreads". I have no idea how to move forward from here, since I can't find any mention of a pthreads module anywhere.

Note: part of my reason for switching to the CPU version was to be able to train on non-square images, so the error above occurred when training on images of shape [28, 84].

rasmuspjohansson commented 11 years ago

I see that pthreads is needed by pool.py, and that the location of the pthread library needs to be set by hand in pthread.py. Is this the way to move forward?

lamblin commented 11 years ago

pthreads is an external library, that is usually (always?) installed on Unix systems, but requires installing on Windows. If you're using Linux, and pthreads is not installed, try to install it through your distribution's package manager. If you're using Windows, and had to install it by hand, or if it is installed in an unusual location, you can specify it with the following Theano configuration flag (see http://deeplearning.net/software/theano/library/config.html):

Please, do not edit pthread.py by hand to do that, but use the configuration mechanism.
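
Before changing any configuration, it may help to check whether a pthreads library is visible on the system at all. This is a small diagnostic sketch using only the Python standard library; it is not part of pylearn2 or Theano, and `find_pthreads` is a hypothetical helper name. A result of `None` only means the default search did not find the library, not that it is absent.

```python
import ctypes.util


def find_pthreads():
    """Return the library name that find_library resolves for pthread, or None."""
    return ctypes.util.find_library("pthread")


pthread_library = find_pthreads()
if pthread_library is None:
    print("pthreads was not found in the default search path")
else:
    print("pthreads found as", pthread_library)
```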

nouiz commented 10 years ago

This does not seem to be documented anywhere in the pylearn2 docs.

@lamblin @goodfeli you know the current docs much better than I do. Where should this be added?

@rasmuspjohansson can you confirm it fixed your problem?

rasmuspjohansson commented 10 years ago

The issue turned out to be that I wasn't using the latest code from GitHub. Updating to the latest code got me the pthreads.py module, which seems to have fixed the pthreads issue. There does, however, seem to be another issue with the CPU version of maxout.py: maxout.py uses max_pool_c01b (from pylearn2.sandbox.cuda_convnet.pool import max_pool_c01b), which in turn uses the MaxPool function, which is only implemented for GPU. Is there an alternative to pool.py/max_pool_c01b that I can use on the CPU?

rasmuspjohansson commented 10 years ago

The error I get is that pool.py assumes square images.

nouiz commented 10 years ago

I think that everything from cuda_convnet supports only square images/filters. We wrote about that in another mailing list thread. It can be changed in the code. If someone does this we will accept the change, but I don't know of anyone who will do it. Are you interested? If so, we can guide you.

Otherwise, there is convolution code in Theano that is slower on the GPU, but it supports non-square images/filters.

Fred
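
The square-only restriction described above can be checked up front, before a layer is built. This is a minimal sketch; `require_square` is a hypothetical helper, not a function in pylearn2 or cuda_convnet, and the [28, 84] shape is the one reported earlier in this thread.

```python
def require_square(shape, what="image"):
    """Return the side length of a square (rows, cols) shape, else raise."""
    rows, cols = shape
    if rows != cols:
        raise ValueError("%s shape %r is not square" % (what, (rows, cols)))
    return rows


# A square shape passes and yields its side length.
side = require_square([28, 28])

# The non-square shape from this thread is rejected with a clear message.
try:
    require_square([28, 84])
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)
```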

rasmuspjohansson commented 10 years ago

I see. For some reason I assumed that cuda_convnet's CPU support also meant that it would support non-square images. Thanks for sorting that out. I'm still too unfamiliar with pylearn2 to dare to fix this myself, and will make do with the square version for the time being. Thanks for all the help.

bouthilx commented 10 years ago

@dwf What is the issue to fix in the end? Modify cuda_convnet to support non-square images? Am I wrong, or does that mean modifying Alex Krizhevsky's code?

dwf commented 10 years ago

It took me some reading and thinking but my basic feeling is:

So this isn't a really big ticket.

bouthilx commented 10 years ago

If I replace MaxoutConvC01B with MaxoutLocalC01B in this file https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/scripts/papers/maxout/mnist.yaml, I get the following error:

Traceback (most recent call last):
  File "pylearn2/scripts/train.py", line 209, in <module>
    train_obj.main_loop(time_budget=args.time_budget)
  File "/u/bouthilx/projects/pylearn2/pylearn2/train.py", line 155, in main_loop
    self.algorithm.setup(model=self.model, dataset=self.dataset)
  File "/u/bouthilx/projects/pylearn2/pylearn2/training_algorithms/sgd.py", line 262, in setup
    mode=self.monitor_iteration_mode)
  File "/u/bouthilx/projects/pylearn2/pylearn2/monitor.py", line 838, in setup
    model_channels = model.get_monitoring_channels(nested_ipt[-1])
  File "/u/bouthilx/projects/pylearn2/pylearn2/models/mlp.py", line 485, in get_monitoring_channels
    state = layer.fprop(state)
  File "/u/bouthilx/projects/pylearn2/pylearn2/models/maxout.py", line 1338, in fprop
    self.desired_space)
  File "/u/bouthilx/projects/pylearn2/pylearn2/space/__init__.py", line 430, in format_as
    space=space)
  File "/u/bouthilx/projects/pylearn2/pylearn2/space/__init__.py", line 465, in _format_as
    return self._format_as_impl(is_numeric, batch, space)
  File "/u/bouthilx/projects/pylearn2/pylearn2/space/__init__.py", line 1442, in _format_as_impl
    return _cast(result, space.dtype)
  File "/u/bouthilx/projects/pylearn2/pylearn2/space/__init__.py", line 201, in _cast
    raise TypeError("Unsupported arg type '%s'" % str(type(arg)))
TypeError: Unsupported arg type '<class 'theano.sandbox.cuda.var.CudaNdarrayVariable'>'

Is this normal behavior? I didn't change any arguments, and it works with MaxoutConvC01B.

dwf commented 10 years ago

Definitely not normal behaviour.

bouthilx commented 10 years ago

@goodfeli Do you have any idea why this happens?

goodfeli commented 10 years ago

Maybe something to do with the changes to spaces that @SuperElectric and @vdumoulin made recently?

goodfeli commented 10 years ago

It looks like @SuperElectric didn't support cuda ndarrays when implementing _cast. Casting a cuda ndarray to float32 should be a no-op, not an error.

https://github.com/lisa-lab/pylearn2/blame/master/pylearn2/space/__init__.py#L201

Please in general use git blame to see who wrote a function that's crashing and don't rely on me to answer every question. I'm graduating soon and you'll need to learn to fend for yourselves.
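
The behaviour described above (casting a float32-only GPU array to float32 is a no-op, not an error) can be sketched with stand-in types. This is not the code from pylearn2's `_cast` or from any later fix: `FakeCudaVariable`, `FakeTensor` and `cast` are hypothetical names used so the sketch runs without Theano or a GPU.

```python
class FakeCudaVariable:
    """Stand-in for a GPU array variable whose dtype is always float32."""
    dtype = "float32"


class FakeTensor:
    """Stand-in for a CPU tensor variable that can change dtype."""

    def __init__(self, dtype):
        self.dtype = dtype

    def astype(self, dtype):
        return FakeTensor(dtype)


def cast(arg, dtype):
    """Cast arg to dtype, treating float32 GPU arrays as already cast."""
    if dtype is None:
        return arg
    if isinstance(arg, FakeCudaVariable):
        # The GPU type only holds float32, so this request needs no work.
        if dtype != "float32":
            raise TypeError("cannot cast a float32-only GPU array to %s" % dtype)
        return arg
    if isinstance(arg, FakeTensor):
        return arg.astype(dtype)
    raise TypeError("Unsupported arg type '%s'" % str(type(arg)))


gpu_batch = FakeCudaVariable()
same_batch = cast(gpu_batch, "float32")          # returned unchanged
cpu_batch = cast(FakeTensor("float64"), "float32")
```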

SuperElectric commented 10 years ago

I've submitted a PR that adds support for CudaNdarray to space._cast(): https://github.com/lisa-lab/pylearn2/pull/764

I've tested it by running scripts/papers/maxout/mnist.yaml with MaxoutConvC01B's replaced by MaxoutLocalC01B's, and their "max_kernel_norm" arguments replaced with "max_filter_norm". The "Unsupported arg type '<class 'theano.sandbox.cuda.var.CudaNdarrayVariable'>'" no longer happens.

I do get a ValueError complaining about wrongly shaped matrices, but this looks unrelated to _cast():

Traceback (most recent call last):
  File "/home/mifs/mkg30/projects/external/pylearn2/pylearn2/scripts/train.py", line 209, in <module>
    train_obj.main_loop(time_budget=args.time_budget)
  File "/home/mifs/mkg30/projects/external/pylearn2/pylearn2/train.py", line 193, in main_loop
    self.run_callbacks_and_monitoring()
  File "/home/mifs/mkg30/projects/external/pylearn2/pylearn2/train.py", line 238, in run_callbacks_and_monitoring
    self.model.monitor()
  File "/home/mifs/mkg30/projects/external/pylearn2/pylearn2/monitor.py", line 235, in __call__
    a(*X)
  File "/home/mifs/mkg30/projects/external/Theano/theano/compile/function_module.py", line 588, in __call__
    self.fn.thunks[self.fn.position_of_error])
  File "/home/mifs/mkg30/projects/external/Theano/theano/compile/function_module.py", line 579, in __call__
    outputs = self.fn()
  File "/home/mifs/mkg30/projects/external/Theano/theano/gof/op.py", line 644, in rval
    r = p(n, [x[0] for x in i], o)
  File "/home/mifs/mkg30/projects/external/pylearn2/pylearn2/packaged_dependencies/theano_linear/unshared_conv/unshared_conv.py", line 120, in perform
    str(left_arg.shape) + ' vs ' + str(right_arg.shape))
ValueError: matrices are not aligned: (96, 3072) vs (2688, 128)
Apply node that caused the error: FilterActs{module_stride=1}(Reshape{5}.0, W)
Inputs shapes: [(1, 48, 10, 10, 128), (9, 9, 48, 8, 8, 1, 96)]
Inputs strides: [(2457600, 51200, 5120, 512, 4), (10616832, 1179648, 24576, 3072, 384, 384, 4)]
Inputs types: [TensorType(float32, (True, False, False, False, False)), TensorType(float32, 7D)]
Use the Theano flag 'exception_verbosity=high' for a debugprint of this apply node.
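
The shapes in the error above can be checked by hand. The sketch below only does arithmetic on the numbers copied from the traceback; reading filter axes 2 to 4 as (channels, filter rows, filter columns) is an assumption, and the thread does not establish why the two operands disagree.

```python
from math import prod

# Shapes copied from the error message above.
left_shape = (96, 3072)
right_shape = (2688, 128)
filters_shape = (9, 9, 48, 8, 8, 1, 96)


def can_multiply(a, b):
    """A matrix product a.dot(b) needs a's column count to equal b's row count."""
    return a[1] == b[0]


aligned = can_multiply(left_shape, right_shape)

# Assumed layout: axes 2..4 of the filters are channels, filter rows, filter cols.
filter_patch_size = prod(filters_shape[2:5])   # 48 * 8 * 8
channels = filters_shape[2]

# How many pixels per channel the right-hand operand actually carries.
pixels_per_channel, remainder = divmod(right_shape[0], channels)

print(aligned, filter_patch_size, pixels_per_channel, remainder)
```

Under that reading, the left operand matches a full 48-channel 8x8 patch (3072 values), while the right operand carries 56 values per channel instead of 64.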

bouthilx commented 10 years ago

MaxoutLocalC01B does not work with CPU at all actually. If you set device=cpu and force_device=True, it crashes because max_pool_c01b is not imported at line 48. If you set device=cpu and not force_device, you can see with nvidia-smi that the GPU is used.

Also, as it uses MaxPool from pylearn2.sandbox.cuda_convnet.pool, it is not possible to use non-square kernels. LocalDot from pylearn2.packaged_dependencies.theano_linear.unshared_conv.localdot.py also assumes a square image.

kernel_stride does not work with MaxoutLocalC01B, but it does work for MaxoutConvC01B. I tried both with identical parameters.
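
One common way to keep a module importable when a GPU-only helper is unavailable is to guard the import and fail with a clear message only when the helper is actually called. The sketch below illustrates that pattern in general terms; it is not the change made to maxout.py. `optional_attr` and `pool` are hypothetical helpers, and `hypothetical_gpu_backend.pool` is a made-up module name chosen so the sketch runs without pylearn2 or a GPU.

```python
import importlib


def optional_attr(module_name, attr_name):
    """Return module.attr if the module imports cleanly, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr_name, None)


# Made-up module name: on a machine without it, the lookup yields None.
gpu_pool = optional_attr("hypothetical_gpu_backend.pool", "max_pool_c01b")


def pool(x):
    """Use the GPU helper when present, otherwise raise a descriptive error."""
    if gpu_pool is None:
        raise NotImplementedError(
            "no pooling implementation is available for this device")
    return gpu_pool(x)


try:
    pool([1.0, 2.0])
    outcome = "pooled"
except NotImplementedError as exc:
    outcome = str(exc)
print(outcome)
```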

SuperElectric commented 10 years ago

(Mentioned this elsewhere, but also replying in this thread for the record):

I've added support for CUDA arrays in space._cast(). The PR is: https://github.com/lisa-lab/pylearn2/pull/764

This also fixes the issue mentioned earlier by Mehdi, who reported crashes when compute_test_value was set to raise.

bouthilx commented 10 years ago

I've made modifications to maxout.py to enable CPU use with MaxoutLocalC01B, but LocalDot needs square images and square kernels, so it is still not possible to use non-square images/kernels with MaxoutLocalC01B.

bouthilx commented 10 years ago

@dwf It is ready for review. However, PR #764 will conflict with one of my modifications, so you might want to wait for it to be merged; I'll then update my PR.

SuperElectric commented 10 years ago

Please let me know if you need anything more from me for PR #764 (https://github.com/lisa-lab/pylearn2/pull/764) to be merged. It passes all tests, but Ian had concerns about the fact that rebasing seems to have given the PR some patches that have already been merged in.

I will be traveling for the next 3 weeks, so I can't be super-responsive. I will do what I can.

-- Matt

bouthilx commented 10 years ago

It happened to me a few weeks ago. You could start a new branch and cherry-pick the commits you made.

Xavier
