Closed rasmuspjohansson closed 10 years ago
@goodfeli do you have an idea?
Sounds like these have gotten a bit out of sync, but I wouldn't expect those ConvC01B classes to work at all on the CPU, as they have no CPU implementations.
On Mon, Nov 18, 2013 at 7:15 AM, rasmuspjohansson notifications@github.com wrote:
When I exchange "MaxoutConvC01B" for "MaxoutLocalC01B", I'm assuming that my YAML files will still work. I do, however, get the following problems.
MaxoutLocalC01B expects a "max_filter_norm" argument, while the MaxoutConvC01B class wants a "max_kernel_norm" argument.
Changing max_kernel_norm to max_filter_norm lets me get a bit further, but I then get "got an unexpected keyword argument 'kernel_stride'" in /linear/local_c01b.py, line 54, in __init__
It seems that local_c01b.py assumes LocalDot.__init__ can take kernel_stride as an argument, but this isn't implemented in LocalDot:
- when called from local_c01b.py: LocalDot.__init__(self, filters=filters, irows=image_shape[0], icols=image_shape[1], kernel_stride=kernel_stride, padding_start=pad, message='')
- definition in localdot.py: def __init__(self, filters, irows, icols=None, subsample=(1, 1), padding_start=None, filters_shape=None, message="")
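The mismatch described above can be reproduced in isolation. The sketch below uses a hypothetical stand-in class, `LocalDotSketch`, that copies only the constructor signature quoted from localdot.py; it is not the real LocalDot. The final call assumes that `kernel_stride` and `subsample` mean the same thing, which nothing in this thread confirms.

```python
class LocalDotSketch(object):
    """Hypothetical stand-in that copies only the signature quoted from localdot.py."""

    def __init__(self, filters, irows, icols=None, subsample=(1, 1),
                 padding_start=None, filters_shape=None, message=""):
        self.filters = filters
        self.irows = irows
        self.icols = icols
        self.subsample = subsample
        self.padding_start = padding_start


image_shape = (28, 28)
kernel_stride = (1, 1)

# Same keyword arguments as the call quoted from local_c01b.py.
try:
    LocalDotSketch(filters=None, irows=image_shape[0], icols=image_shape[1],
                   kernel_stride=kernel_stride, padding_start=0, message='')
    error_message = None
except TypeError as exc:
    # The signature has no kernel_stride parameter, so Python rejects the call.
    error_message = str(exc)

print(error_message)

# Assumption (not confirmed here): kernel_stride plays the role of subsample.
layer = LocalDotSketch(filters=None, irows=image_shape[0], icols=image_shape[1],
                       subsample=kernel_stride, padding_start=0, message='')
```

Whether renaming the keyword is the right fix depends on what LocalDot actually does with `subsample`, so treat the last call as something to check against the code rather than as a patch.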
In the "MaxoutLocalC01B" class in maxout.py it says "Unlike MaxoutConvC01B, this class supports operation on CPU, thanks to James Bergstra's TheanoLinear library, which pylearn2 has forked. The GPU code is still based on Alex Krizvhevsky's cuda_convnet library."
So it seems like the idea is that the MaxoutLocalC01B class should support cpu.
Sidestepping the problems mentioned above by simply removing the "kernel_stride" argument (is this OK?) gives me the following error, however:
File "/usr/local/lib/python2.7/dist-packages/pylearn2-0.1dev-py2.7.egg/pylearn2/config/yaml_parse.py", line 236, in try_to_import
raise ImportError(base_msg + '. Original exception: '+str(e))
ImportError: Could not import pylearn2.models.maxout but could import pylearn2.models. Original exception: No module named pthreads
I have no idea how to move forward from this, since I can't find any mention of a pthreads module anywhere.
Note: part of my plan in switching to the CPU version was to be able to train on non-square images, so the above error occurred when training on images of shape [28, 84].
I see that pthreads is needed by pool.py, and that the location of the pthread library needs to be set by hand in pthread.py. Is this the way to move forward?
pthreads is an external library that is usually (always?) installed on Unix systems, but requires installing on Windows. If you're using Linux and pthreads is not installed, try to install it through your distribution's package manager. If you're using Windows and had to install it by hand, or if it is installed in an unusual location, you can specify it with the following Theano configuration flags (see http://deeplearning.net/software/theano/library/config.html):
- pthreads.inc_dir: location of pthread.h
- pthreads.lib_dir: location of library implementing pthreads
- pthreads.lib: name of the library that implements pthreads (e.g. "pthreadVC2" if using pthreadVC2.dll/.lib from pthreads-win32)
Please do not edit pthread.py by hand to do that, but use the configuration mechanism.
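One way to pass the three flags above is through the THEANO_FLAGS environment variable, which takes comma-separated `name=value` pairs. The sketch below only builds that string; the directory paths are hypothetical placeholders, and `build_theano_flags` is a helper written for this example, not part of Theano or pylearn2.

```python
import os


def build_theano_flags(flags):
    """Join flag names and values into the 'name=value,name=value' form."""
    return ",".join("%s=%s" % (name, value)
                    for name, value in sorted(flags.items()))


# Hypothetical install locations; replace them with the real paths.
pthreads_flags = {
    "pthreads.inc_dir": "C:/pthreads/include",
    "pthreads.lib_dir": "C:/pthreads/lib",
    "pthreads.lib": "pthreadVC2",
}

flag_string = build_theano_flags(pthreads_flags)

# This replaces any existing THEANO_FLAGS value, and it has to happen
# before Theano is imported for the settings to be picked up.
os.environ["THEANO_FLAGS"] = flag_string
print(flag_string)
```

The same pairs can also go in a Theano configuration file instead of the environment variable; the linked configuration page describes both routes.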
This does not seem to be documented anywhere in the pylearn2 docs.
@lamblin @goodfeli you know the current docs much better than I do. Where should this be added?
@rasmuspjohansson can you confirm it fixed your problem?
The issue turned out to be that I wasn't using the latest code from GitHub. Updating to the latest code got me the pthreads.py module, which seems to have fixed the pthreads issue. There does, however, seem to be another issue with the CPU version of maxout.py: it uses max_pool_c01b (from pylearn2.sandbox.cuda_convnet.pool import max_pool_c01b), which in turn uses the MaxPool function, which is only implemented for GPU. Is there an alternative to pool.py/max_pool_c01b that I can use that works on CPU?
The error I get is that pool.py assumes square images.
I think that everything from cuda_convnet supports only square images/filters. We wrote about that in another mailing list thread. It can be changed in the code. If someone does this we will accept the change, but I don't know of anyone who will do it. Are you interested? If so, we can guide you.
Otherwise, there is convolution code in Theano that is slower on the GPU, but it supports non-square images/filters.
Fred
I see. For some reason I assumed that cuda_convnet's CPU support also meant that it would support non-square images. Thanks for sorting that out. I'm still too unfamiliar with pylearn2 to dare to fix this myself, and will make do with the square version for the time being. Thanks for all the help.
@dwf What is the issue to fix in the end? Modifying cuda_convnet to support non-square images? Am I wrong, or does that mean modifying Alex Krizhevsky's code?
It took me some reading and thinking but my basic feeling is:
kernel_stride on CPU, and for non-square kernels on GPU
So this isn't a really big ticket.
If I replace MaxoutConvC01B with MaxoutLocalC01B in this file https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/scripts/papers/maxout/mnist.yaml, I get the following error:
Traceback (most recent call last):
File "pylearn2/scripts/train.py", line 209, in <module>
train_obj.main_loop(time_budget=args.time_budget)
File "/u/bouthilx/projects/pylearn2/pylearn2/train.py", line 155, in main_loop
self.algorithm.setup(model=self.model, dataset=self.dataset)
File "/u/bouthilx/projects/pylearn2/pylearn2/training_algorithms/sgd.py", line 262, in setup
mode=self.monitor_iteration_mode)
File "/u/bouthilx/projects/pylearn2/pylearn2/monitor.py", line 838, in setup
model_channels = model.get_monitoring_channels(nested_ipt[-1])
File "/u/bouthilx/projects/pylearn2/pylearn2/models/mlp.py", line 485, in get_monitoring_channels
state = layer.fprop(state)
File "/u/bouthilx/projects/pylearn2/pylearn2/models/maxout.py", line 1338, in fprop
self.desired_space)
File "/u/bouthilx/projects/pylearn2/pylearn2/space/__init__.py", line 430, in format_as
space=space)
File "/u/bouthilx/projects/pylearn2/pylearn2/space/__init__.py", line 465, in _format_as
return self._format_as_impl(is_numeric, batch, space)
File "/u/bouthilx/projects/pylearn2/pylearn2/space/__init__.py", line 1442, in _format_as_impl
return _cast(result, space.dtype)
File "/u/bouthilx/projects/pylearn2/pylearn2/space/__init__.py", line 201, in _cast
raise TypeError("Unsupported arg type '%s'" % str(type(arg)))
TypeError: Unsupported arg type '<class 'theano.sandbox.cuda.var.CudaNdarrayVariable'>'
Is this normal behavior? I didn't change any arguments, and it works with MaxoutConvC01B.
Definitely not normal behaviour.
@goodfeli Do you have any idea why this happens?
Maybe something to do with the changes to spaces that @SuperElectric and @vdumoulin made recently?
It looks like @SuperElectric didn't support cuda ndarrays when implementing _cast. Casting a cuda ndarray to float32 should be a no-op, not an error.
https://github.com/lisa-lab/pylearn2/blame/master/pylearn2/space/__init__.py#L201
Please in general use git blame to see who wrote a function that's crashing and don't rely on me to answer every question. I'm graduating soon and you'll need to learn to fend for yourselves.
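The no-op behaviour suggested above for `_cast` can be illustrated with a small stand-alone sketch. It does not use Theano or pylearn2: `FakeGpuArray` is a hypothetical stand-in for a GPU array that always stores single-precision floats, and `cast` is a simplified function written for this example, not the real `_cast`.

```python
import array


class FakeGpuArray(object):
    """Hypothetical stand-in for a GPU array, which always stores float32."""

    typecode = 'f'

    def __init__(self, values):
        self.values = array.array('f', values)


def cast(arg, typecode):
    """Cast arg to an array typecode ('f' single, 'd' double precision)."""
    if isinstance(arg, FakeGpuArray):
        if typecode != 'f':
            raise TypeError(
                "GPU arrays only store float32; cannot cast to '%s'" % typecode)
        # Already float32: return the very same object, no copy and no error.
        return arg
    if isinstance(arg, array.array):
        return array.array(typecode, list(arg))
    raise TypeError("Unsupported arg type '%s'" % str(type(arg)))


gpu_batch = FakeGpuArray([1.0, 2.0])
same_batch = cast(gpu_batch, 'f')

host_batch = cast(array.array('d', [1.5, 2.5]), 'f')
```

The point of the sketch is the first branch: a request that matches what the GPU type already holds is answered by returning the argument unchanged instead of falling through to the "Unsupported arg type" error.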
I've submitted a PR that adds support for CudaNdarray to space._cast(): https://github.com/lisa-lab/pylearn2/pull/764
I've tested it by running scripts/papers/maxout/mnist.yaml with MaxoutConvC01B's replaced by MaxoutLocalC01B's, and their "max_kernel_norm" arguments replaced with "max_filter_norm". The "Unsupported arg type '<class 'theano.sandbox.cuda.var.CudaNdarrayVariable'>'" no longer happens.
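The two substitutions used for this test can be applied mechanically. The sketch below runs them on a short YAML fragment that is illustrative only and is not the real contents of mnist.yaml; the layer name and the norm value are made up for the example.

```python
# Illustrative fragment only, not the real contents of mnist.yaml.
yaml_text = """
layers: [
    !obj:pylearn2.models.maxout.MaxoutConvC01B {
        layer_name: 'h0',
        max_kernel_norm: 0.9,
    },
]
"""

# Class name first, then the argument that is named differently.
replacements = [
    ("MaxoutConvC01B", "MaxoutLocalC01B"),
    ("max_kernel_norm", "max_filter_norm"),
]


def convert(text):
    """Apply each (old, new) substitution to the YAML text in order."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text


converted = convert(yaml_text)
print(converted)
```

Other arguments may still differ between the two classes, as the `kernel_stride` report earlier in this thread shows, so a converted file is a starting point and not a guaranteed working configuration.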
I do get a ValueError complaining about wrongly shaped matrices, but this looks unrelated to _cast():
Traceback (most recent call last):
File "/home/mifs/mkg30/projects/external/pylearn2/pylearn2/scripts/train.py", line 209, in <module>
train_obj.main_loop(time_budget=args.time_budget)
File "/home/mifs/mkg30/projects/external/pylearn2/pylearn2/train.py", line 193, in main_loop
self.run_callbacks_and_monitoring()
File "/home/mifs/mkg30/projects/external/pylearn2/pylearn2/train.py", line 238, in run_callbacks_and_monitoring
self.model.monitor()
File "/home/mifs/mkg30/projects/external/pylearn2/pylearn2/monitor.py", line 235, in __call__
a(*X)
File "/home/mifs/mkg30/projects/external/Theano/theano/compile/function_module.py", line 588, in __call__
self.fn.thunks[self.fn.position_of_error])
File "/home/mifs/mkg30/projects/external/Theano/theano/compile/function_module.py", line 579, in __call__
outputs = self.fn()
File "/home/mifs/mkg30/projects/external/Theano/theano/gof/op.py", line 644, in rval
r = p(n, [x[0] for x in i], o)
File "/home/mifs/mkg30/projects/external/pylearn2/pylearn2/packaged_dependencies/theano_linear/unshared_conv/unshared_conv.py", line 120, in perform
str(left_arg.shape) + ' vs ' + str(right_arg.shape))
ValueError: matrices are not aligned: (96, 3072) vs (2688, 128)
Apply node that caused the error: FilterActs{module_stride=1}(Reshape{5}.0, W)
Inputs shapes: [(1, 48, 10, 10, 128), (9, 9, 48, 8, 8, 1, 96)]
Inputs strides: [(2457600, 51200, 5120, 512, 4), (10616832, 1179648, 24576, 3072, 384, 384, 4)]
Inputs types: [TensorType(float32, (True, False, False, False, False)), TensorType(float32, 7D)]
Use the Theano flag 'exception_verbosity=high' for a debugprint of this apply node.
MaxoutLocalC01B does not work with CPU at all actually. If you set device=cpu and force_device=True, it crashes because max_pool_c01b is not imported at line 48. If you set device=cpu and not force_device, you can see with nvidia-smi that the GPU is used.
Also, as it uses MaxPool from pylearn2.sandbox.cuda_convnet.pool, it is not possible to use non-square kernels. LocalDot from pylearn2.packaged_dependencies.theano_linear.unshared_conv.localdot.py also assumes a square image.
kernel_stride does not work with MaxoutLocalC01B, but it does work for MaxoutConvC01B. I tried both with identical parameters.
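A common way to avoid the kind of crash described above, where a GPU-only function is missing on a CPU-only setup, is to guard the import and fall back to a CPU path. The sketch below shows the pattern only: the module name is a hypothetical placeholder, and the fallback is a toy one-dimensional, non-overlapping max pool, not an equivalent of max_pool_c01b.

```python
import importlib


def optional_import(module_name, attribute):
    """Return module_name.attribute, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attribute, None)


# Hypothetical module name standing in for a GPU-only pooling module.
max_pool_gpu = optional_import("hypothetical_gpu_pool_module", "max_pool_c01b")


def max_pool(values, window):
    """Use the GPU function when it imported, else a plain-Python max pool."""
    if max_pool_gpu is not None:
        return max_pool_gpu(values, window)
    # Toy CPU fallback: maximum of each consecutive block of `window` items.
    return [max(values[i:i + window]) for i in range(0, len(values), window)]


pooled = max_pool([1, 5, 2, 4, 3, 0], 2)
print(pooled)
```

The decision about which implementation to use is made once, at import time, so code that calls `max_pool` does not need to know which device is available.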
(Mentioned this elsewhere, but also replying in this thread for the record):
I've added support for CUDA arrays in space._cast(). The PR is: https://github.com/lisa-lab/pylearn2/pull/764
This also fixes the issue mentioned earlier by Mehdi, who reported crashes when compute_test_value was set to raise.
I've made modifications to maxout.py to enable CPU use with MaxoutLocalC01B, but LocalDot needs square images and square kernels, so it is still not possible to use non-square images/kernels with MaxoutLocalC01B.
@dwf It is ready for review. However PR #764 will conflict with one of my modifications so you might wait for it to be merged and I'll update my PR.
Please let me know if you need anything more from me for PR #764 (https://github.com/lisa-lab/pylearn2/pull/764) to be merged. It passes all tests, but Ian had concerns about the fact that rebasing seems to have given the PR some patches that have already been merged in.
I will be traveling for the next 3 weeks, so I can't be super-responsive. I will do what I can.
-- Matt
It happened to me a few weeks ago. You could start a new branch and cherry-pick the commits you made.
Xavier