hughperkins / clnn

OpenCL backend for Torch nn neural networks library
BSD 2-Clause "Simplified" License
126 stars 16 forks source link

MSECriterion batch size limitation #19

Closed petrjanda closed 8 years ago

petrjanda commented 8 years ago

Working on NN with MSECriterion I've stumbled upon the problem with variable batch size. One of the batches in my training set was shorter (dataset_size=7049, batch_size=50 => one batch was 49 items).

This would cause:

/Users/petr/torch/install/bin/luajit: ...s/petr/torch/install/share/lua/5.1/clnn/MSECriterion.lua:6: bad argument #1 to 'copy' (sizes do not match at /tmp/luarocks_cltorch-scm-1-27/cltorch/cltorch/src/lib/THClTensorCopy.cpp:136)
stack traceback:
        [C]: in function 'copy'
        ...s/petr/torch/install/share/lua/5.1/clnn/MSECriterion.lua:6: in function 'forward'
        ./util/trainer.lua:25: in function 'validateRegression'
        ./util/trainer.lua:136: in function 'train'
        main.lua:94: in main chunk
        [C]: in function 'dofile'
        ...petr/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x0103632ad0

As the criterion workBuffer is initialised from first batch the sizes ofc wouldn't match.

I've went around this by trimming my dataset although I was wondering if it might be worth considering to redesign the criterion to handle variable batch sizes.

I would be happy to help with it, just wanted to get some thoughts on it first.

petrjanda commented 8 years ago

Just for the record, the variable batch size worked fine on CPU, thus I would say it at least its worth considering an assert to highlight that limitation.

hughperkins commented 8 years ago

Ah, interesting. Ok, so this bit:

   if self.workBuffer == nil then
      self.workBuffer = input:clone()
   end

... will only run the first time, when workBuffer is nil. After that, workBuffer will never change size.

Probably want to add something after the end line here like:

self.workBuffer.resizeAs(input)

Do you want to try that, see if it fixes the problem? (Might need a bit of tweaking, to get it exactly right)

petrjanda commented 8 years ago

Let me try, seems pretty straightforward.

petrjanda commented 8 years ago

So I've opened PR with attempt of the unit test first. Is there any way to run this single unit test or a way to verify its ran as part of luajit -l clnn -e 'clnn.test()'?

hughperkins commented 8 years ago

So I've opened PR with attempt of the unit test first.

Ah awesome!

Is there any way to run this single unit test or a way to verify its ran as part of luajit -l clnn -e 'clnn.test()'?

Yes, you can run it like this:

luajit -l clnn -e 'clnn.test({"mse_variablebatchsize"})'

You'll need to build it first, ie:

luarocks make rocks/clnn-scm-1.rockspec
petrjanda commented 8 years ago

Thanks a lot!

Test & implementation done in #20