Open vollmersj opened 9 years ago
Unfortunately there is no better way because, for obvious reasons, the gradients are stored separately. I'm quite curious, though: why do you want to get one flat, huge vector? If you really want that, an easier way is to flatten each Param and then concatenate them all. The memory overhead cannot be avoided, though.
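A minimal sketch of the flatten-and-concatenate idea, with a plain list of arrays standing in for the per-layer Params (the `params` variable here is hypothetical, not Mocha's actual API):

```julia
# Hypothetical per-layer parameter arrays standing in for Mocha Params.
params = [rand(3, 2), rand(4), rand(2, 2)]

# Flatten each one and concatenate into a single vector.
# This copies, so the memory overhead mentioned above still applies.
θ = vcat(map(vec, params)...)
length(θ)  # 14
```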
Thank you for your response. There might be a way around the memory overhead by using pointers:
x = zeros(8)
p = pointer_to_array(pointer(x, 3), (3, 2))
p[:, 1] = 100.0
p[:, 2] = 200.0
@show x # => [0.0, 0.0, 100.0, 100.0, 100.0, 200.0, 200.0, 200.0]
Initialising the layer blobs would then require picking an appropriate chunk out of the memory. Would this be possible?
Having one parameter vector makes it easier to try different tuning algorithms.
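A minimal sketch of that chunking idea, assuming the blob shapes are known up front (all names here are hypothetical; `unsafe_wrap` is the modern spelling of `pointer_to_array`, and the flat vector must be kept alive, e.g. with `GC.@preserve`, while the views are in use):

```julia
# Carve non-copying views out of one flat parameter vector θ,
# one view per (hypothetical) blob shape.
function flat_views(θ::Vector{Float64}, shapes)
    views = Array{Float64}[]
    offset = 1
    for shp in shapes
        push!(views, unsafe_wrap(Array, pointer(θ, offset), shp))
        offset += prod(shp)
    end
    return views
end

θ = zeros(10)
vs = flat_views(θ, [(2, 3), (4,)])  # two blobs: 2×3 and length-4
vs[1][1, 1] = 7.0                   # writes through to θ[1]
vs[2][end]  = 9.0                   # writes through to θ[10]
@show θ[1], θ[end]  # => (7.0, 9.0)
```

The views alias the flat vector, so an optimizer can update θ in place and every blob sees the change without any copying.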
Yes, this is technically possible, for the cpu backend only. Though I doubt it will be a serious issue, because nowadays CPU memory is very large. If you have huge models, the bottleneck will then become the computation, esp. when using a cpu backend. For the gpu backend, the memory is on the GPU device and cannot be directly shared with the CPU.
Mocha is a really nice project and has backpropagation implemented for many different layer types and neurons. However, what is the best way to interface with it so as to obtain one large parameter vector? Is there a way around copying pieces to every blob? Currently, I am using
copy(net.states[i].parameters[j].blob, slice)
where slice is a slice of my big parameter vector. This can be put together into a function that does the backpropagation given an array NNInds containing the indices of the corresponding slices. Copying the memory will produce an overhead; this should not matter for large networks, but there must be a better way.
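A hedged sketch of that scatter step, with plain Julia arrays standing in for Mocha blobs (`nn_inds`, `params`, and the shapes are all hypothetical; `copyto!` was spelled `copy!` in older Julia):

```julia
# Scatter one flat vector θ into per-parameter arrays using precomputed
# index ranges, mirroring the copy(net.states[i].parameters[j].blob, slice)
# pattern above.
params  = [zeros(2, 2), zeros(3)]
nn_inds = [1:4, 5:7]        # hypothetical analogue of NNInds
θ = collect(1.0:7.0)

for (p, r) in zip(params, nn_inds)
    # vec(p) shares memory with p, so this fills p in place;
    # one copy per parameter blob, as discussed above.
    copyto!(vec(p), view(θ, r))
end

params[2]  # [5.0, 6.0, 7.0]
```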