tbreloff / OnlineAI.jl

Machine learning for sequential/streaming data

LSTM-g #4

Open tbreloff opened 8 years ago

tbreloff commented 8 years ago

I want to start playing around with hierarchical networks using LSTM-g nodes... let this issue be a discussion point for anyone that wants to collaborate on a pure-Julia implementation as part of OnlineAI.

paper: http://www.overcomplete.net/papers/nn2012.pdf

cc: @dmonner @MrMormon @cazala

0joshuaolson1 commented 8 years ago

Apologies for my ignorance of Julia, but to kickstart this: GatedLayer may need to be an abstract type in order to implement the three/seven kinds of layers. Otherwise a concrete immutable would need mutable fields covering the union of all the layer types' data, unless the specialization lives in a field/array of another type such as an abstract Unit.

For API and implementation guidance, besides my cute library, the quite readable Synaptic has (may need to search for LSTM):

math: https://github.com/cazala/synaptic/blob/master/src/neuron.js
network builder and optimizer: https://github.com/cazala/synaptic/blob/master/src/network.js
layer abstraction on top of math/network: https://github.com/cazala/synaptic/blob/master/src/layer.js
grammar, XOR, DSR etc. trainers in https://github.com/cazala/synaptic/blob/master/src/trainer.js
basic use of network/layer/trainer in https://github.com/cazala/synaptic/blob/master/src/architect.js

https://github.com/cazala/synaptic/blob/master/test/synaptic.js is a minimal test, and https://github.com/cazala/synaptic/blob/master/dist/synaptic.js is the five files above in one file but not minified (plus irrelevant Bower, NPM etc. from https://github.com/cazala/synaptic/blob/master/src/synaptic.js).

tbreloff commented 8 years ago

Thanks for the thoughts/links.

When you say "3/7 types of layers", what exactly do you mean? My understanding (after a few readings of the paper) is that every LSTM component can be represented by the same type of layer with common math, differentiated solely by the connectivity structure (this is why I'm interested btw). So a memory cell is a memory cell because of the self-connection and the gating of its inward and outward connections. It has the exact same math as a forget gate, but its "G" is the empty set.

@dmonner Am I missing something?
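
For concreteness, here's a rough sketch of what I mean by "same math, different wiring" (the names and fields are hypothetical, nothing is settled):

# sketch only -- GatedLayer/Connection and their fields are placeholders
type GatedLayer
    n::Int                  # number of units
    state::Vector{Float64}  # net input
    out::Vector{Float64}    # activation
end
GatedLayer(n::Int) = GatedLayer(n, zeros(n), zeros(n))

type Connection
    from::GatedLayer
    to::GatedLayer
    gater                   # gating GatedLayer, or `nothing` if ungated
    w::Matrix{Float64}
end
Connection(from, to; gater = nothing) = Connection(from, to, gater, 0.1 * randn(to.n, from.n))

A "memory cell" is then just a GatedLayer with a self-connection whose inward/outward connections happen to be gated; a gate is the same type whose G is empty.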


0joshuaolson1 commented 8 years ago

Sorry for the confusion. You're mostly right, but inputs, outputs, and bias units/connections do need special handling. Storing everything in terms of the specific layers (instead of having to deduce the memory cell layer(s) by their self-connected memory units) complicates things a bit but is good for performance and keeping the architectures LSTM-like.

tbreloff commented 8 years ago

but inputs, outputs, and bias units/connections do need special handling

I'm not convinced yet. An input layer is a GatedLayer with connections to the (LSTM terminology) output layer, memory cell, and input/forget/output gate layers. An output layer is a GatedLayer with connections from the input layer and the memory cell. Some of those connections happen to be gated, but that doesn't change anything in terms of what the layer "is".

As for bias, I was expecting that to be a core part of the layer, possibly parameterizing the type if needed:

@enum BiasType BIAS NOBIAS
immutable GatedLayer{B}  # B is a BiasType value parameter, e.g. GatedLayer{BIAS}
    ...
end

I'm not 100% on the bias yet... I need to get my hands more dirty with code before I decide.

complicates things a bit

Actually I think it's the opposite. Having one layer type with consistent math makes everything super simple to reason about, and it doesn't in any way lock one into the LSTM model. For those who want a cookie-cutter LSTM, there would be a constructor lstm_layer(...) = ... or something like that, which would create the sublayers with the appropriate connections/gates. I could also see having a layer_tag or similar, which would let you tag certain layers as memory cells, etc. for visualization or quick access. Nothing here implies we would need different types, as all layers need the same functionality.
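
As a rough sketch of that constructor idea (building on the hypothetical GatedLayer/Connection above, still not a settled API):

# hypothetical "cookie-cutter" constructor: a standard LSTM block built from
# identical GatedLayers, distinguished only by wiring and which connections are gated
function lstm_layer(inlayer::GatedLayer, outlayer::GatedLayer, ncells::Int)
    cell        = GatedLayer(ncells)
    input_gate  = GatedLayer(ncells)
    forget_gate = GatedLayer(ncells)
    output_gate = GatedLayer(ncells)
    conns = [
        Connection(inlayer, input_gate),
        Connection(inlayer, forget_gate),
        Connection(inlayer, output_gate),
        Connection(inlayer, cell,     gater = input_gate),  # gated input
        Connection(cell,    cell,     gater = forget_gate), # gated self-recurrence
        Connection(cell,    outlayer, gater = output_gate)  # gated output
    ]
    (cell, input_gate, forget_gate, output_gate), conns
end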

0joshuaolson1 commented 8 years ago

Your bias idea sounds good. What I meant by layers complicating things is separating the layers by what they do, and you're right that most separation is only relevant to building the network. If you want to do memory unit deduction in the optimization pass, there's plenty of ways to store that info I guess.

Inputs and outputs are special only because you write input directly to activations and need to know which units determine output and error. That's all, and Julia's iterators can probably handle that part well enough. Iterating over whatever associates tags with units may have some indirection overhead, but that's just my premature optimizer talking.
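
Just to show how little special handling that is, a minimal sketch (hypothetical helper names, reusing the GatedLayer sketch above):

# input layers just have their activations overwritten; output layers are only
# special in that their activations are read to form the error signal
set_input!(layer::GatedLayer, x::AbstractVector) = copy!(layer.out, x)
output_error(layer::GatedLayer, target::AbstractVector) = layer.out - target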

tbreloff commented 8 years ago

I've been mulling this over for a couple of days. I have a good framework of types/structure without the forward/backward code, but I've been re-deriving the core math from a slightly more generalized perspective, with the hope that my implementation is less "gated connections and memory cells" and more "optionally recurrent layers with multiplicative connections". Ideally the math ends up slightly simpler and more network architectures can be built naturally.
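
To make "multiplicative connections" concrete, here's a rough sketch of the gated forward pass in that spirit (reusing the hypothetical GatedLayer/Connection above, and ignoring the timestep bookkeeping for recurrent connections): each incoming connection's contribution is scaled by its gater's activation, or by 1 if it's ungated.

# rough sketch of the multiplicative/gated propagation (timing details omitted)
function forward!(layer::GatedLayer, conns::Vector{Connection})
    s = zeros(layer.n)
    for c in conns
        c.to === layer || continue
        gain = c.gater === nothing ? 1.0 : c.gater.out  # elementwise gate, or 1 if ungated
        s += gain .* (c.w * c.from.out)
    end
    layer.state = s
    layer.out = tanh(s)  # squashing function chosen arbitrarily for the sketch
    layer.out
end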

If I come up with something good, I may throw together a blog post about it. For now, just be patient.

0joshuaolson1 commented 8 years ago

Thanks for sharing your progress. Do you have multiply-gated connections in mind?