mtewes / tenbilac

Neural network for inverse regression problems

Adding "Product Unit" neurons (or layers) #14

Closed mtewes closed 7 years ago

mtewes commented 7 years ago

There is quite a bit of literature on multiplicative networks, and it sounds very good so far, nice! One paper I like a lot is: http://sci2s.ugr.es/keel/pdf/specific/articulo/Schmidtt%20on-the-complexity-of.pdf

Will work on this in a dedicated branch #14

mtewes commented 7 years ago

According to Leerink et al. (1995), page 2, the initialization is important and should not be done with small values around 0.0, and the local-minima problem gets worse. Let's see.
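For reference, a minimal numpy illustration of the two initialization styles; the loc/scale values below are placeholders and are not taken from tenbilac or from the paper.

import numpy as np

np.random.seed(0)
nin = 5
# Typical sum-unit style: small weights scattered around 0.0.
weights = np.random.normal(loc=0.0, scale=0.1, size=nin)
# For product-unit exponents, values near 0.0 turn every input into a factor
# close to 1 (since x**0 == 1), so starting around 1.0 (or another clearly
# non-zero value) looks like a safer default.
exponents = np.random.normal(loc=1.0, scale=0.1, size=nin)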

mtewes commented 7 years ago

Python + numpy implementations of the product unit layer are done, but they are probably much slower than the equivalent "sum" layers. We might soon need SWIG here. Now writing a simple test that "learns" the multiplication of 2 numbers...
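For readers of this thread, here is a minimal numpy sketch of what a single product unit computes; the function name and signature are made up and do not reflect tenbilac's actual layer code.

import numpy as np

def product_unit(inputs, exponents, bias=0.0):
    # A product unit replaces the weighted sum by a product of the inputs
    # raised to learnable exponents (inputs assumed positive here).
    return np.prod(inputs ** exponents) + bias

# Learning z = x * y then just means driving both exponents towards 1.0:
print(product_unit(np.array([2.0, 3.0]), np.array([1.0, 1.0])))  # 6.0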

mtewes commented 7 years ago

Nets can now include product layers; training remains unchanged so far. A hidden-layer specification of [-3, 3] means that the first hidden layer is composed of 3 product units. Now working on that test...
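To make the notation explicit, here is how I read such a specification (the helper below is only an illustration, not code from tenbilac):

def describe_hidden_layers(spec):
    # Hypothetical reading of the hidden-layer spec: a negative entry stands
    # for a product-unit ("mult") layer, a positive entry for a usual sum layer.
    for size in spec:
        mode = "mult" if size < 0 else "sum"
        print("hidden layer with {} {} units".format(abs(size), mode))

describe_hidden_layers([-3, 3])
# hidden layer with 3 mult units
# hidden layer with 3 sum units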

mtewes commented 7 years ago

Learning z = x * y with 1000 examples. The prediction error is shown as color.

Note the smaller color scale of the latter. It seems to work, and we can also confirm that sum units are badly suited for multiplications.

The "report" is also automatically adapted, after training it looks like

[2|1/*iden|1/iden=5]
Layer 'h0', mode mult, ni 2, nn 1, actfct iden:
    output 0 = iden ( prod (input ** [ 0.99999963  0.99999962]) + 0.134301267119 )
Layer 'o', mode sum, ni 1, nn 1, actfct iden:
    output 0 = iden ( input * [ 0.99999999] + -0.134301381374 )
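As a sanity check, the reported net can be evaluated by hand. The sketch below simply transcribes the numbers printed above into numpy, assuming the report notation maps directly onto these operations.

import numpy as np

def predict(x, y):
    # Layer 'h0' (mult): prod(input ** exponents) + bias
    h0 = np.prod(np.array([x, y]) ** np.array([0.99999963, 0.99999962])) + 0.134301267119
    # Layer 'o' (sum): input * weight + bias
    return h0 * 0.99999999 - 0.134301381374

print(predict(2.0, 3.0))  # very close to 6.0; the two biases nearly cancel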
kuntzer commented 7 years ago

That does look nice!

mtewes commented 7 years ago

I pushed the demo code, demo/test_mult_learn.py

One big issue to think about is the "normalization": the problems with negative values fed into "product units", and the choice of activation functions that get "fed" into later product-unit layers. It's a whole new world to explore, but I would love to not explore it for too long :) I guess it would be good to somehow keep the "signs" of the ellipticity components, i.e. use a "-11" norm, and find a hack so that this works with the product units. Using a "01" normer in front of a product unit would be a shame: we precisely want those units to be able to rescale numbers which might be positive or negative by multiplying their amplitude. Maybe some power-raising that artificially keeps the sign could work (and reinventing maths is fun).
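A toy version of such a sign-preserving power-raising, just to illustrate the idea (not tenbilac code):

import numpy as np

def signed_power(x, p):
    # Act on the amplitude only, then put the original sign of x back.
    return np.sign(x) * np.abs(x) ** p

print(signed_power(-2.0, 2.0))  # -4.0, whereas (-2.0)**2.0 gives +4.0
print(signed_power(-4.0, 0.5))  # -2.0, whereas the square root of -4.0 is not real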

mtewes commented 7 years ago

Routes I'm trying:

We want to be able to consider negative inputs, because the non-linear characteristics of product units, which we want to exploit computationally, are centered on the origin.

Approaches:

1) A bit of a hack: make the product units always return a value with the same sign as their first input, using np.sign. All the exponentiation and product stuff works on np.fabs(inputs). This makes their behavior "odd" around 0.0 of the first input. But it's a bad idea, as outputs from all neurons will have the same sign (i.e., that of the first input). This direction can only be explored when mixing layers with sum and prod neurons (so that the sum neurons can carry around non-sign-polluted information). For some neurons, we need the possibility to pass a simple identity to the next layer.

2) Using the product of the signs of all inputs is not a good idea either, as useless noisy inputs (which would get zero power) would still mess up the sign. Somehow only the signs of inputs with "significant weight" should matter. Maybe this can be coded, as sketched below. [edit: did this! This will now need some "identity" initial settings]
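A minimal numpy sketch of what approach 2 could look like; everything here, including the significance threshold, is made up for illustration and is not taken from the tenbilac code.

import numpy as np

def product_unit_signed(inputs, exponents, threshold=0.5):
    # Exponentiate the amplitudes, and take the output sign from the product
    # of the signs of only those inputs whose exponent is "significant",
    # so that near-zero-power (useless) inputs cannot flip the sign.
    amplitude = np.prod(np.abs(inputs) ** exponents)
    significant = np.abs(exponents) > threshold
    sign = np.prod(np.sign(inputs)[significant]) if np.any(significant) else 1.0
    return sign * amplitude

# The third input has a near-zero exponent, so its sign is ignored:
print(product_unit_signed(np.array([-2.0, 3.0, -0.1]), np.array([1.0, 1.0, 0.001])))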

mtewes commented 7 years ago

Todo in order to test point 2 above:

Todo next:

mtewes commented 7 years ago

About the initial "identity" settings, there is another interesting aspect: so far, for a network with a single output, each layer was only set to "transport" its first input, using its first neuron. All other neurons started from zero. It could be much more interesting for the i-th neuron of each layer to transport the i-th input. Given that we train layers starting from the end, the last layer would then directly see "all inputs". Of course, if you have more hidden nodes than inputs, you'll still have some "joker" neurons starting from zero. Will implement this in a configurable way, to leave the choice to "transport" only the first n inputs or to transport all inputs as far as possible into the network.
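A sketch of what such a "transport" initialization could look like for the weight matrix of a sum layer (for a mult layer, the same pattern would apply to the exponents); names and shapes here are illustrative only.

import numpy as np

def transport_init(nin, nn):
    # Neuron i starts by passing input i through unchanged; extra neurons
    # beyond the number of inputs stay at zero ("joker" neurons).
    weights = np.zeros((nn, nin))
    for i in range(min(nn, nin)):
        weights[i, i] = 1.0
    return weights

print(transport_init(nin=2, nn=3))
# [[1. 0.]
#  [0. 1.]
#  [0. 0.]]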

mtewes commented 7 years ago

About initial noise in the "weights" (== exponents) of mult-layers:

mtewes commented 7 years ago

I'll experiment with restricting the multiplication layer to positive powers, i.e., avoiding divisions.
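For illustration, one way such a restriction could be enforced (not necessarily how it is or will be done in tenbilac):

import numpy as np

# Clip the exponents of a mult-layer, e.g. after each training update, so that
# negative powers (divisions) are excluded.
exponents = np.array([1.2, -0.3, 0.7])
exponents = np.clip(exponents, 0.0, None)
print(exponents)  # [1.2 0.  0.7]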

mtewes commented 7 years ago

Closing this; it works. The next upgrades will be done in a new issue and branch.