My current plan:
This MultNet works nicely, with very few new arguments to pass around. Will soon edit the megalut wrapper as well. Now looking at this cache idea.
Some profiling results of 20 iterations of BFGS training with a "small" MultNet:
2150646 function calls (2127353 primitive calls) in 102.590 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
6360 73.745 0.012 98.616 0.016 layer.py:129(run)
29495 21.752 0.001 21.752 0.001 {numpy.core.multiarray.array}
3180 1.795 0.001 1.795 0.001 act.py:18(tanh)
4892 0.826 0.000 0.826 0.000 {numpy.core._dotblas.dot}
3180 0.421 0.000 0.421 0.000 {method 'prod' of 'numpy.ndarray' objects}
17869 0.158 0.000 0.299 0.000 core.py:2745(_update_from)
218390 0.138 0.000 0.144 0.000 {getattr}
3 0.111 0.037 0.111 0.037 {posix.mkdir}
4 0.095 0.024 0.095 0.024 {posix.open}
78136 0.088 0.000 0.088 0.000 {range}
1591 0.081 0.000 0.248 0.000 core.py:1060(__call__)
174646 0.073 0.000 0.124 0.000 {isinstance}
3180 0.069 0.000 0.389 0.000 core.py:912(__call__)
118 0.067 0.001 0.067 0.001 {method 'close' of 'file' objects}
1 0.052 0.052 0.052 0.052 position.py:20(<module>)
12009/4137 0.050 0.000 0.303 0.000 copy.py:145(deepcopy)
2150646 function calls (2127353 primitive calls) in 100.185 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.004 0.004 100.235 100.235 run_train.py:1(<module>)
26/25 0.001 0.000 97.554 3.902 {eval}
1 0.000 0.000 97.019 97.019 run.py:26(train)
520/515 0.004 0.000 96.990 0.188 {map}
1 0.000 0.000 96.909 96.909 run.py:120(_worker)
1 0.000 0.000 96.896 96.896 ml.py:160(train)
1 0.000 0.000 96.886 96.886 tenbilacwrapper.py:152(train)
5 0.000 0.000 96.860 19.372 committee.py:130(call)
5 0.000 0.000 96.860 19.372 committee.py:189(_worker)
1 0.000 0.000 96.858 96.858 train.py:443(opt)
1590 0.014 0.000 96.278 0.061 net.py:217(run)
6360 71.392 0.011 96.265 0.015 layer.py:129(run)
1 0.000 0.000 96.018 96.018 opt.py:35(multnetbfgs)
1 0.000 0.000 96.018 96.018 opt.py:17(bfgs)
1 0.004 0.004 95.957 95.957 optimize.py:390(fmin_bfgs)
1566 0.018 0.000 94.673 0.060 train.py:350(cost)
1587/46 0.004 0.000 94.576 2.056 optimize.py:174(function_wrapper)
23 0.013 0.001 93.193 4.052 optimize.py:371(approx_fprime)
20 0.000 0.000 90.970 4.548 linesearch.py:13(line_search_wolfe1)
20 0.001 0.000 90.970 4.548 linesearch.py:85(scalar_search_wolfe1)
22 0.000 0.000 89.640 4.075 linesearch.py:68(derphi)
3180 0.019 0.000 22.034 0.007 fromnumeric.py:1931(prod)
3181 0.016 0.000 22.015 0.007 fromnumeric.py:32(_wrapit)
29495 21.648 0.001 21.648 0.001 {numpy.core.multiarray.array}
4792 0.006 0.000 21.581 0.005 numeric.py:167(asarray)
5 0.023 0.005 2.158 0.432 __init__.py:3(<module>)
1 0.001 0.001 2.133 2.133 {execfile}
1 0.002 0.002 2.132 2.132 config.py:4(<module>)
3180 1.903 0.001 1.903 0.001 act.py:18(tanh)
3 0.002 0.001 1.855 0.618 __init__.py:6(<module>)
21 0.000 0.000 1.415 0.067 train.py:406(valcost)
20 0.001 0.000 1.363 0.068 train.py:303(callback)
22 0.000 0.000 1.329 0.060 linesearch.py:64(phi)
2 0.003 0.002 1.205 0.603 table.py:3(<module>)
8 0.016 0.002 1.088 0.136 __init__.py:2(<module>)
1587 0.026 0.000 1.083 0.001 err.py:16(msb)
4892 0.826 0.000 0.826 0.000 {numpy.core._dotblas.dot}
1 0.000 0.000 0.810 0.810 run.py:139(predict)
1 0.000 0.000 0.780 0.780 ml.py:268(predict)
3176 0.007 0.000 0.751 0.000 fromnumeric.py:2299(mean)
5 0.017 0.003 0.745 0.149 __init__.py:4(<module>)
3176 0.025 0.000 0.745 0.000 core.py:4622(mean)
1 0.000 0.000 0.727 0.727 train.py:286(end)
1 0.000 0.000 0.707 0.707 tenbilacwrapper.py:264(predict)
1 0.000 0.000 0.697 0.697 committee.py:86(call)
1 0.000 0.000 0.694 0.694 net.py:243(predict)
1 0.000 0.000 0.684 0.684 train.py:418(savebiasdetails)
1 0.007 0.007 0.551 0.551 io.py:3(<module>)
2 0.005 0.003 0.512 0.256 __init__.py:7(<module>)
66 0.004 0.000 0.491 0.007 registry.py:76(_update__doc__)
1 0.004 0.004 0.452 0.452 column.py:2(<module>)
1 0.013 0.013 0.446 0.446 __init__.py:11(<module>)
3180 0.422 0.000 0.422 0.000 {method 'prod' of 'numpy.ndarray' objects}
24 0.000 0.000 0.393 0.016 core.py:857(__init__)
3180 0.068 0.000 0.385 0.000 core.py:912(__call__)
1 0.001 0.001 0.351 0.351 __init__.py:47(_check_numpy)
3 0.004 0.001 0.350 0.117 __init__.py:10(<module>)
2 0.001 0.001 0.306 0.153 __init__.py:9(<module>)
7 0.002 0.000 0.305 0.044 introspection.py:86(minversion)
1 0.000 0.000 0.305 0.305 numpycompat.py:5(<module>)
66 0.012 0.000 0.304 0.005 registry.py:28(get_formats)
12009/4137 0.050 0.000 0.304 0.000 copy.py:145(deepcopy)
Conclusion, as expected: we need to optimise layer.run; optimising anything else is pointless at this stage.
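For reference, here is a minimal sketch of what I mean by "this cache idea". The class and attribute names below are hypothetical, not the actual tenbilac Layer API; the point is only the mechanism. During BFGS with finite-difference gradients (approx_fprime), only one parameter is perturbed per cost evaluation, so most layers see exactly the same weights, biases and input as on the previous call and can return their previous output instead of redoing the dot product and activation.

```python
import numpy as np

class CachedLayer(object):
    """Sketch of a layer that skips recomputation when nothing changed."""

    def __init__(self, weights, biases, actfct=np.tanh):
        self.weights = weights  # shape (nout, nin)
        self.biases = biases    # shape (nout,)
        self.actfct = actfct
        self._cache = None      # (weights, biases, inputs, output)

    def run(self, inputs):
        if self._cache is not None:
            w, b, x, out = self._cache
            # Cheap equality checks (presumably what shows up as the many
            # ndarray.all calls in the cached profile below) instead of the
            # expensive dot product and activation.
            if (np.array_equal(w, self.weights)
                    and np.array_equal(b, self.biases)
                    and np.array_equal(x, inputs)):
                return out.copy()
        out = self.actfct(np.dot(self.weights, inputs) + self.biases)
        self._cache = (self.weights.copy(), self.biases.copy(),
                       np.asarray(inputs).copy(), out)
        return out.copy()
```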
New profiling with cache, for exactly the same setup, sorted by total time:
2283601 function calls (2260241 primitive calls) in 10.527 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
6632 3.334 0.001 6.695 0.001 layer.py:134(run)
2262 1.277 0.001 1.277 0.001 act.py:18(tanh)
27394 1.011 0.000 1.011 0.000 {numpy.core.multiarray.array}
4020 0.624 0.000 0.624 0.000 {numpy.core._dotblas.dot}
18617 0.165 0.000 0.312 0.000 core.py:2745(_update_from)
222477 0.142 0.000 0.148 0.000 {getattr}
20034 0.119 0.000 0.119 0.000 {method 'all' of 'numpy.ndarray' objects}
3 0.110 0.037 0.110 0.037 {posix.mkdir}
1659 0.084 0.000 0.256 0.000 core.py:1060(__call__)
183152 0.078 0.000 0.128 0.000 {isinstance}
3316 0.071 0.000 0.403 0.000 core.py:912(__call__)
13492 0.054 0.000 0.054 0.000 {method 'copy' of 'numpy.ndarray' objects}
2782 0.054 0.000 0.054 0.000 {method 'write' of 'file' objects}
83468 0.051 0.000 0.051 0.000 {method 'update' of 'dict' objects}
1 0.050 0.050 0.050 0.050 position.py:20(<module>)
12009/4137 0.050 0.000 0.303 0.000 copy.py:145(deepcopy)
Oh.
That's a factor 10.
:trophy:
Let's see if I missed something.
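One way to check that nothing was missed would be to run the same network with and without the cache, on identical inputs and parameter vectors, and require identical outputs. A hedged sketch, with placeholder method names (setparams and run here are illustrative, not necessarily the tenbilac API):

```python
import numpy as np

def check_cache_consistency(net_cached, net_plain, inputs, paramsets):
    """For each parameter vector, run both networks on the same inputs and
    require identical outputs: the cache should be a pure speed-up, not an
    approximation."""
    for params in paramsets:
        net_cached.setparams(params)
        net_plain.setparams(params)
        out_cached = net_cached.run(inputs)
        out_plain = net_plain.run(inputs)
        assert np.array_equal(out_cached, out_plain), "cache changed the output!"
```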
Nice!
This is real, nice indeed. It might even significantly speed up the sum layers. I'll push my first implementation of the cache to the open pull request.
Pushed -- it's so easy it hurts. Some more testing (and benchmarking) could still be done. Let's talk about this next week.
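For the benchmarking part, the listings above look like standard cProfile output, so the before/after comparison could be reproduced along these lines (the .prof filename is arbitrary; run_train.py is the script from the listing):

```python
import pstats

# Assuming the training run was profiled beforehand with, e.g.,
#   python -m cProfile -o train.prof run_train.py
stats = pstats.Stats("train.prof")
stats.strip_dirs()
stats.sort_stats("time").print_stats(20)        # "Ordered by: internal time"
stats.sort_stats("cumulative").print_stats(60)  # "Ordered by: cumulative time"
```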
and whose weights are not optimized in the same "rhythm" as the other layers (if they are optimized at all).