My current plan:
This MultNet works nicely, with very few new arguments to pass around. Will soon edit the megalut wrapper as well. Now looking at this cache idea.
Some profiling results of 20 iterations of BFGS training with a "small" MultNet:
2150646 function calls (2127353 primitive calls) in 102.590 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
6360 73.745 0.012 98.616 0.016 layer.py:129(run)
29495 21.752 0.001 21.752 0.001 {numpy.core.multiarray.array}
3180 1.795 0.001 1.795 0.001 act.py:18(tanh)
4892 0.826 0.000 0.826 0.000 {numpy.core._dotblas.dot}
3180 0.421 0.000 0.421 0.000 {method 'prod' of 'numpy.ndarray' objects}
17869 0.158 0.000 0.299 0.000 core.py:2745(_update_from)
218390 0.138 0.000 0.144 0.000 {getattr}
3 0.111 0.037 0.111 0.037 {posix.mkdir}
4 0.095 0.024 0.095 0.024 {posix.open}
78136 0.088 0.000 0.088 0.000 {range}
1591 0.081 0.000 0.248 0.000 core.py:1060(__call__)
174646 0.073 0.000 0.124 0.000 {isinstance}
3180 0.069 0.000 0.389 0.000 core.py:912(__call__)
118 0.067 0.001 0.067 0.001 {method 'close' of 'file' objects}
1 0.052 0.052 0.052 0.052 position.py:20(<module>)
12009/4137 0.050 0.000 0.303 0.000 copy.py:145(deepcopy)
2150646 function calls (2127353 primitive calls) in 100.185 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.004 0.004 100.235 100.235 run_train.py:1(<module>)
26/25 0.001 0.000 97.554 3.902 {eval}
1 0.000 0.000 97.019 97.019 run.py:26(train)
520/515 0.004 0.000 96.990 0.188 {map}
1 0.000 0.000 96.909 96.909 run.py:120(_worker)
1 0.000 0.000 96.896 96.896 ml.py:160(train)
1 0.000 0.000 96.886 96.886 tenbilacwrapper.py:152(train)
5 0.000 0.000 96.860 19.372 committee.py:130(call)
5 0.000 0.000 96.860 19.372 committee.py:189(_worker)
1 0.000 0.000 96.858 96.858 train.py:443(opt)
1590 0.014 0.000 96.278 0.061 net.py:217(run)
6360 71.392 0.011 96.265 0.015 layer.py:129(run)
1 0.000 0.000 96.018 96.018 opt.py:35(multnetbfgs)
1 0.000 0.000 96.018 96.018 opt.py:17(bfgs)
1 0.004 0.004 95.957 95.957 optimize.py:390(fmin_bfgs)
1566 0.018 0.000 94.673 0.060 train.py:350(cost)
1587/46 0.004 0.000 94.576 2.056 optimize.py:174(function_wrapper)
23 0.013 0.001 93.193 4.052 optimize.py:371(approx_fprime)
20 0.000 0.000 90.970 4.548 linesearch.py:13(line_search_wolfe1)
20 0.001 0.000 90.970 4.548 linesearch.py:85(scalar_search_wolfe1)
22 0.000 0.000 89.640 4.075 linesearch.py:68(derphi)
3180 0.019 0.000 22.034 0.007 fromnumeric.py:1931(prod)
3181 0.016 0.000 22.015 0.007 fromnumeric.py:32(_wrapit)
29495 21.648 0.001 21.648 0.001 {numpy.core.multiarray.array}
4792 0.006 0.000 21.581 0.005 numeric.py:167(asarray)
5 0.023 0.005 2.158 0.432 __init__.py:3(<module>)
1 0.001 0.001 2.133 2.133 {execfile}
1 0.002 0.002 2.132 2.132 config.py:4(<module>)
3180 1.903 0.001 1.903 0.001 act.py:18(tanh)
3 0.002 0.001 1.855 0.618 __init__.py:6(<module>)
21 0.000 0.000 1.415 0.067 train.py:406(valcost)
20 0.001 0.000 1.363 0.068 train.py:303(callback)
22 0.000 0.000 1.329 0.060 linesearch.py:64(phi)
2 0.003 0.002 1.205 0.603 table.py:3(<module>)
8 0.016 0.002 1.088 0.136 __init__.py:2(<module>)
1587 0.026 0.000 1.083 0.001 err.py:16(msb)
4892 0.826 0.000 0.826 0.000 {numpy.core._dotblas.dot}
1 0.000 0.000 0.810 0.810 run.py:139(predict)
1 0.000 0.000 0.780 0.780 ml.py:268(predict)
3176 0.007 0.000 0.751 0.000 fromnumeric.py:2299(mean)
5 0.017 0.003 0.745 0.149 __init__.py:4(<module>)
3176 0.025 0.000 0.745 0.000 core.py:4622(mean)
1 0.000 0.000 0.727 0.727 train.py:286(end)
1 0.000 0.000 0.707 0.707 tenbilacwrapper.py:264(predict)
1 0.000 0.000 0.697 0.697 committee.py:86(call)
1 0.000 0.000 0.694 0.694 net.py:243(predict)
1 0.000 0.000 0.684 0.684 train.py:418(savebiasdetails)
1 0.007 0.007 0.551 0.551 io.py:3(<module>)
2 0.005 0.003 0.512 0.256 __init__.py:7(<module>)
66 0.004 0.000 0.491 0.007 registry.py:76(_update__doc__)
1 0.004 0.004 0.452 0.452 column.py:2(<module>)
1 0.013 0.013 0.446 0.446 __init__.py:11(<module>)
3180 0.422 0.000 0.422 0.000 {method 'prod' of 'numpy.ndarray' objects}
24 0.000 0.000 0.393 0.016 core.py:857(__init__)
3180 0.068 0.000 0.385 0.000 core.py:912(__call__)
1 0.001 0.001 0.351 0.351 __init__.py:47(_check_numpy)
3 0.004 0.001 0.350 0.117 __init__.py:10(<module>)
2 0.001 0.001 0.306 0.153 __init__.py:9(<module>)
7 0.002 0.000 0.305 0.044 introspection.py:86(minversion)
1 0.000 0.000 0.305 0.305 numpycompat.py:5(<module>)
66 0.012 0.000 0.304 0.005 registry.py:28(get_formats)
12009/4137 0.050 0.000 0.304 0.000 copy.py:145(deepcopy)
Conclusion, as expected: we need to optimise layer.run; optimising anything else is pointless at this stage.
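For reference, here is a minimal sketch of what I mean by "this cache idea". The class and attribute names below are hypothetical, not the actual tenbilac Layer API; the point is only the mechanism. During BFGS with finite-difference gradients (approx_fprime), only one parameter is perturbed per cost evaluation, so most layers see exactly the same weights, biases and input as on the previous call and can return their previous output instead of redoing the dot product and activation.

```python
import numpy as np

class CachedLayer(object):
    """Sketch of a layer that skips recomputation when nothing changed."""

    def __init__(self, weights, biases, actfct=np.tanh):
        self.weights = weights  # shape (nout, nin)
        self.biases = biases    # shape (nout,)
        self.actfct = actfct
        self._cache = None      # (weights, biases, inputs, output)

    def run(self, inputs):
        if self._cache is not None:
            w, b, x, out = self._cache
            # Cheap equality checks (presumably what shows up as the many
            # ndarray.all calls in the cached profile below) instead of the
            # expensive dot product and activation.
            if (np.array_equal(w, self.weights)
                    and np.array_equal(b, self.biases)
                    and np.array_equal(x, inputs)):
                return out.copy()
        out = self.actfct(np.dot(self.weights, inputs) + self.biases)
        self._cache = (self.weights.copy(), self.biases.copy(),
                       np.asarray(inputs).copy(), out)
        return out.copy()
```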
New profiling with cache, for exactly the same setup, sorted by total time:
2283601 function calls (2260241 primitive calls) in 10.527 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
6632 3.334 0.001 6.695 0.001 layer.py:134(run)
2262 1.277 0.001 1.277 0.001 act.py:18(tanh)
27394 1.011 0.000 1.011 0.000 {numpy.core.multiarray.array}
4020 0.624 0.000 0.624 0.000 {numpy.core._dotblas.dot}
18617 0.165 0.000 0.312 0.000 core.py:2745(_update_from)
222477 0.142 0.000 0.148 0.000 {getattr}
20034 0.119 0.000 0.119 0.000 {method 'all' of 'numpy.ndarray' objects}
3 0.110 0.037 0.110 0.037 {posix.mkdir}
1659 0.084 0.000 0.256 0.000 core.py:1060(__call__)
183152 0.078 0.000 0.128 0.000 {isinstance}
3316 0.071 0.000 0.403 0.000 core.py:912(__call__)
13492 0.054 0.000 0.054 0.000 {method 'copy' of 'numpy.ndarray' objects}
2782 0.054 0.000 0.054 0.000 {method 'write' of 'file' objects}
83468 0.051 0.000 0.051 0.000 {method 'update' of 'dict' objects}
1 0.050 0.050 0.050 0.050 position.py:20(<module>)
12009/4137 0.050 0.000 0.303 0.000 copy.py:145(deepcopy)
Oh.
That's a factor 10.
:trophy:
Let's see if I missed something.
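One way to check that nothing was missed would be to run the same network with and without the cache, on identical inputs and parameter vectors, and require identical outputs. A hedged sketch, with placeholder method names (setparams and run here are illustrative, not necessarily the tenbilac API):

```python
import numpy as np

def check_cache_consistency(net_cached, net_plain, inputs, paramsets):
    """For each parameter vector, run both networks on the same inputs and
    require identical outputs: the cache should be a pure speed-up, not an
    approximation."""
    for params in paramsets:
        net_cached.setparams(params)
        net_plain.setparams(params)
        out_cached = net_cached.run(inputs)
        out_plain = net_plain.run(inputs)
        assert np.array_equal(out_cached, out_plain), "cache changed the output!"
```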
Nice!
This is real, nice indeed. It might even significantly speed up the sum layers. I'll push my first implementation of the cache to the open pull request.
Pushed -- it's so easy it hurts. Some more testing (and benchmarking) could still be done. Let's talk about this next week.
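For the benchmarking part, the listings above look like standard cProfile output, so the before/after comparison could be reproduced along these lines (the .prof filename is arbitrary; run_train.py is the script from the listing):

```python
import pstats

# Assuming the training run was profiled beforehand with, e.g.,
#   python -m cProfile -o train.prof run_train.py
stats = pstats.Stats("train.prof")
stats.strip_dirs()
stats.sort_stats("time").print_stats(20)        # "Ordered by: internal time"
stats.sort_stats("cumulative").print_stats(60)  # "Ordered by: cumulative time"
```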
and whose weights are not optimized in the same "rhythm" as the other layers (if they are optimized at all).