Closed: abalijepalli closed this issue 8 years ago
There are some serious differences in how lmfit works on Mac and Windows.
The first cProfile output below shows ~7 million calls to multiStateFunc on Windows, averaging 3 ms/call.
```
Mon Sep 14 19:27:54 2015    msa_win_ssnp.log

3499403946 function calls (3498580809 primitive calls) in 31461.548 seconds

Ordered by: internal time

  ncalls   tottime  percall   cumtime  percall  filename:lineno(function)
 6903075 23679.439    0.003 25185.921    0.004  C:\Users\xxxxxx\My Documents\Analysis\mosaic\mosaic\utilities\fit_funcs.py:40(multiStateFunc)
    3999  4669.801    1.168 30689.656    7.674  {scipy.optimize._minpack._lmdif}
28343328   945.457    0.000  1505.460    0.000  C:\Users\xxxxxx\My Documents\Analysis\mosaic\mosaic\utilities\fit_funcs.py:17(heaviside)
28609288   561.241    0.000   561.241    0.000  {numpy.core.multiarray.array}
    1168   334.855    0.287 31148.383   26.668  C:\Users\xxxxxx\My Documents\Analysis\mosaic\mosaic\eventSegment.py:142(_eventsegment)
```
For the same data set, I get 222109 calls on the Mac, averaging 70 µs/call (see below).
```
Tue Sep 15 13:45:02 2015    msa_mac_ssnp.log

1835316041 function calls (1834535764 primitive calls) in 1072.068 seconds

Ordered by: internal time

     ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
       2337  456.877    0.195  667.061    0.285  eventSegment.py:146(_eventsegment)
      67952  149.493    0.002  149.493    0.002  {numpy.core.multiarray.concatenate}
      58600   49.241    0.001   50.345    0.001  binTrajIO.py:202(scaleData)
  597461015   46.382    0.000   46.382    0.000  {abs}
 61016/2441   45.286    0.001  279.400    0.114  metaTrajIO.py:224(popdata)
  580190021   40.774    0.000   40.774    0.000  {method 'append' of 'collections.deque' objects}
  584996573   36.411    0.000   36.411    0.000  {method 'popleft' of 'collections.deque' objects}
58602/58601   32.596    0.001  234.052    0.004  metaTrajIO.py:344(_appenddata)
       9352   28.675    0.003   28.675    0.003  {method 'sort' of 'numpy.ndarray' objects}
          1   20.726   20.726 1064.104 1064.104  metaEventPartition.py:150(PartitionEvents)
     222109   15.871    0.000   22.101    0.000  fit_funcs.py:40(multiStateFunc)
       2338   12.686    0.005   46.105    0.020  polynomial.py:396(polyfit)
```
That explains the performance difference between the two platforms, but I am not sure why the number of calls would be so different.
On another, probably Windows-related, note: I am running an analysis on a 55 GB data file using adept2State, and it seems to be using almost 14 GB of RAM despite analyzing 1-second chunks at a time; it has been running for about 18 hours.
We're up to 90 hours running this file at 15GB of memory. I'll leave it as long as memory usage isn't causing me problems in the hope that it finishes.
Does Python natively support file I/O on files larger than 2 GB? I get conflicting information on Google, depending on the method used to do the I/O. Could that be the issue here?
Are you using binTrajIO? If so, the files are memory mapped and shouldn't cause a problem. I have tested files larger than 2 GB, but not as big as 55 GB. I think the issue is Windows-related; see the performance logs above. One solution could be to use scipy's `curve_fit` instead of `lmfit`. I have seen anecdotal evidence that performance may improve significantly. If you would like to build a Windows switch within `adept2State` to try this, I can help.
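On the 2 GB question: on a 64-bit Python build, numpy's memory maps are not limited to 2 GB, since pages are only faulted in on access. A minimal sketch of chunked reads through `np.memmap` (the file name and data layout here are invented for illustration, not MOSAIC's actual binary format):

```python
import numpy as np

# Write a small binary file to stand in for a large trace (hypothetical name).
data = np.arange(1_000_000, dtype=np.float64)
data.tofile("trace.bin")

# np.memmap maps pages lazily, so the file size is not limited to 2 GB
# on 64-bit Python; each slice pulls in only one chunk at a time.
mm = np.memmap("trace.bin", dtype=np.float64, mode="r")

chunk_size = 250_000
totals = []
for start in range(0, mm.shape[0], chunk_size):
    chunk = np.array(mm[start:start + chunk_size])  # copy the chunk into RAM
    totals.append(chunk.sum())

print(sum(totals) == data.sum())  # → True
```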
Yes, I am using binTrajIO. I would be happy to build a switch in to try some different curve fitting methods.
The simplest way would be to completely replace `__FitEvent()` with a new function that uses `curve_fit`. See `mosaic/utilities/ionic_current_stats.py` for a `curve_fit` example. We can then run an OS test in `_processEvent()` and call the appropriate function.
I did this, and preliminary testing suggests that it is indeed faster without obviously hurting the quality of the fit (not systematically verified yet; judging by eye). However, memory usage is still very high, saturating my system memory. I will continue testing with smaller files to compare the speed difference and implement this in ADEPT as well before submitting a pull request, but the large file I/O problems seem to be a separate issue.
The large file issue may also come down to OS specific support of memory maps. Is it possible for you to give me access to the large file you are looking at? I could test it on OS X.
Sure, I'll upload a compressed file to Dropbox or similar tomorrow. I'm just running some comparisons between lmfit and scipy with adept2State on a smaller file now. If things look good I'll implement the same thing in ADEPT as well.
Results on adept2State show that scipy is actually slightly (~10%) slower than lmfit. I will implement it in ADEPT anyway, since it is possible that the scipy implementation scales better with the number of fit parameters.
It turns out that using a variable number of parameters with curve_fit() is not straightforward at all.
Could you write a wrapper for `multiStateFunc`:

```python
def foo(t, *args):
    n = int((len(args) - 1) / 3)
    tau, mu, a = list(args[:n]), list(args[n:2 * n]), list(args[2 * n:3 * n])
    return multiStateFunc(t, tau, mu, a, args[-1], n)
```

You could then pass initial guesses to `curve_fit` as `[tau_11, tau_12, ..., tau_1n, mu_11, mu_12, ..., mu_1n, a_11, a_12, ..., a_1n]`, with the baseline appended as the final element; they will be mapped onto `args`.
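To make the mapping concrete, here is a runnable sketch using a simplified stand-in for `multiStateFunc` (a baseline plus a sum of exponential step responses; the real function lives in `fit_funcs.py` and differs in detail):

```python
import numpy as np
from scipy.optimize import curve_fit

# Simplified stand-in for MOSAIC's multiStateFunc, for illustration only.
def multiStateFunc(t, tau, mu, a, b, n):
    y = np.full_like(t, b, dtype=float)
    for i in range(n):
        step = np.where(t >= mu[i], 1.0 - np.exp(-(t - mu[i]) / tau[i]), 0.0)
        y += a[i] * step
    return y

def foo(t, *args):
    # Unpack the flat parameter vector back into tau, mu, a lists plus baseline.
    n = int((len(args) - 1) / 3)
    tau, mu, a = list(args[:n]), list(args[n:2 * n]), list(args[2 * n:3 * n])
    return multiStateFunc(t, tau, mu, a, args[-1], n)

t = np.linspace(0, 10, 500)
true = [0.3, 0.5, 2.0, 6.0, -1.0, 1.0, 0.0]  # tau1, tau2, mu1, mu2, a1, a2, b
y = foo(t, *true)

# Flat initial guesses [tau..., mu..., a..., b], mapped onto *args by foo.
p0 = [0.2, 0.4, 1.5, 5.5, -0.8, 0.8, 0.1]
popt, _ = curve_fit(foo, t, y, p0=p0)
```

Passing `p0` is what tells `curve_fit` how many parameters the variadic `foo` takes.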
That works, more or less. It's up and running; stay tuned for a bit of benchmarking once I find a good file to test with.
Profile using curve_fit: a definite improvement in runtime over lmfit, though still not as good as Mac performance, and the bottleneck is still clearly the calls to the fit function. It might be possible to improve things further by supplying a Jacobian.
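For what it's worth, newer SciPy versions accept an analytic Jacobian in `curve_fit` via the `jac` keyword (older versions required dropping down to `leastsq` with `Dfun`). A sketch with a single-exponential model, which is an illustration rather than MOSAIC's fit function:

```python
import numpy as np
from scipy.optimize import curve_fit

# Example model: y = a * exp(-t / tau).
def model(t, a, tau):
    return a * np.exp(-t / tau)

# Analytic Jacobian: columns are dy/da and dy/dtau. This avoids the extra
# function evaluations spent on finite-difference derivatives.
def jac(t, a, tau):
    e = np.exp(-t / tau)
    return np.column_stack([e, a * t * e / tau ** 2])

t = np.linspace(0.1, 5, 200)
y = 1.5 * np.exp(-t / 0.8)
popt, _ = curve_fit(model, t, y, p0=[1.0, 1.0], jac=jac)
print(np.round(popt, 3))  # → [1.5 0.8]
```

Whether this actually helps depends on how expensive the Jacobian is relative to the model evaluations it saves.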
```
Tue Dec 08 18:16:24 2015    profile.log

1796818747 function calls (1796755930 primitive calls) in 5733.306 seconds

Ordered by: internal time
List reduced from 3593 to 10 due to restriction <10>

     ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     666234 3968.613    0.006 4210.584    0.006  C:\Users\xxxxx\My Documents\Analysis\mosaic\mosaic\utilities\fit_funcs.py:45(multiStateFunc)
       3999  741.726    0.185 4969.026    1.243  {scipy.optimize._minpack._lmdif}
       1168  332.771    0.285 5420.424    4.641  C:\Users\xxxxx\My Documents\Analysis\mosaic\mosaic\eventSegment.py:146(_eventsegment)
    3280840  161.513    0.000  241.724    0.000  C:\Users\xxxxx\My Documents\Analysis\mosaic\mosaic\utilities\fit_funcs.py:17(heaviside)
      67944   90.840    0.001   90.840    0.001  {numpy.core.multiarray.concatenate}
    3546758   81.454    0.000   81.454    0.000  {numpy.core.multiarray.array}
 59871/1271   45.945    0.001  196.630    0.155  C:\Users\xxxxx\My Documents\Analysis\mosaic\mosaic\metaTrajIO.py:224(popdata)
      58600   36.346    0.001   36.996    0.001  C:\Users\xxxxx\My Documents\Analysis\mosaic\mosaic\binTrajIO.py:202(scaleData)
       3370   27.577    0.008   29.194    0.009  C:\Users\xxxxx\My Documents\Analysis\mosaic\mosaic\sqlite3MDIO.py:124(writeRecord)
  592570821   25.604    0.000   25.604    0.000  {abs}
```
Running a set of two files totalling 11GB with adept2state is currently taking 15GB of RAM and has slowed down quite a lot.
Decoupling speed issues between algorithms and memory usage is tricky. It's possible that some of the speed issues seen before were due to memory usage as well.
The memory usage of the program ramps up over time. It saturates fairly quickly, but not immediately, suggesting that it is not the memmap itself at fault, but rather that python is not freeing memory between requests for data chunks.
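One way to test that hypothesis is to copy each chunk out of the map explicitly and drop the reference before requesting the next one, so the interpreter has every opportunity to return the memory. A minimal sketch (the file name is hypothetical, and this is a diagnostic experiment, not MOSAIC's actual I/O path):

```python
import gc
import numpy as np

# Small stand-in file for a large trace (hypothetical name).
data = np.arange(2_000_000, dtype=np.float64)
data.tofile("big_trace.bin")

mm = np.memmap("big_trace.bin", dtype=np.float64, mode="r")

chunk = 500_000
acc = 0.0
for start in range(0, mm.shape[0], chunk):
    block = np.array(mm[start:start + chunk])  # explicit copy out of the map
    acc += float(block.sum())
    del block      # drop the only reference to the chunk...
    gc.collect()   # ...and force collection so pages can be released

print(acc == float(data.sum()))  # → True
```

If memory still ramps under this pattern, the growth is more likely in the OS page cache or in Windows-specific memmap behavior than in Python-level references.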
That this primarily occurs on Windows suggests that some low-level memory management is at fault. I have processed large data sets (> 100 GB) that were broken up into smaller files, all stored in the same folder. One option is to recommend that as a best practice to get around the problem.
To wrap up the performance issue, it may be nice to decouple the memory management problems by running a test case on several smaller files so memory usage does not saturate. That will provide a direct comparison with other OSes.
I will test that approach in the next few days and get back to you with some results.
Splitting the files up seems to help. However, I am also thinking that some of my issues may have been related to IDLE. I have some circumstantial evidence that there are issues with Windows performance when running a script in IDLE/pythonw.exe versus running the same script from the command line with python.exe. I will continue testing and confirm one way or another.
We may want to make the `WinFitFunc` you wrote the default fit function. On a Mac, it produces identical results and is ~2 times faster.
Interesting. It might be possible to further improve fitting speed by providing the Jacobian explicitly to curve_fit as well, if you decide to go that route.
That would work as long as the cost of estimating the Jacobian is small enough. For short events, it may actually slow things down. However, it should be easy to test. I'll look into running a quick test.
I only see about a 20% speedup in ADEPT 2-state by providing the Jacobian. However, the error rate increases from 1% to ~ 7%. I'm going to try specifying an explicit Jacobian to ADEPT to see if the gains are more substantial. I'll push a few commits soon. It may be worth testing the code on Windows to see if there is any effect.
MOSAIC performance on Windows is worse than Linux and Mac.