usnistgov / mosaic

A modular single-molecule analysis interface
https://pages.nist.gov/mosaic/

Windows Performance #61

Closed abalijepalli closed 8 years ago

abalijepalli commented 9 years ago

MOSAIC performance on Windows is worse than Linux and Mac.

abalijepalli commented 9 years ago

There are some serious differences in how lmfit works on Mac and Windows.

The first output of cProfile shows ~ 7 million calls to multiStateFunc, averaging 3 ms/call on Windows.

Mon Sep 14 19:27:54 2015    msa_win_ssnp.log

         3499403946 function calls (3498580809 primitive calls) in 31461.548 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  6903075 23679.439    0.003 25185.921    0.004 C:\Users\xxxxxx\My Documents\Analysis\mosaic\mosaic\utilities\fit_funcs.py:40(multiStateFunc)
     3999 4669.801    1.168 30689.656    7.674 {scipy.optimize._minpack._lmdif}
 28343328  945.457    0.000 1505.460    0.000 C:\Users\xxxxxx\My Documents\Analysis\mosaic\mosaic\utilities\fit_funcs.py:17(heaviside)
 28609288  561.241    0.000  561.241    0.000 {numpy.core.multiarray.array}
     1168  334.855    0.287 31148.383   26.668 C:\Users\xxxxxx\My Documents\Analysis\mosaic\mosaic\eventSegment.py:142(_eventsegment)

For the same data set, I get 222109 calls on the Mac, averaging 70 µs/call (see below).

Tue Sep 15 13:45:02 2015    msa_mac_ssnp.log

         1835316041 function calls (1834535764 primitive calls) in 1072.068 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2337  456.877    0.195  667.061    0.285 eventSegment.py:146(_eventsegment)
    67952  149.493    0.002  149.493    0.002 {numpy.core.multiarray.concatenate}
    58600   49.241    0.001   50.345    0.001 binTrajIO.py:202(scaleData)
597461015   46.382    0.000   46.382    0.000 {abs}
61016/2441   45.286    0.001  279.400    0.114 metaTrajIO.py:224(popdata)
580190021   40.774    0.000   40.774    0.000 {method 'append' of 'collections.deque' objects}
584996573   36.411    0.000   36.411    0.000 {method 'popleft' of 'collections.deque' objects}
58602/58601   32.596    0.001  234.052    0.004 metaTrajIO.py:344(_appenddata)
     9352   28.675    0.003   28.675    0.003 {method 'sort' of 'numpy.ndarray' objects}
        1   20.726   20.726 1064.104 1064.104 metaEventPartition.py:150(PartitionEvents)
   222109   15.871    0.000   22.101    0.000 fit_funcs.py:40(multiStateFunc)
     2338   12.686    0.005   46.105    0.020 polynomial.py:396(polyfit)

That explains the difference in performance between the two platforms, but I am not sure why it would be so different.
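
For anyone reproducing the logs above, here is a minimal sketch of how a profile sorted by internal time could be collected with `cProfile` and `pstats`. The `analyze` function is a hypothetical stand-in for the actual analysis run, not a MOSAIC entry point.

```python
import cProfile
import io
import pstats


def analyze(n=20000):
    # Hypothetical stand-in for the real analysis run.
    return sum(abs(i - n // 2) for i in range(n))


def profile_to_text(func, limit=10):
    """Profile func() and return the top entries sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # "tottime" is time spent inside each function, excluding its callees.
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


report = profile_to_text(analyze)
print(report)
```

Running the same script on both platforms and comparing the `ncalls` and `percall` columns for the fit function separates "called more often" from "slower per call".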

shadowk29 commented 8 years ago

On another, probably Windows-related, note: I am running an analysis on a 55 GB data file using adept2State. It seems to be using almost 14 GB of RAM despite analyzing 1-second chunks at a time, and it has been running for about 18 hours.

shadowk29 commented 8 years ago

We're up to 90 hours running this file at 15GB of memory. I'll leave it as long as memory usage isn't causing me problems in the hope that it finishes.

Does Python natively support file IO on files larger than 2 GB? I get conflicting information on Google, depending on the method used to do the IO. Could that be the issue here?

abalijepalli commented 8 years ago

Are you using binTrajIO? If so, the files are memory mapped and shouldn't cause a problem. I have tested files larger than 2 GB, but not as big as 55 GB. I think the issue is Windows related. See performance logs above. One solution could be to use scipy curve_fit instead of lmfit. I have seen anecdotal evidence that performance may improve significantly. If you would like to build a Windows switch within adept2State to try this, I can help.
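
As a rough illustration of the memory-mapped approach mentioned above, the sketch below reads a binary file in fixed-size chunks with `numpy.memmap`. It is not binTrajIO's implementation; the file name, the `float64` dtype, and the chunk length are all assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def iter_chunks(path, dtype, chunk_len):
    """Yield in-memory copies of successive fixed-size chunks of a binary file."""
    data = np.memmap(path, dtype=dtype, mode="r")
    try:
        for start in range(0, data.shape[0], chunk_len):
            # np.array(...) copies the slice, so the chunk does not reference the map.
            yield np.array(data[start:start + chunk_len])
    finally:
        del data  # drop this reference to the map once iteration ends


# Small demonstration file (assumed layout: raw float64 samples, no header).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "trace.bin")
np.arange(1000, dtype=np.float64).tofile(path)

chunk_sizes = [len(c) for c in iter_chunks(path, np.float64, 300)]
total = sum(float(c.sum()) for c in iter_chunks(path, np.float64, 300))
```

Only the pages touched by each slice need to be resident, so in principle the file size should not dictate memory use as long as old chunks are released.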

shadowk29 commented 8 years ago

Yes, I am using binTrajIO. I would be happy to build a switch in to try some different curve fitting methods.

abalijepalli commented 8 years ago

The simplest way would be to completely replace __FitEvent() with a new function that uses curve_fit. See mosaic/utilities/ionic_current_stats.py for a curve_fit example. We can then run an OS test in _processEvent() and call the appropriate function.
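
A minimal sketch of the OS test described above, using `platform.system()`. The two fit functions are hypothetical placeholders, not the actual `__FitEvent()` implementations.

```python
import platform


def fit_event_lmfit(event):
    # Hypothetical placeholder for the existing lmfit-based fit.
    return ("lmfit", event)


def fit_event_curve_fit(event):
    # Hypothetical placeholder for a curve_fit-based replacement.
    return ("curve_fit", event)


def select_fit_function(system=None):
    """Return the fit routine for an OS name (defaults to the current OS)."""
    system = platform.system() if system is None else system
    return fit_event_curve_fit if system == "Windows" else fit_event_lmfit


# Chosen once, then called for every event.
fit = select_fit_function()
```

Passing the OS name as an optional argument keeps the switch testable on any platform.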

shadowk29 commented 8 years ago

I did this, and preliminary testing suggests that it is indeed faster without obviously hurting the quality of the fit (not systematically verified yet, simply judging by eye). However, the memory usage is still very high, saturating my system memory. I will continue testing with smaller files to compare the speed difference and implement this in adept as well before submitting a pull request, but the large-file IO problem seems to be a separate issue.

abalijepalli commented 8 years ago

The large file issue may also come down to OS specific support of memory maps. Is it possible for you to give me access to the large file you are looking at? I could test it on OS X.

shadowk29 commented 8 years ago

Sure, I'll upload a compressed file to dropbox or similar tomorrow. Just running some comparisons between lmfit and scipy with adept2state on a smaller file now. If things look good I'll implement the same thing in adept as well.

shadowk29 commented 8 years ago

Results show that scipy is actually slightly (~10%) slower than lmfit for adept2state. I will implement it in adept anyway, since it is possible that the scipy implementation scales better with the number of fit parameters.

shadowk29 commented 8 years ago

It turns out that using a variable number of parameters with curve_fit() is not straightforward at all.

abalijepalli commented 8 years ago

Could you write a wrapper for multiStateFunc:

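from mosaic.utilities.fit_funcs import multiStateFunc

def foo(t, *args):
    # args is the flat vector from curve_fit: n taus, n mus, n amplitudes, then one trailing value.
    n = (len(args) - 1) // 3
    tau, mu, a = list(args[:n]), list(args[n:2*n]), list(args[2*n:3*n])
    return multiStateFunc(t, tau, mu, a, args[-1], n)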

You could then pass initial guesses to curve_fit as [tau_11, tau_12, ..., tau_1n, mu_11, mu_12, ..., mu_1n, a_11, a_12, ..., a_1n], followed by one last value for the trailing argument (args[-1] in the wrapper). The whole list will be mapped onto args.
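
To make the wrapper idea concrete, here is a self-contained sketch that fits a variable number of states through a `*args` function. The `multi_state` model is an assumed stand-in written for this example, not MOSAIC's `multiStateFunc`, and the parameter values are arbitrary.

```python
import numpy as np
from scipy.optimize import curve_fit


def multi_state(t, tau, mu, a, b, n):
    # Stand-in model (an assumption, not MOSAIC's multiStateFunc):
    # a baseline b plus n steps, each relaxing exponentially after its onset mu[i].
    y = np.full_like(t, b, dtype=float)
    for i in range(n):
        dt = np.clip(t - mu[i], 0.0, None)  # zero before the step starts
        y += a[i] * (1.0 - np.exp(-dt / tau[i]))
    return y


def flat_model(t, *args):
    # Unpack the flat vector [tau..., mu..., a..., b] that curve_fit passes in.
    n = (len(args) - 1) // 3
    tau, mu, a = args[:n], args[n:2 * n], args[2 * n:3 * n]
    return multi_state(t, tau, mu, a, args[-1], n)


t = np.linspace(0.0, 10.0, 2000)
true_params = [0.5, 0.3, 2.0, 6.0, -1.0, 1.0, 1.0]  # two states plus baseline
data = flat_model(t, *true_params)

# p0 is required: curve_fit cannot infer the parameter count from *args.
p0 = [0.4, 0.4, 2.1, 5.9, -0.9, 0.9, 0.95]
popt, pcov = curve_fit(flat_model, t, data, p0=p0)
```

The length of `p0` is what sets the number of states, so the same wrapper serves any n.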

shadowk29 commented 8 years ago

That works, more or less. It's up and running; stay tuned for a bit of benchmarking once I find a good file to test with.

shadowk29 commented 8 years ago

Profile using curve_fit: definite improvement in runtime over lmfit, though still not as good as Mac performance, and the bottleneck is still clearly the calls to multiStateFunc.

It might be possible to improve things further by supplying a Jacobian.

     Tue Dec 08 18:16:24 2015    profile.log

     1796818747 function calls (1796755930 primitive calls) in 5733.306 seconds

        Ordered by: internal time
        List reduced from 3593 to 10 due to restriction <10>

        ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        666234 3968.613    0.006 4210.584    0.006 C:\Users\xxxxx\My Documents\Analysis\mosaic\mosaic\utilities\fit_funcs.py:45(multiStateFunc)
          3999  741.726    0.185 4969.026    1.243 {scipy.optimize._minpack._lmdif}
          1168  332.771    0.285 5420.424    4.641 C:\Users\xxxxx\My Documents\Analysis\mosaic\mosaic\eventSegment.py:146(_eventsegment)
       3280840  161.513    0.000  241.724    0.000 C:\Users\xxxxx\My Documents\Analysis\mosaic\mosaic\utilities\fit_funcs.py:17(heaviside)
         67944   90.840    0.001   90.840    0.001 {numpy.core.multiarray.concatenate}
       3546758   81.454    0.000   81.454    0.000 {numpy.core.multiarray.array}
     59871/1271   45.945    0.001  196.630    0.155 C:\Users\xxxxx\My Documents\Analysis\mosaic\mosaic\metaTrajIO.py:224(popdata)
         58600   36.346    0.001   36.996    0.001 C:\Users\xxxxx\My Documents\Analysis\mosaic\mosaic\binTrajIO.py:202(scaleData)
          3370   27.577    0.008   29.194    0.009 C:\Users\xxxxx\My Documents\Analysis\mosaic\mosaic\sqlite3MDIO.py:124(writeRecord)
     592570821   25.604    0.000   25.604    0.000 {abs}

shadowk29 commented 8 years ago

Running a set of two files totalling 11GB with adept2state is currently taking 15GB of RAM and has slowed down quite a lot.

Decoupling speed issues between algorithms and memory usage is tricky. It's possible that some of the speed issues seen before were due to memory usage as well.

The memory usage of the program ramps up over time. It saturates fairly quickly, but not immediately, suggesting that it is not the memmap itself at fault, but rather that python is not freeing memory between requests for data chunks.
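
One way to check whether chunks are being retained between requests is to record traced memory after each chunk. The sketch below uses the standard-library `tracemalloc` module (Python 3 only); the chunk loop and the `keep_references` switch are hypothetical and only simulate the suspected behaviour.

```python
import tracemalloc


def process_chunks(n_chunks, keep_references):
    """Return traced memory (bytes) after each chunk of a hypothetical loop."""
    retained = []
    usage = []
    tracemalloc.start()
    for _ in range(n_chunks):
        chunk = [0.0] * 100000  # stand-in for one block of data
        if keep_references:
            retained.append(chunk)  # simulated leak: chunks are never released
        current, _peak = tracemalloc.get_traced_memory()
        usage.append(current)
    tracemalloc.stop()
    return usage


leaky = process_chunks(5, keep_references=True)
clean = process_chunks(5, keep_references=False)
```

A trace that keeps climbing, as in `leaky`, points to lingering references; a flat trace, as in `clean`, would point back toward the OS or the memory map.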

abalijepalli commented 8 years ago

That this occurs primarily on Windows suggests that some low-level memory management is at fault. I have processed large data sets (> 100 GB) that were broken up into smaller files, all stored in the same folder. One option is to recommend that as a best practice to get around the problem.
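
A minimal sketch of splitting one large binary file into smaller parts, using only the standard library. It assumes a raw file with no header and that `part_bytes` is a multiple of the sample size; the `part_%04d.bin` naming is made up for the example.

```python
import os
import tempfile


def split_file(path, out_dir, part_bytes, block_bytes=1 << 20):
    """Copy path into numbered parts of at most part_bytes each; return their paths."""
    parts = []
    with open(path, "rb") as src:
        while True:
            remaining = part_bytes
            written = 0
            part_path = os.path.join(out_dir, "part_%04d.bin" % len(parts))
            with open(part_path, "wb") as dst:
                while remaining > 0:
                    block = src.read(min(block_bytes, remaining))
                    if not block:
                        break  # end of the source file
                    dst.write(block)
                    written += len(block)
                    remaining -= len(block)
            if written == 0:
                os.remove(part_path)  # nothing left to copy
                break
            parts.append(part_path)
    return parts


# Small demonstration: a 2500-byte file split into 1000-byte parts.
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "big.bin")
with open(source, "wb") as fh:
    fh.write(bytes(range(250)) * 10)
parts = split_file(source, work_dir, part_bytes=1000)
```

Copying in bounded blocks keeps the splitter's own memory use independent of the file size.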

To wrap up the performance issue, it may be nice to decouple the memory management problems by running a test case on several smaller files so memory usage does not saturate. That will provide a direct comparison with other OSes.

shadowk29 commented 8 years ago

I will test that approach in the next few days and get back to you with some results.

shadowk29 commented 8 years ago

Splitting the files up seems to help. However, I am also thinking that some of my issues may have been related to IDLE. I have some circumstantial evidence that there are issues with Windows performance when running a script in IDLE/pythonw.exe versus running the same script from the command line with python.exe. I will continue testing and confirm one way or the other.

abalijepalli commented 8 years ago

We may want to make the WinFitFunc you wrote the default fit function. On a Mac, it produces identical results and is ~2 times faster.

shadowk29 commented 8 years ago

Interesting. It might be possible to further improve fitting speed by providing the Jacobian explicitly to curve_fit as well, if you decide to go that route.

abalijepalli commented 8 years ago

That would work as long as the cost of estimating the Jacobian is small enough. For short events, it may actually slow things down. However, it should be easy to test. I'll look into running a quick test.
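
For reference, a sketch of what supplying an explicit Jacobian to `curve_fit` could look like, assuming a SciPy version whose `curve_fit` accepts the `jac` argument. The single-step model and its parameter values are illustrative only, not the ADEPT fit function.

```python
import numpy as np
from scipy.optimize import curve_fit


def step_response(t, tau, a, b):
    # Illustrative single-step model: baseline b plus an exponential rise of size a.
    return b + a * (1.0 - np.exp(-t / tau))


def step_response_jac(t, tau, a, b):
    # Analytic partial derivatives, one column per parameter (tau, a, b).
    e = np.exp(-t / tau)
    return np.column_stack((-a * t * e / tau ** 2, 1.0 - e, np.ones_like(t)))


t = np.linspace(0.0, 5.0, 500)
true_params = (0.7, -1.2, 2.0)
data = step_response(t, *true_params)
p0 = (0.5, -1.0, 1.8)

# Same fit twice: finite-difference Jacobian, then the analytic one.
popt_numeric, _ = curve_fit(step_response, t, data, p0=p0)
popt_analytic, _ = curve_fit(step_response, t, data, p0=p0, jac=step_response_jac)
```

With an analytic Jacobian the optimizer skips the extra model evaluations it would otherwise spend on finite differences, which is where any speedup would come from. Whether that outweighs the cost of evaluating the Jacobian itself depends on the event length, as noted above.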

abalijepalli commented 8 years ago

I only see about a 20% speedup in ADEPT 2-state by providing the Jacobian. However, the error rate increases from 1% to ~ 7%. I'm going to try specifying an explicit Jacobian to ADEPT to see if the gains are more substantial. I'll push a few commits soon. It may be worth testing the code on Windows to see if there is any effect.