pysal / gwr

This is a repository for the geographically-weighted regression submodule of the Python Spatial Analysis Library
BSD 2-Clause "Simplified" License
16 stars 16 forks source link

Excessive computation in golden_section search can be avoided #15

Closed Ziqi-Li closed 6 years ago

Ziqi-Li commented 6 years ago

In golden_section() search, it is like either b will become d(d <- b when score_b <= score_d) or d will become b (b <- d when score_b > score_d) for the next iteration, thus either gwr_func(b) or gwr_func(d) has been computed in previous iteration and do not need to be re-computed.

We could add a dictionary with key-value as {bw : gwr_func(bw)} to store previously computed gwr_func(bw) and in every iteration we check if for a given bw, gwr_func(bw) has been computed or not. If not, just compute it. If bw exists in dictionary, just get the value. This should approximately save half of the time when searching for optimal bw.

def golden_section(a, c, delta, function, tol, max_iter, int_score=True):
    ...
    dict = {} # {bw:aicc}
    for iters in range(max_iter):
        ...
        #score_b = function(b)
        #score_d = function(d)
        if b in dict:
            score_b = dict[b]
        else:
            score_b = function(b)
            dict[b] = score_b
        if d in dict:
            score_d = dict[d]
        else:
            score_d = function(d)
            dict[d] = score_d
        ...
ljwolf commented 6 years ago

Could you provide a benchmark against the existing implementation & against the scipy stuff? I think we should probably (in future) default to the scipy implementation where we can.

ljwolf commented 6 years ago

If you use http://mortada.net/easily-profile-python-code-in-jupyter.html it should be real simple to see the savings. Hope that's alright.

Ziqi-Li commented 6 years ago

Thanks @ljwolf for pointing out the profiling tool

Before:

Timer unit: 1e-06 s

Total time: 19.7563 s
File: /Users/Ziqi/Desktop/developer/gwr/gwr/sel_bw.py
Function: _bw at line 281

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   281                                               def _bw(self):
   282         1            5      5.0      0.0          gwr_func = lambda bw: self._fast_fit(bw,constant=self.constant)[self.criterion]
   283         1            1      1.0      0.0          if self.search == 'golden_section':
   284         1          989    989.0      0.0              a,c = self._init_section(self.X_glob, self.X_loc, self.coords,self.constant)
   285         1            1      1.0      0.0              delta = 0.38197 #1 - (np.sqrt(5.0)-1.0)/2.0
   286         1     19755331 19755331.0    100.0              self.bw = golden_section(a, c, delta, gwr_func, self.tol,self.max_iter, self.int_score)
   287                                                   elif self.search == 'interval':
   288                                                       self.bw = equal_interval(self.bw_min, self.bw_max, self.interval,gwr_func, self.int_score)
   289                                                   else:
   290                                                       raise TypeError('Unsupported computational search method ', search)

After:

Timer unit: 1e-06 s

Total time: 9.73623 s
File: /Users/Ziqi/Desktop/developer/gwr/gwr/sel_bw.py
Function: _bw at line 281

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   281                                               def _bw(self):
   282         1           13     13.0      0.0          gwr_func = lambda bw: self._fast_fit(bw,constant=self.constant)[self.criterion]
   283         1            1      1.0      0.0          if self.search == 'golden_section':
   284         1          938    938.0      0.0              a,c = self._init_section(self.X_glob, self.X_loc, self.coords,self.constant)
   285         1            1      1.0      0.0              delta = 0.38197 #1 - (np.sqrt(5.0)-1.0)/2.0
   286         1      9735279 9735279.0    100.0              self.bw = golden_section(a, c, delta, gwr_func, self.tol,self.max_iter, self.int_score)
   287                                                   elif self.search == 'interval':
   288                                                       self.bw = equal_interval(self.bw_min, self.bw_max, self.interval,gwr_func, self.int_score)
   289                                                   else:
   290                                                       raise TypeError('Unsupported computational search method ', search)

19755331.0/9735279.0 ~ 2.03x speed up from golden_section()