pysal / spreg

Spatial econometric regression in Python
https://pysal.org/spreg/

demean_panel() tries (and fails) to allocate ENORMOUS amounts of memory #71

Closed rhstanton closed 3 years ago

rhstanton commented 3 years ago

First, thanks for this code! I'm trying to use spreg to run some regressions on a panel data set with 12,710 regions and 2,280 time periods. Unfortunately, I can't get spreg.Panel_FE_Lag or spreg.Panel_RE_Lag to run with these data. When I try, I get this error:

MemoryError: Unable to allocate 5.97 PiB for an array with shape (5198400, 161544100) and data type float64

That is far more memory than any machine has, so it's not surprising the allocation fails. Note that the array dimensions quoted are (2280^2, 12710^2), i.e. (number of time periods)^2 x (number of regions)^2. The problem seems to arise in the function demean_panel. Detailed error trace follows:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-81-94f3ad129df6> in <module>
      4                "veg11pc", "NE", "SE", "SEP", "OCT"]]
      5 # re_lag = spreg.Panel_RE_Lag(y, x, wfull, name_y=name_y, name_x=name_x, name_ds="NAT")
----> 6 re_lag = spreg.Panel_FE_Lag(y.values, x.values, w)

~/anaconda3/lib/python3.8/site-packages/spreg/panel_fe.py in __init__(self, y, x, w, epsilon, vm, name_y, name_x, name_w, name_ds)
    308         USER.check_weights(w, bigy, w_required=True, time=True)
    309 
--> 310         BasePanel_FE_Lag.__init__(
    311             self, bigy, bigx, w, epsilon=epsilon)
    312         # increase by 1 to have correct aic and sc, include rho in count

~/anaconda3/lib/python3.8/site-packages/spreg/panel_fe.py in __init__(self, y, x, w, epsilon)
     94         self.epsilon = epsilon
     95         # Demeaned variables
---> 96         self.y = demean_panel(y, self.n, self.t)
     97         self.x = demean_panel(x, self.n, self.t)
     98         # Big W matrix

~/anaconda3/lib/python3.8/site-packages/spreg/panel_utils.py in demean_panel(arr, n, t, phi)
    105     one = np.ones((t, 1))
    106     J = np.identity(t) - (1-phi)*(1/t)*spdot(one, one.T)
--> 107     Q = np.kron(J, np.identity(n))
    108     arr_dm = spdot(Q, arr)
    109 

<__array_function__ internals> in kron(*args, **kwargs)

~/anaconda3/lib/python3.8/site-packages/numpy/lib/shape_base.py in kron(a, b)
   1152             bs = (1,)*(nda-ndb) + bs
   1153             nd = nda
-> 1154     result = outer(a, b).reshape(as_+bs)
   1155     axis = nd-1
   1156     for _ in range(nd):

<__array_function__ internals> in outer(*args, **kwargs)

~/anaconda3/lib/python3.8/site-packages/numpy/core/numeric.py in outer(a, b, out)
    940     a = asarray(a)
    941     b = asarray(b)
--> 942     return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis, :], out)
    943 
    944 

MemoryError: Unable to allocate 5.97 PiB for an array with shape (5198400, 161544100) and data type float64
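
For illustration only: the lines of demean_panel shown in the trace build a dense Q = kron(J, I_n), which has (n*t)^2 entries, the same count as the failed (t^2, n^2) intermediate. The sketch below is one way the same demeaning could be done without forming Q, by reshaping the stacked data and subtracting per-region time means. It is not the code from spreg or from any PR; the function names, the use of `@` in place of spdot, and the default phi=0.0 are assumptions made for this example, and it assumes the observations are stacked period by period (all n regions for period 1, then period 2, and so on), which is what kron(J, I_n) implies.

```python
import numpy as np


def dense_q_pib(n, t):
    """Size in PiB of a dense float64 Q = kron(J, I_n), which is (n*t) x (n*t)."""
    return (n * t) ** 2 * 8 / 2**50


def demean_panel_kron(arr, n, t, phi=0.0):
    """Reference version mirroring the traceback lines (small n and t only)."""
    one = np.ones((t, 1))
    J = np.identity(t) - (1 - phi) * (1 / t) * (one @ one.T)
    Q = np.kron(J, np.identity(n))  # dense (n*t) x (n*t) matrix
    return Q @ arr


def demean_panel_reshape(arr, n, t, phi=0.0):
    """Hypothetical alternative: same result without building Q.

    arr has shape (n*t, k), stacked period by period.
    """
    k = arr.shape[1]
    panel = arr.reshape(t, n, k)               # panel[s, i] = region i, period s
    means = panel.mean(axis=0, keepdims=True)  # time mean for each region
    return (panel - (1 - phi) * means).reshape(n * t, k)


# Small check that the two versions agree.
rng = np.random.default_rng(0)
n, t, k = 4, 3, 2
arr = rng.normal(size=(n * t, k))
assert np.allclose(demean_panel_kron(arr, n, t), demean_panel_reshape(arr, n, t))
```

The reshape version needs memory proportional to n*t*k rather than (n*t)^2, so the demeaning step itself would fit in memory for 12,710 regions and 2,280 periods; whether the rest of the estimation scales to that size is a separate question.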

pedrovma commented 3 years ago

PR #83 should resolve this issue.