nmi-lab / decolle-public

GNU General Public License v3.0

LIFLayer dynamics #15

Closed: weinman closed this issue 2 years ago

weinman commented 2 years ago

Thanks to everyone involved for sharing the detailed code and well-written paper about the method.

I apologize if this is elementary, but I'm trying to reconcile the implementation of the dynamics in class LIFLayer with the equations written in the paper.

Specifically, the forward method (here) updates Q and P using members tau_s and tau_m, respectively.

These members are set in the constructor; for example (abridged for clarity)

tau_m = 1./(1-alpha)

Then P is updated as

P = self.alpha * state.P + self.tau_m * state.Q  

Since the paper defines (Equation 4)

P_j^l \left[t+\Delta t\right] = \alpha P_j^l\left[t\right] + (1-\alpha) Q_j^l\left[t\right]

and

\alpha = \exp\left(-\frac{\Delta t}{\tau_{\mathrm{mem}}}\right),

I'm wondering why the line above isn't

P = self.alpha * state.P + (1-self.alpha) * state.Q  

(and perhaps also tau_m = -dt / log(alpha), though I'm not sure this matters as much, since tau_m seems to have no other use in this class, although it does in class LIFLayerVariableTau).

Is it simply because

\frac{1}{1-\alpha} \approx -\frac{1}{\log \alpha}

for the range of α values in use?
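(To first order the two do agree: expanding the logarithm about \alpha = 1 gives

-\log \alpha = (1-\alpha) + \frac{(1-\alpha)^2}{2} + \frac{(1-\alpha)^3}{3} + \cdots,

so -\frac{1}{\log \alpha} = \frac{1}{1-\alpha} - \frac{1}{2} + O(1-\alpha); the two expressions differ by roughly one half for α near 1.)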

In sum, what is the reason for using tau_m and tau_s for updating P and Q respectively, rather than (1-alpha) and (1-beta)?
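For reference, here is the paper-form dynamics as I read them, written out as a minimal standalone sketch (the names, shapes, and random input are mine for illustration, not the repo's actual LIFLayer):

import torch

alpha, beta = 0.9, 0.85                     # exp(-dt/tau_mem), exp(-dt/tau_syn)
P = torch.zeros(10)                         # membrane trace
Q = torch.zeros(10)                         # synaptic trace
spikes_in = (torch.rand(10) < 0.2).float()  # hypothetical input spikes

# One step of the Equation 4 dynamics:
Q = beta * Q + (1 - beta) * spikes_in
P = alpha * P + (1 - alpha) * Q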

eneftci commented 2 years ago

Thanks for noting this issue. You are correct that there is a mismatch between how the time constants are described in the paper and in this repository, due to different versions of the paper. Since we are not training the time constants, this does not affect the overall results. However, it does change the interpretation of the parameters.

weinman commented 2 years ago

Thanks! I see that 61aca54 fixes the dynamics to match the paper here. The factorization is much cleaner and clearer (and maybe it's just me, but it also seems to run a bit faster).

I do still wonder a bit about the initialization/definition of tau_m (and tau_s) here.

self.tau_m = torch.nn.Parameter(1. / (1 - self.alpha), requires_grad=False)

Is this expression simply a convenient, log-avoiding approximation for

\alpha = \exp\left(-\frac{\Delta t}{\tau_{\mathrm{mem}}}\right) \Longrightarrow \tau_{\mathrm{mem}} =-\frac{\Delta t}{\log \alpha},

or is there something deeper I'm missing?
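(As a quick standalone check, taking Δt = 1, which I'm assuming is the convention here:

import math

for alpha in (0.9, 0.95, 0.99):
    approx = 1.0 / (1.0 - alpha)    # the constructor's expression
    exact = -1.0 / math.log(alpha)  # exact inversion of alpha = exp(-dt/tau), dt = 1
    print(alpha, approx, exact)

# alpha = 0.9:  10.0 vs  9.491
# alpha = 0.95: 20.0 vs 19.496
# alpha = 0.99: 100.0 vs 99.499

The two differ by roughly 0.5 across this range, consistent with the first-order expansion noted above.)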

My thanks again for sharing this work.

eneftci commented 2 years ago

The first-order approximation is a relic of a previous implementation. The definition with the log you propose is perfectly valid and preferable.
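That is, something along the lines of the following (a sketch only; it assumes a time-step attribute self.dt on the layer, which is not necessarily how the class is currently written):

self.tau_m = torch.nn.Parameter(-self.dt / torch.log(self.alpha), requires_grad=False)
self.tau_s = torch.nn.Parameter(-self.dt / torch.log(self.beta), requires_grad=False)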