hikettei / cl-waffe2

[Experimental] Graph and Tensor Abstraction for Deep Learning all in Common Lisp
https://hikettei.github.io/cl-waffe2/
MIT License

examples/mnist/mlp.lisp - reset-compiled-function-cache! question #148

Closed by atzmueller 7 months ago

atzmueller commented 8 months ago

Using the current version of mlp.lisp, on the first call of, e.g., (train-and-valid-mlp :epoch-num 11 :benchmark-p nil), the training loss in the first epoch is usually around 0.26.

On further runs (evaluating (train-and-valid-mlp :epoch-num 11 :benchmark-p nil) again), the loss is larger (around 0.76 in the first epoch). I suspect this is caused by some caching in the compiler together with different initializations of the compiled structures, since if I evaluate (cl-waffe2/vm.generic-tensor::reset-compiled-function-cache!) before (train-and-valid-mlp :epoch-num 11 :benchmark-p nil), the loss is back in the same range as on the very first run.
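For reference, the full workaround sequence is:

;; Workaround: clear the (internal, unexported) compiled-function cache
;; before re-running, which restores the first-run loss.
(cl-waffe2/vm.generic-tensor::reset-compiled-function-cache!)
(train-and-valid-mlp :epoch-num 11 :benchmark-p nil)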

Is this the intended behavior, or should the reset be applied somewhere when the model is built/compiled?

hikettei commented 8 months ago

I didn't expect this behavior, because caches are created for each AbstractNode, and the compiled code does not include any tensors. By design, users shouldn't have to be aware of this function at all, which is why it isn't exported.

I suspect that this is caused by some caching in the compiler and different initializations ...

I thought about that too, and I guess the current implementation of Adam has something to do with it, since if I change these lines:

;; Case 1. Adam -> SGD
;; From:
(mapc (hooker x (Adam x :lr lr)) (model-parameters model))
;; To:
(mapc (hooker x (SGD x :lr lr)) (model-parameters model))

;; Case 2. Deleting Adam
;; Deleting this line entirely (so the model is never optimized):
(mapc #'call-optimizer! (model-parameters model))

In both cases, the loss is in the same range for each epoch and each run. I'm still analyzing the issue. Thank you for the bug report.
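For intuition on why swapping in SGD changes things: unlike SGD, Adam keeps per-parameter moment buffers, so if a cached compiled step captures mutable state, later runs start from stale state. This is only a plausible mechanism, not a confirmed diagnosis; the sketch below uses plain Common Lisp closures, not cl-waffe2 internals:

;; A cached closure that captures mutable state (as Adam's moment buffers
;; would be) yields different results on reuse; a stateless step would not.
(defun make-stateful-step ()
  (let ((calls 0))                 ; captured mutable state
    (lambda () (incf calls))))

(defparameter *cached-step* (make-stateful-step))

(funcall *cached-step*) ; => 1, fresh state: like the very first run
(funcall *cached-step*) ; => 2, stale state: like every later run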

hikettei commented 7 months ago

The issue should be fixed by the latest PR, #149, so I'm closing this. I've had a lot on my plate this month and couldn't tackle this issue quickly; sorry for the delayed answer.
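A quick way to confirm the fix, using the same entry point as above: run training twice in one session; after PR #149, both runs should start from a comparable first-epoch loss with no manual cache reset in between.

;; Two consecutive runs should now begin in the same loss range.
(train-and-valid-mlp :epoch-num 11 :benchmark-p nil)
(train-and-valid-mlp :epoch-num 11 :benchmark-p nil)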

atzmueller commented 7 months ago

Yes, it works, thanks!