tensorflow / kfac

An implementation of KFAC for TensorFlow
Apache License 2.0

KFAC in early stages query #44

Open priyamiitkgp opened 3 years ago

priyamiitkgp commented 3 years ago

Hi, I ran the notebook given in the docs (the Keras KFAC example for CIFAR-10) with the same network (ResNet-20) and the tuned hyperparameters, and compared the first few epochs against a standard SGD optimizer (lr = 0.1). I didn't see the KFAC optimizer being significantly faster (14x) than SGD. In most loss-vs-epoch plots I have seen, the KFAC loss drops much faster than that of other optimizers (like SGD), but that wasn't the case here.

Would be great if you could help me understand where I might be going wrong. I've attached a training accuracy plot comparing KFAC and SGD.

Thanks!!

[Attached image: Screenshot (190)]
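One way to put a number on a claim like "14x faster" is to compare how many epochs each optimizer needs to reach the same training accuracy. The snippet below is a minimal sketch of that comparison. The function names `epochs_to_reach` and `speedup` are hypothetical helpers, not part of the kfac library, and the accuracy lists are placeholder values for illustration only, not measurements from this issue.

```python
from typing import Optional, Sequence


def epochs_to_reach(curve: Sequence[float], target: float) -> Optional[int]:
    """Return the 1-based epoch at which `curve` first reaches `target`.

    Returns None if the target is never reached.
    """
    for epoch, value in enumerate(curve, start=1):
        if value >= target:
            return epoch
    return None


def speedup(
    baseline: Sequence[float], candidate: Sequence[float], target: float
) -> Optional[float]:
    """Ratio of epochs needed by `baseline` vs. `candidate` to hit `target`.

    A value above 1.0 means the candidate reached the target in fewer epochs.
    Returns None if either curve never reaches the target.
    """
    base = epochs_to_reach(baseline, target)
    cand = epochs_to_reach(candidate, target)
    if base is None or cand is None:
        return None
    return base / cand


# Placeholder per-epoch training accuracies (illustrative, NOT real results).
sgd_acc = [0.30, 0.45, 0.55, 0.62, 0.68, 0.72]
kfac_acc = [0.35, 0.52, 0.63, 0.70, 0.74, 0.77]

print(speedup(sgd_acc, kfac_acc, target=0.60))
```

Note that this measures speedup per epoch only. A K-FAC step generally costs more compute than an SGD step, so a wall-clock comparison would need the time per epoch as well.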

james-martens commented 2 years ago

Hi. That "14x" figure applies only to a certain architecture, and isn't meant to be universal. However, I can see from the README that the phrasing suggests otherwise, and so I've removed it. So far, the most compelling applications of K-FAC that I'm aware of are to deep autoencoders and vanilla networks using DKS/TAT. See https://arxiv.org/abs/2110.01765