Closed sangkeun00 closed 8 months ago
I think having a concept of analog.log
is a good way to improve modularity! I agree that this will ease the code maintenance and allow advanced researchers to implement a new idea with Analog.
A few questions / comments:
forward
, backward
, grad
). Does this mean that log
in Analog refer to the above 3 and Hessian will be in a different level?
EKFAC
belong to grad
as illustrated in the provided code snippet, or Hessian?forward
, backward
are very simple and easy to understand - "what statistics to compute". On the other hand, I think, the options for grad
(Save
, EKFAC
, Log
) determines multiple functionalities around gradient. What do you think about separating options?
update_logging_config
as "configure what stats to compute", something like Per_sample
would sound more consistent with Covariance
and Mean
?@hwijeen Thanks for your reply.
As you mentioned, we don't directly use the term "Hessian" anymore. Rather, everything at the core of AnaLog is now simple statistic (e.g. mean, covariance), and we assume that Hessian can be obtained by combining these statistic (e.g. KFAC Hessian = forward_cov \kron backward_cov). This may lead to an additional mental burden for the user, so we plan to have some higher-level abstraction where users can set hessian="kfac"
and AnaLog automatically translates this to logging_config["forward"].append(Covariance); logging_config["backward"].append(Covariance)
. Regarding EKFAC, we also consider this as some sort of statistic (i.e. eigenvalues).
This is currently the biggest concern on my end. As you may have noticed, there are three types of logging in AnaLog -- 1) statistic (e.g. mean, covariance, EKFAC), 2) extraction (i.e. log), and 3) saving (i.e. save) --, and they are conceptually quite different. Indeed, our previous interface also handled them separately (considering that hessian is almost equivalent with statistic). Given this, I am not sure whether handling them all together as "grad": [Mean, Log, Save]
is a right decision. One simple way to resolve this is introducing additional hierarchy as:
# KFAC
logging_config = {
"log": ["grad"],
"save": ["grad"],
"statistic": {"forward": [Covariance], "backward": [Covariance], "grad": []}, # this used to be "hessian"
}
# EKFAC
logging_config = {
"log": ["grad"],
"save": [],
"statistic": {"forward": [], "backward": [], "grad": [EKFAC]}, # this used to be "hessian"
}
Two things, in which I am unsatisfied with the above interface are:
The first issue can be addressed by having a higher abstraction layer. I still think that this interface would make the maintenance easier and allow researchers to implement their new ideas more easily.
I am not sure how we can address the second issue. One ad-hoc solution I can think of is:
# KFAC
logging_config = {
"log": {"forward": False, "backward": False, "grad": True},
"save": {"forward": False, "backward": False, "grad": True},
"statistic": {"forward": [Covariance], "backward": [Covariance], "grad": []}, # this used to be "hessian"
}
This way, we can ensure that the configuration format is consistent across log
, save
, and statistic
(and each hook_fn
can access relevant configurations for log
, save
, and statistic
in a consistent and straightforward way, at the cost of logging_config
being a bit more lengthy.
A higher-level abstraction that can address both of these issues may look like:
# KFAC
logging_config = {
"log": ["grad"],
"save": ["grad"],
"statistic": "kfac",
}
If you have any comments, please let me know!
# KFAC
logging_config = {
"log": {"forward": False, "backward": False, "grad": True},
"save": {"forward": False, "backward": False, "grad": True},
"statistic": {"forward": [Covariance], "backward": [Covariance], "grad": []}, # this used to be "hessian"
}
I think having two hierarchy and pair of (log, stats) is fine for now, especially because AnalogScheduler will set this values internally. I believe it is the advanced users that will play with these options manually and this should be good enough?
And I think our discussion boils down to how to group the following words/concepts in such a way that would lower the mental burden of the users.
`mean`, `covariance`, `forward`, `backward`, `grad`, `hessian`, `log`, `save`, `kfac`, `ekfac`
I like your idea that we should hide some scary looking words. So kfac
would be hidden from the user interface.
My understanding of your grouping of the remaining words are:
`Forward`, `Backward`, `Grad`: objects to compute statistics of
`Mean`, `Covariance`, `EKFAC`: types of statistics to compute (`None` would mean per-sample stats)
`log`: what to extract
`save`: what to save
And my comments about it are:
EKFAC
is eigen systems (correct me if I'm mistaken). If that's the case, we should rename so that it sounds like basic stats, rather than scary 2nd order-like info.
mean
and variance
as they are not mutually exclusive. For example, we can have both Covariance
of Forward
in eigen decomposed fashion or raw. log
can be hidden from the user interface as it seems redundant now that we have statistics
. I think it is used to internally decide which hook to register and it can be determined by statistics
.My comments suggests the following grouping:
`Forward`, `Backward`, `Grad`: objects
`Mean`, `Covariance`, `None (or per-sample)`: stats
`Eigensystem`: operation performed performed on statistics
`save`: what to save
The idea is that any combination of (object, stat) should be configurable, and save
should be an option to whether or not save that pair.
@hwijeen Thanks for your quick reply.
EKFAC
is probably not the most appropriate name. The best name I can think of is something like EigvalCorrection
. What's your opinion? Overall, my idea is that statistic
is used to compute any global statistic for the dataset (e.g. mean, variance). In this sense, corrected eigenvalues are also dataset-level statistic, so I put it under the statistic
category. If you think this is unnatural, please let me know.log
: I thought a lot about this issue too. Your argument is correct if we only consider the logging phase, but it can be problematic in the analysis/evaluation phase, where the extraction is pretty much the only thing users need. Without having log
, we may need to extract everything (i.e. forward, backward, grad) and this can be quite inefficient. However, we can indeed automatically set log
based on save
and statistic
in the logging phase. If you have any solution to this analysis/evaluation phase problem, let me know!Regarding the abstraction, I propose to use three-level (low, mid, high) as follows:
# KFAC
logging_config = {
"log": {"forward": False, "backward": False, "grad": True},
"save": {"forward": False, "backward": False, "grad": True},
"statistic": {"forward": [Covariance], "backward": [Covariance], "grad": []}, # this used to be "hessian"
}
# KFAC
logging_config = {
"log": ["grad"],
"save": ["grad"],
"statistic": "kfac",
}
AnaLogScheduler
Addressed in #67
In essence, many states that AnaLog tracks for now are essentially covariance. For example, KFAC Hessian is basically (un-centered) covariance of forward activations and backward error signals, and raw Hessian is simply (un-centered) covariance of gradient. EK-FAC also updates eigenvalues based on per-sample gradient from training data. Therefore, I believe it's safe to say that the main features of logging in AnaLog are:
Problem
However, at the moment, our code is structured around the whole algorithm (e.g. KFAC) instead of basic operations (e.g. mean, covariance). For instance,
KFACHessianHandler
implements both covariance computation and EKFAC update mechanisms. In addition, RawHessianHandler implements the covariance computation mechanism separately from KFACHessianHandler. In the end, our current code lacks the modularity, and I believe this would negatively affect the maintenance of our repo in the future, especially as we add more and more features.Proposed solution
To address this issue, I propose to refactor our code base around the basic operations. For example, our code for logging KFAC/EKFAC can be re-written as below:
This way, we can improve the modularity of our code, while also allowing researchers to implement their new algorithms with AnaLog. For users who don't know how to set these
logging_config
, we will useAnaLogScheduler
to automatically set this config based on some template.@hwijeen @pomonam @hage1005 @YoungseogChung