Refactoring proposal for logging

In essence, many states that AnaLog tracks for now are essentially covariance. For example, KFAC Hessian is basically (un-centered) covariance of forward activations and backward error signals, and raw Hessian is simply (un-centered) covariance of gradient. EK-FAC also updates eigenvalues based on per-sample gradient from training data. Therefore, I believe it's safe to say that the main features of logging in AnaLog are:

Extract relevant logs (e.g., forward, backward, gradient) from neural networks using hooks
Compute some statistic (e.g., mean, covariance) using these logs

Problem

However, at the moment, our code is structured around the whole algorithm (e.g. KFAC) instead of basic operations (e.g. mean, covariance). For instance, KFACHessianHandler implements both covariance computation and EKFAC update mechanisms. In addition, RawHessianHandler implements the covariance computation mechanism separately from KFACHessianHandler. In the end, our current code lacks the modularity, and I believe this would negatively affect the maintenance of our repo in the future, especially as we add more and more features.

Proposed solution

To address this issue, I propose to refactor our code base around the basic operations. For example, our code for logging KFAC/EKFAC can be re-written as below:

from analog.log import Mean, Covariance, Log, Save, EKFAC

# KFAC
analog.update_logging_config(
    "forward": [Covariance], "backward": [Covariance], "grad": [Save]
)

# EKFAC
analog.update_logging_config(
    "forward": [], "backward": [], "grad": [EKFAC]
)

# eval
analog.update_logging_config(
    "forward": [], "backward": [], "grad": [Log]
)

This way, we can improve the modularity of our code, while also allowing researchers to implement their new algorithms with AnaLog. For users who don't know how to set these logging_config, we will use AnaLogScheduler to automatically set this config based on some template.

@hwijeen @pomonam @hage1005 @YoungseogChung

I think having a concept of analog.log is a good way to improve modularity! I agree that this will ease the code maintenance and allow advanced researchers to implement a new idea with Analog. A few questions / comments:

It seems that how to compute Hessian is rather implicit (e.g. computing Hessian via KFAC is configured with forward, backward, grad). Does this mean that log in Analog refer to the above 3 and Hessian will be in a different level?
- Does EKFAC belong to grad as illustrated in the provided code snippet, or Hessian?
Values passed to forward, backward are very simple and easy to understand - "what statistics to compute". On the other hand, I think, the options for grad (Save, EKFAC, Log) determines multiple functionalities around gradient. What do you think about separating options?
- If we think update_logging_config as "configure what stats to compute", something like Per_sample would sound more consistent with Covariance and Mean?
- And then saving could be controlled by another argument (save covariance?, save grad?).

@hwijeen Thanks for your reply.

It seems that how to compute Hessian is rather implicit (e.g. computing Hessian via KFAC is configured with forward, backward, grad). Does this mean that log in Analog refer to the above 3 and Hessian will be in a different level?

As you mentioned, we don't directly use the term "Hessian" anymore. Rather, everything at the core of AnaLog is now simple statistic (e.g. mean, covariance), and we assume that Hessian can be obtained by combining these statistic (e.g. KFAC Hessian = forward_cov \kron backward_cov). This may lead to an additional mental burden for the user, so we plan to have some higher-level abstraction where users can set hessian="kfac" and AnaLog automatically translates this to logging_config["forward"].append(Covariance); logging_config["backward"].append(Covariance). Regarding EKFAC, we also consider this as some sort of statistic (i.e. eigenvalues).

Values passed to forward, backward are very simple and easy to understand - "what statistics to compute". On the other hand, I think, the options for grad (Save, EKFAC, Log) determines multiple functionalities around gradient. What do you think about separating options?

This is currently the biggest concern on my end. As you may have noticed, there are three types of logging in AnaLog -- 1) statistic (e.g. mean, covariance, EKFAC), 2) extraction (i.e. log), and 3) saving (i.e. save) --, and they are conceptually quite different. Indeed, our previous interface also handled them separately (considering that hessian is almost equivalent with statistic). Given this, I am not sure whether handling them all together as "grad": [Mean, Log, Save] is a right decision. One simple way to resolve this is introducing additional hierarchy as:

# KFAC
logging_config = {
    "log": ["grad"],
    "save": ["grad"],
    "statistic": {"forward": [Covariance], "backward": [Covariance], "grad": []}, # this used to be "hessian"
}

# EKFAC
logging_config = {
    "log": ["grad"],
    "save": [],
    "statistic": {"forward": [], "backward": [], "grad": [EKFAC]}, # this used to be "hessian"
}

Two things, in which I am unsatisfied with the above interface are:

Two-level hierarchy may be too complicated from the user perspective.
"forward", "backward", and "grad" are values in "log" and "save", whereas they are keys in "statistic".

The first issue can be addressed by having a higher abstraction layer. I still think that this interface would make the maintenance easier and allow researchers to implement their new ideas more easily.

I am not sure how we can address the second issue. One ad-hoc solution I can think of is:

# KFAC
logging_config = {
    "log": {"forward": False, "backward": False, "grad": True},
    "save": {"forward": False, "backward": False, "grad": True},
    "statistic": {"forward": [Covariance], "backward": [Covariance], "grad": []}, # this used to be "hessian"
}

This way, we can ensure that the configuration format is consistent across log, save, and statistic (and each hook_fn can access relevant configurations for log, save, and statistic in a consistent and straightforward way, at the cost of logging_config being a bit more lengthy.

A higher-level abstraction that can address both of these issues may look like:

# KFAC
logging_config = {
    "log": ["grad"],
    "save": ["grad"],
    "statistic": "kfac",
}

If you have any comments, please let me know!

# KFAC
logging_config = {
    "log": {"forward": False, "backward": False, "grad": True},
    "save": {"forward": False, "backward": False, "grad": True},
    "statistic": {"forward": [Covariance], "backward": [Covariance], "grad": []}, # this used to be "hessian"
}

I think having two hierarchy and pair of (log, stats) is fine for now, especially because AnalogScheduler will set this values internally. I believe it is the advanced users that will play with these options manually and this should be good enough?

And I think our discussion boils down to how to group the following words/concepts in such a way that would lower the mental burden of the users.

`mean`, `covariance`, `forward`, `backward`, `grad`, `hessian`, `log`, `save`, `kfac`, `ekfac`

I like your idea that we should hide some scary looking words. So kfac would be hidden from the user interface.

My understanding of your grouping of the remaining words are:

`Forward`, `Backward`, `Grad`: objects to compute statistics of
`Mean`, `Covariance`, `EKFAC`: types of statistics to compute (`None` would mean per-sample stats)
`log`: what to extract
`save`: what to save

And my comments about it are:

I think what you mean by EKFAC is eigen systems (correct me if I'm mistaken). If that's the case, we should rename so that it sounds like basic stats, rather than scary 2nd order-like info.
- I think eigen system belongs to different category than mean and variance as they are not mutually exclusive. For example, we can have both Covariance of Forward in eigen decomposed fashion or raw.
I think log can be hidden from the user interface as it seems redundant now that we have statistics. I think it is used to internally decide which hook to register and it can be determined by statistics.

My comments suggests the following grouping:

`Forward`, `Backward`, `Grad`: objects
`Mean`, `Covariance`, `None (or per-sample)`: stats
`Eigensystem`: operation performed performed on statistics
`save`: what to save

The idea is that any combination of (object, stat) should be configurable, and save should be an option to whether or not save that pair.

@hwijeen Thanks for your quick reply.

EKFAC naming: I agree that EKFAC is probably not the most appropriate name. The best name I can think of is something like EigvalCorrection. What's your opinion? Overall, my idea is that statistic is used to compute any global statistic for the dataset (e.g. mean, variance). In this sense, corrected eigenvalues are also dataset-level statistic, so I put it under the statistic category. If you think this is unnatural, please let me know.
Redundancy of log: I thought a lot about this issue too. Your argument is correct if we only consider the logging phase, but it can be problematic in the analysis/evaluation phase, where the extraction is pretty much the only thing users need. Without having log, we may need to extract everything (i.e. forward, backward, grad) and this can be quite inefficient. However, we can indeed automatically set log based on save and statistic in the logging phase. If you have any solution to this analysis/evaluation phase problem, let me know!

Regarding the abstraction, I propose to use three-level (low, mid, high) as follows:

Low abstraction (hook-friendly)

# KFAC
logging_config = {
"log": {"forward": False, "backward": False, "grad": True},
"save": {"forward": False, "backward": False, "grad": True},
"statistic": {"forward": [Covariance], "backward": [Covariance], "grad": []}, # this used to be "hessian"
}

Mid abstraction (for researchers familiar with ML jargons)

# KFAC
logging_config = {
"log": ["grad"],
"save": ["grad"],
"statistic": "kfac",
}

High abstraction (for application developers): AnaLogScheduler

logix-project / logix

Refactoring proposal for logging #65

Problem

Proposed solution