logix-project / logix

AI Logging for Interpretability and Explainability🔬
Apache License 2.0

Refactoring proposal for logging #65

Closed sangkeun00 closed 8 months ago

sangkeun00 commented 8 months ago

In essence, many of the states that AnaLog currently tracks are covariances. For example, the KFAC Hessian is the (un-centered) covariance of forward activations and backward error signals, and the raw Hessian is simply the (un-centered) covariance of gradients. EK-FAC also updates eigenvalues based on per-sample gradients from training data. Therefore, I believe it's safe to say that the main features of logging in AnaLog are:

  1. Extract relevant logs (e.g., forward, backward, gradient) from neural networks using hooks
  2. Compute some statistic (e.g., mean, covariance) using these logs
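
To make the second step concrete, here is a minimal sketch of an un-centered covariance accumulator in plain Python. The class name `UncenteredCovariance` and its methods are hypothetical and not part of AnaLog; a real implementation would presumably operate on tensors extracted by hooks rather than on lists.

```python
# Hypothetical helper, not part of AnaLog: accumulates an un-centered
# covariance (second moment) E[x x^T] over a stream of logged vectors.
class UncenteredCovariance:
    def __init__(self, dim):
        self.dim = dim
        self.count = 0
        # Running sum of outer products x x^T, stored as a dim x dim list.
        self.outer_sum = [[0.0] * dim for _ in range(dim)]

    def update(self, x):
        """Add one logged vector (e.g. an activation or a gradient)."""
        if len(x) != self.dim:
            raise ValueError("expected a vector of length %d" % self.dim)
        for i in range(self.dim):
            for j in range(self.dim):
                self.outer_sum[i][j] += x[i] * x[j]
        self.count += 1

    def value(self):
        """Return the average outer product; no mean is subtracted."""
        if self.count == 0:
            raise ValueError("no samples have been logged yet")
        return [[v / self.count for v in row] for row in self.outer_sum]


cov = UncenteredCovariance(dim=2)
for sample in [[1.0, 2.0], [3.0, 4.0]]:
    cov.update(sample)
result = cov.value()
```

Because only a running sum and a count are kept, the statistic can be updated batch by batch without storing the raw logs.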

Problem

However, at the moment, our code is structured around whole algorithms (e.g. KFAC) instead of basic operations (e.g. mean, covariance). For instance, KFACHessianHandler implements both the covariance computation and the EKFAC update mechanism. In addition, RawHessianHandler implements covariance computation separately from KFACHessianHandler. As a result, our current code lacks modularity, and I believe this will make the repo harder to maintain, especially as we add more features.

Proposed solution

To address this issue, I propose to refactor our code base around the basic operations. For example, our code for logging KFAC/EKFAC can be re-written as below:

from analog.log import Mean, Covariance, Log, Save, EKFAC

# Proposed interface. `analog` is assumed to be an existing AnaLog instance,
# and the config is passed as a dict keyed by log type.

# KFAC
analog.update_logging_config(
    {"forward": [Covariance], "backward": [Covariance], "grad": [Save]}
)

# EKFAC
analog.update_logging_config(
    {"forward": [], "backward": [], "grad": [EKFAC]}
)

# eval
analog.update_logging_config(
    {"forward": [], "backward": [], "grad": [Log]}
)

This way, we can improve the modularity of our code while also allowing researchers to implement new algorithms with AnaLog. For users who don't know how to set logging_config, AnaLogScheduler will set it automatically based on a template.
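
A rough sketch of how a config of this shape could drive modular handlers is given below. Everything here is an assumption for illustration: the `update`/`result` methods, `build_handlers`, and `dispatch` are hypothetical names, the `Mean` and `Save` classes are toy stand-ins that work on scalars, and none of this is AnaLog's actual implementation.

```python
class Mean:
    """Running mean of scalar logs (toy stand-in for a tensor statistic)."""

    def __init__(self):
        self.total, self.count = 0.0, 0

    def update(self, value):
        self.total += value
        self.count += 1

    def result(self):
        return self.total / self.count


class Save:
    """Keeps every logged value, standing in for writing logs to disk."""

    def __init__(self):
        self.values = []

    def update(self, value):
        self.values.append(value)

    def result(self):
        return list(self.values)


def build_handlers(logging_config):
    """Instantiate one handler per (log type, operation class) pair."""
    return {
        log_type: [op() for op in ops]
        for log_type, ops in logging_config.items()
    }


def dispatch(handlers, log_type, value):
    """What a hook might do: pass the extracted log to every handler."""
    for handler in handlers.get(log_type, []):
        handler.update(value)


logging_config = {"forward": [Mean], "backward": [], "grad": [Mean, Save]}
handlers = build_handlers(logging_config)
for grad in [1.0, 3.0]:
    dispatch(handlers, "grad", grad)
dispatch(handlers, "forward", 10.0)
grad_mean, grad_saved = (h.result() for h in handlers["grad"])
```

With this layout, adding a new operation only requires a class with the same two methods; the hook and the config format stay unchanged.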

@hwijeen @pomonam @hage1005 @YoungseogChung

hwijeen commented 8 months ago

I think having a concept of analog.log is a good way to improve modularity! I agree that this will ease code maintenance and allow advanced researchers to implement new ideas with AnaLog. A few questions / comments:

sangkeun00 commented 8 months ago

@hwijeen Thanks for your reply.

As you mentioned, we no longer use the term "Hessian" directly. Rather, everything at the core of AnaLog is now a simple statistic (e.g. mean, covariance), and we assume that the Hessian can be obtained by combining these statistics (e.g. KFAC Hessian = forward_cov \kron backward_cov). This may add mental burden for the user, so we plan to provide a higher-level abstraction where users can set hessian="kfac" and AnaLog automatically translates this to logging_config["forward"].append(Covariance); logging_config["backward"].append(Covariance). Regarding EKFAC, we also consider it a kind of statistic (i.e. eigenvalues).
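
As an illustration of "combining statistics", the sketch below forms a Kronecker product of two small covariance matrices in plain Python. The `kron` helper and the example values are made up for this illustration; the operand order follows the forward_cov \kron backward_cov notation above, and whether that order matches a particular weight layout is an assumption.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [
            a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
            for j in range(len(a[0]) * cols_b)
        ]
        for i in range(len(a) * rows_b)
    ]


# Made-up 2x2 covariances standing in for the logged statistics.
forward_cov = [[2.0, 0.0], [0.0, 1.0]]
backward_cov = [[1.0, 3.0], [3.0, 5.0]]

# Each entry of forward_cov scales a full copy of backward_cov.
kfac_hessian = kron(forward_cov, backward_cov)
```

The two factors are far smaller than their product, which is why logging them separately as statistics is attractive.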

This is currently the biggest concern on my end. As you may have noticed, there are three types of logging in AnaLog: 1) statistic (e.g. mean, covariance, EKFAC), 2) extraction (i.e. log), and 3) saving (i.e. save), and they are conceptually quite different. Indeed, our previous interface also handled them separately (considering that hessian is almost equivalent to statistic). Given this, I am not sure whether handling them all together as "grad": [Mean, Log, Save] is the right decision. One simple way to resolve this is to introduce an additional level of hierarchy:

# KFAC
logging_config = {
    "log": ["grad"],
    "save": ["grad"],
    "statistic": {"forward": [Covariance], "backward": [Covariance], "grad": []}, # this used to be "hessian"
}

# EKFAC
logging_config = {
    "log": ["grad"],
    "save": [],
    "statistic": {"forward": [], "backward": [], "grad": [EKFAC]}, # this used to be "hessian"
}

Two things I am unsatisfied with in the above interface are:

  1. A two-level hierarchy may be too complicated from the user's perspective.
  2. "forward", "backward", and "grad" are values in "log" and "save", whereas they are keys in "statistic".

The first issue can be addressed by having a higher abstraction layer. I still think that this interface would make the maintenance easier and allow researchers to implement their new ideas more easily.

I am not sure how we can address the second issue. One ad-hoc solution I can think of is:

# KFAC
logging_config = {
    "log": {"forward": False, "backward": False, "grad": True},
    "save": {"forward": False, "backward": False, "grad": True},
    "statistic": {"forward": [Covariance], "backward": [Covariance], "grad": []}, # this used to be "hessian"
}

This way, we can ensure that the configuration format is consistent across log, save, and statistic (and each hook_fn can access the relevant configurations for log, save, and statistic in a consistent and straightforward way), at the cost of logging_config being a bit more verbose.
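
A minimal sketch of why the consistent format is convenient for hooks: with every section keyed by log type, one lookup pattern retrieves all three settings. The `hook_settings` helper is hypothetical, and `Covariance` is a placeholder class defined here only so the example is self-contained.

```python
class Covariance:
    """Placeholder for the statistic class named in this thread."""


logging_config = {
    "log": {"forward": False, "backward": False, "grad": True},
    "save": {"forward": False, "backward": False, "grad": True},
    "statistic": {"forward": [Covariance], "backward": [Covariance], "grad": []},
}


def hook_settings(config, log_type):
    """Collect everything one hook needs using the same key in each section."""
    return {
        "log": config["log"][log_type],
        "save": config["save"][log_type],
        "statistic": config["statistic"][log_type],
    }


forward_settings = hook_settings(logging_config, "forward")
grad_settings = hook_settings(logging_config, "grad")
```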

A higher-level abstraction that can address both of these issues may look like:

# KFAC
logging_config = {
    "log": ["grad"],
    "save": ["grad"],
    "statistic": "kfac",
}
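
One possible way to translate this higher-level form into the hook-friendly form is sketched below. The function `expand_config`, the `STATISTIC_TEMPLATES` table, and the placeholder classes are all hypothetical; the template contents simply mirror the KFAC and EKFAC examples given earlier in this thread.

```python
LOG_TYPES = ("forward", "backward", "grad")


class Covariance:
    """Placeholder for the covariance statistic."""


class EKFAC:
    """Placeholder for the EKFAC eigenvalue statistic."""


# Hypothetical templates, mirroring the KFAC / EKFAC configs in this thread.
STATISTIC_TEMPLATES = {
    "kfac": {"forward": [Covariance], "backward": [Covariance], "grad": []},
    "ekfac": {"forward": [], "backward": [], "grad": [EKFAC]},
}


def expand_config(config):
    """Expand list / string shorthand into per-log-type dictionaries."""
    statistic = config["statistic"]
    if isinstance(statistic, str):
        if statistic not in STATISTIC_TEMPLATES:
            raise ValueError("unknown statistic template: %r" % statistic)
        statistic = STATISTIC_TEMPLATES[statistic]
    return {
        "log": {name: name in config["log"] for name in LOG_TYPES},
        "save": {name: name in config["save"] for name in LOG_TYPES},
        # Copy the lists so callers cannot mutate the shared templates.
        "statistic": {name: list(statistic.get(name, [])) for name in LOG_TYPES},
    }


low_level = expand_config({"log": ["grad"], "save": ["grad"], "statistic": "kfac"})
```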

If you have any comments, please let me know!

hwijeen commented 8 months ago

# KFAC
logging_config = {
    "log": {"forward": False, "backward": False, "grad": True},
    "save": {"forward": False, "backward": False, "grad": True},
    "statistic": {"forward": [Covariance], "backward": [Covariance], "grad": []}, # this used to be "hessian"
}

I think having a two-level hierarchy and a pair of (log, stats) is fine for now, especially because AnaLogScheduler will set these values internally. I believe it is the advanced users who will set these options manually, so this should be good enough?

And I think our discussion boils down to how to group the following words/concepts in a way that lowers the mental burden on users.

`mean`, `covariance`, `forward`, `backward`, `grad`, `hessian`, `log`, `save`, `kfac`, `ekfac`

I like your idea that we should hide some scary-looking words. So kfac would be hidden from the user interface.

My understanding of your grouping of the remaining words is:

`Forward`, `Backward`, `Grad`: objects to compute statistics of
`Mean`, `Covariance`, `EKFAC`: types of statistics to compute (`None` would mean per-sample stats)
`log`: what to extract
`save`: what to save

And my comments about it are:

My comments suggest the following grouping:

`Forward`, `Backward`, `Grad`: objects
`Mean`, `Covariance`, `None (or per-sample)`: stats
`Eigensystem`: operation performed on statistics
`save`: what to save

The idea is that any combination of (object, stat) should be configurable, and save should be an option for whether or not to save that pair.
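
A small sketch of this grouping, assuming a hypothetical `LogSpec` record (not an existing AnaLog type): each entry names an object, a stat (with `None` meaning per-sample), and a save flag, and any of the nine (object, stat) combinations is valid.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

OBJECTS = ("forward", "backward", "grad")
STATS = ("mean", "covariance", None)  # None means per-sample values


@dataclass(frozen=True)
class LogSpec:
    """Hypothetical record: which object, which stat, and whether to save."""

    obj: str
    stat: Optional[str]
    save: bool = False

    def __post_init__(self):
        if self.obj not in OBJECTS:
            raise ValueError("unknown object: %r" % (self.obj,))
        if self.stat not in STATS:
            raise ValueError("unknown stat: %r" % (self.stat,))


# Every (object, stat) combination can be expressed.
all_pairs = [LogSpec(obj, stat) for obj, stat in product(OBJECTS, STATS)]

# An illustrative selection only; the save flags here are arbitrary choices.
example_specs = [
    LogSpec("forward", "covariance"),
    LogSpec("backward", "covariance"),
    LogSpec("grad", None, save=True),
]
```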

sangkeun00 commented 8 months ago

@hwijeen Thanks for your quick reply.

Regarding the abstraction, I propose to use three levels (low, mid, high) as follows:

  1. Low abstraction (hook-friendly)
    # KFAC
    logging_config = {
        "log": {"forward": False, "backward": False, "grad": True},
        "save": {"forward": False, "backward": False, "grad": True},
        "statistic": {"forward": [Covariance], "backward": [Covariance], "grad": []}, # this used to be "hessian"
    }
  2. Mid abstraction (for researchers familiar with ML jargon)
    # KFAC
    logging_config = {
        "log": ["grad"],
        "save": ["grad"],
        "statistic": "kfac",
    }
  3. High abstraction (for application developers): AnaLogScheduler

sangkeun00 commented 8 months ago

Addressed in #67