mlcommons / modelgauge

Make it easy to automatically and uniformly measure the behavior of many AI Systems.
https://mlcommons.org/ai-safety/
Apache License 2.0

Research logging library #89

Open brianwgoldman opened 5 months ago

brianwgoldman commented 5 months ago

We should have a standardized way to log in NewHELM. There was some discussion in #78, which I'm pulling out into an issue for future consideration. Some considerations:

dhosterman commented 5 months ago

I definitely agree that something closer to the standard logger would be preferable. Do we have any insight into what drove the creation of the custom logger to begin with? Seeing that set of requirements and reevaluating it seems like a good place to start.
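For comparison, a rough sketch of what leaning on the standard library could look like from a plugin author's perspective; the `newhelm.suts.example_sut` logger name and the `evaluate` function are placeholders, not anything in the current codebase:

```python
import logging

# Hypothetical plugin module; the logger name is illustrative only.
logger = logging.getLogger("newhelm.suts.example_sut")


def evaluate(prompt: str) -> str:
    logger.debug("Sending prompt to SUT: %r", prompt)
    response = "..."  # call the real SUT here
    logger.info("Received %d characters from SUT", len(response))
    return response
```

The appeal is that plugins only need `getLogger(__name__)`; how and where records get written stays a central configuration decision.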

wpietri commented 5 months ago

I agree that we could use better logging. My first question: who's the logging for, and what will they be doing? Some things that seem likely to me:

  1. Developer working on core code trying to debug something
  2. External developer adding a plugin and trying to get it right
    • SUT
    • Test
  3. External user trying to figure out why a third-party plugin isn't working and to write a good bug report
  4. MLC staff doing major benchmark runs:
    • making sure there are no errors
    • detecting non-critical problems, like a slow or flaky third-party API
    • solving some issue with a run

The last case definitely pulls me toward structured logging, as that way you can easily turn logs into alerts and metrics. I also think we should default to logging into one or more files with a fair bit of volume at the INFO level, and that it should be pretty easy to turn on DEBUG logging for the developer cases (1 and 2).
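To make that concrete, here's a minimal sketch (not the actual NewHELM setup) of file logging at INFO by default, an opt-in DEBUG flag, and JSON-lines output that downstream tooling could turn into metrics or alerts; the file name and function name are assumptions for illustration:

```python
import json
import logging


class JsonLineFormatter(logging.Formatter):
    """Emit each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(log_file: str = "run.log", debug: bool = False) -> None:
    # INFO to a file by default; flip to DEBUG for developer cases 1 and 2.
    handler = logging.FileHandler(log_file)
    handler.setFormatter(JsonLineFormatter())
    logging.basicConfig(
        level=logging.DEBUG if debug else logging.INFO,
        handlers=[handler],
    )
```

Keeping the output one record per line also makes it easy to grep a big benchmark run or feed it into whatever alerting we end up with.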

For the user case (3), I think we mainly want to rely on console error messages, not logs, but it's inevitable that people will sometimes need to go deeper. So the logs should at least be easy to find and attach to a bug report.

Are there other use cases people have in mind?