mlcommons / GaNDLF

A generalizable application framework for segmentation, regression, and classification using PyTorch
https://gandlf.org
Apache License 2.0
154 stars 78 forks

Improved logging #265

Closed sarthakpati closed 1 year ago

sarthakpati commented 2 years ago

Is your feature request related to a problem? Please describe. Currently, we have our own logging class, which is fine, but it doesn't provide options for extended debugging or error reporting.

Describe the solution you'd like Something like loguru would be good to have. It gives more flexibility in logging and provides more functionality related to tracing.
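
For illustration, a minimal sketch of what loguru usage could look like (the file sink, rotation settings, and messages below are placeholder assumptions, not a proposed configuration):

```python
from loguru import logger

# In addition to the default stderr sink, also log DEBUG and above
# to a rotating file, with rich tracebacks for error reporting.
logger.add("gandlf_run.log", level="DEBUG", rotation="10 MB",
           backtrace=True, diagnose=True)

logger.debug("loaded {} training samples", 1200)
logger.warning("learning rate {} may be too high", 0.1)

@logger.catch  # logs the full traceback instead of crashing silently
def flaky_step():
    raise ValueError("example failure")

flaky_step()
```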

Describe alternatives you've considered N.A.

Additional context N.A.

github-actions[bot] commented 2 years ago

Stale issue message

sarthakpati commented 2 years ago

Another option: https://neptune.ai/product#how-it-works

This is a well fleshed-out MLOps solution, and it has an offline mode.

sarthakpati commented 2 years ago

The more I think about this, the more I realize that using TensorBoard in a well-thought-out manner would probably be enough to record the information for pretty much every kind of experiment matrix we are running.
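
As a sketch of what that could look like (the tag names, hyperparameters, and values below are illustrative, not GaNDLF's actual keys):

```python
from torch.utils.tensorboard import SummaryWriter

# One writer per experiment/fold; tag names here are illustrative.
writer = SummaryWriter(log_dir="runs/experiment_001")
for epoch in range(10):
    writer.add_scalar("loss/train", 1.0 / (epoch + 1), epoch)
    writer.add_scalar("dice/validation", 1.0 - 1.0 / (epoch + 2), epoch)

# Record the hyperparameters and final metrics of this run, so the
# whole experiment matrix can be compared in one dashboard.
writer.add_hparams({"lr": 1e-3, "batch_size": 4}, {"best_val_dice": 0.87})
writer.close()
```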

github-actions[bot] commented 2 years ago

Stale issue message

meghbhalerao commented 2 years ago

I recently came across wandb, which is free and seems good for hyperparameter sweeps and visualization in general: https://wandb.ai/site.
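
For example, a minimal tracking sketch (the project name, config keys, and metrics are placeholders):

```python
import wandb

# init() starts a run; config holds the hyperparameters for this sweep point.
wandb.init(project="gandlf-experiments",
           config={"lr": 1e-3, "batch_size": 4})
for epoch in range(10):
    wandb.log({"epoch": epoch, "train_loss": 1.0 / (epoch + 1)})
wandb.finish()
```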

sarthakpati commented 2 years ago

Thanks!

I have seen this before and it is pretty good. Only one issue, though: it needs to be deployed as a web app and isn't self-contained (unlike, for instance, TensorBoard). Ideally, it would be great to have TensorBoard's functionality integrated into our workflow: it provides enough flexibility for local deployment and use, while still having the option for server-side deployment.

sarthakpati commented 2 years ago

There are two major things we want to accomplish with this:

  1. Visualize results from the training process during hyperparameter tuning
  2. Save the console output to a file [ref]

sarthakpati commented 2 years ago

There are two major things we want to accomplish with this:

  1. Visualize results from the training process during hyperparameter tuning
  2. Save the console output to a file [ref]

I feel item 2 can be done well using the default logging module. A basic example shows that this works in our multi-module structure. However, it would require a significant engineering effort.
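
For reference, a minimal sketch of that pattern (the file name, format string, and messages are assumptions, not a final configuration):

```python
import logging

# Configure once, at the entry point: everything at INFO and above
# goes both to the console and to a file (goal 2 above).
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    handlers=[
        logging.StreamHandler(),            # console output
        logging.FileHandler("gandlf.log"),  # persisted console output
    ],
)

# Each module then grabs its own named logger, so messages are
# automatically attributed to the module that emitted them.
logger = logging.getLogger(__name__)
logger.info("training started")
```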

Thoughts?

AlexanderGetka-cbica commented 2 years ago

Agreed that the built-in logging module is best for this. I can handle that work. This will mean that we will need to start requesting changes on PRs that use plain print statements. Loguru and snoop are cool, but my hunch is that, with the type of code we are writing, they would mostly just print Python object representations like "<numpy.array at 0xdeadbeefbadbabe>".

Can I ask what the intended user workflow is for visualizing hyperparameter tuning? Do we expect this to be part of "gandlf_collectStats" (i.e., done post hoc after training is performed)? Or is this something we want to be generating/visualizing while training is running?

sarthakpati commented 2 years ago

I can handle that work.

Awesome, thank you! Please let me know how I can help.

This will mean that we will need to start requesting changes on PRs that use plain print statements.

Once you have the logger class set up, we will need to define which print statements become warnings, errors, and so on. Can print statements be redirected through the logger configuration?
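
For instance, one common approach (just a sketch, not tied to our codebase) is a file-like shim that forwards write() calls to a logger:

```python
import logging
import sys

class PrintToLogger:
    """File-like shim that forwards print() output to a logger."""
    def __init__(self, logger, level=logging.INFO):
        self.logger = logger
        self.level = level

    def write(self, message):
        message = message.strip()
        if message:  # print() also emits bare newlines; skip those
            self.logger.log(self.level, message)

    def flush(self):  # needed for file-like compatibility
        pass

logging.basicConfig(level=logging.INFO)  # default handler writes to stderr
sys.stdout = PrintToLogger(logging.getLogger("stdout"))
print("this now goes through the logging module")
```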

Can I ask what the intended user workflow is for visualizing hyperparameter tuning? Do we expect this to be part of "gandlf_collectStats" (i.e., done post hoc after training is performed)? Or is this something we want to be generating/visualizing while training is running?

I wanted to discuss with you all how to obtain information about the "best hyperparameters" after a set of N experiments. I guess gandlf_collectStats would be the most extensible and maintainable way to do this. What do you think? We should keep this in a separate issue altogether to make the PRs easier to review.

github-actions[bot] commented 1 year ago

Stale issue message