sarthakpati closed this issue 1 year ago.
Stale issue message
Another option: https://neptune.ai/product#how-it-works
This is a well-fleshed out MLOps solution, and has an offline mode.
The more I think about this, the more I realize that perhaps using Tensorboard in a nicely thought-out manner would be enough to record the information for pretty much every kind of experimentation matrix we are running.
I recently came across wandb, which is free and seems to be good for hyperparameter sweeps and visualization in general: https://wandb.ai/site.
Thanks!
I have seen this before and it is pretty good. Only 1 issue, though: it needs to be deployed as a web app and it isn't self-contained (unlike, for instance, TensorBoard). Ideally, it would be great to have TensorBoard's functionality integrated into our workflow: it provides enough flexibility for local deployment and use, while keeping the option of server-side deployment.
There are 2 major things we want to accomplish from this:
- Visualize results from a training process during hyper-parameter tuning
- Save console output to file [ref]
I feel 2 can be done well using the default `logging` module. A basic example shows that this works well in our multi-module structure. However, it would require a significant engineering effort.
Thoughts?
Agreed that the built-in logging module is best for this. I can handle that work. This will mean that we will need to start requesting changes on PRs that use plain print statements. Loguru and snoop are cool, but my hunch is that with the type of code we are writing, they will most likely only print out Python object reprs like `<numpy.array at 0xdeadbeefbadbabe>`.
Can I ask what the intended user workflow is to visualize hyperparameter tuning? Do we expect this to be part of "gandlf_collectStats", (i.e., done post-hoc after a training is performed)? Or is this something we want to be generating/visualizing while training is running?
> I can handle that work.
Awesome, thank you! Please let me know how I can help.
> This will mean that we will need to start requesting changes on PRs that use plain print statements.
Once you have the `logger` class set up, we would need to define which `print` statements go as warning, error, and so on. Can the `print` statements be redirected to the `logger` class config?
> Can I ask what the intended user workflow is to visualize hyperparameter tuning? Do we expect this to be part of "gandlf_collectStats", (i.e., done post-hoc after a training is performed)? Or is this something we want to be generating/visualizing while training is running?
I wanted to discuss with you all how to obtain the information about the "best hyperparameters" after a set of N experiments. I guess `gandlf_collectStats` would be the most extendable and maintainable way to do this. What do you think? Also, we would keep this in a separate issue altogether to make the PRs easier to review.
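As a rough sketch of what the post-hoc selection could look like: assuming each experiment appends one row to a shared stats CSV (the file layout, column names, and metric name here are assumptions, not the actual `gandlf_collectStats` format), picking the best run is a one-pass scan:

```python
import csv

def best_run(stats_csv: str, metric: str = "val_dice", maximize: bool = True) -> dict:
    """Return the row with the best value of `metric` from a per-run stats CSV."""
    with open(stats_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    return max(rows, key=lambda r: float(r[metric]) * (1 if maximize else -1))

# Hypothetical stats file, one row per experiment:
# run_id,learning_rate,batch_size,val_dice
# 0,0.001,4,0.81
# 1,0.0001,8,0.85
```

Keeping this as a plain CSV scan (rather than a database or tracking server) fits the "self-contained, TensorBoard-like" preference discussed above.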
Is your feature request related to a problem? Please describe. Currently, we have our own logging class, which is fine, but it doesn't provide options for extended debugging or error reporting.
Describe the solution you'd like Something like loguru would be good to have. This gives more flexibility in logging, and provides more functionality related to tracing.
Describe alternatives you've considered N.A.
Additional context N.A.