wandb is a tool that integrates with torch to visualize the training process:

These are visualizations generated by wandb. It helps to:
- compare runs against each other (the most important curve for me is `eval/loss`), so you can easily experiment with training parameters
- track your current runs while you are away from the keyboard (e.g. watching your torch run from a smartphone)
- see that your current run is going badly (e.g. you reformulated the instruction and it produced worse results) and have a visualized reason to kill that run
- share your training results with the community, which is handy for sharing LoRA runs (reports are private by default)
- troubleshoot hardware: wandb records your hardware status (power usage, allocated memory, utilization, etc.); a minimal logging sketch follows this list
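For reference, here is a minimal sketch of the integration pattern (not this PR's actual code): init a run once, log scalars every step, finish at the end. The toy model and synthetic data are placeholders for illustration only.

```python
import torch
import wandb

# A minimal sketch, assuming a toy model and synthetic data; it shows the
# integration pattern, not this PR's actual training code.
wandb.init(project="alpaca_lora_4bit", config={"lr": 1e-3, "steps": 100})

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=wandb.config.lr)

for step in range(wandb.config.steps):
    x = torch.randn(32, 10)
    y = torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Each wandb.log call adds a point to the run's live charts; hardware
    # stats (power, memory, utilization) are collected automatically.
    wandb.log({"train/loss": loss.item()}, step=step)

wandb.finish()
```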
This PR adds wandb integration for alpaca_lora_4bit
Also, I found that your code does not use the eval dataset at all, even though it prepares one: https://github.com/johnsmith0031/alpaca_lora_4bit/blob/main/train_data.py#L166. So 20% of the dataset was simply unused on every run. A sketch of how it could be wired in is below.
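This is a hedged sketch of one way to actually consume the prepared eval split. `model`, `train_data`, and `val_data` are placeholders for whatever `train_data.py` produces; the `TrainingArguments` fields are real transformers API, but how they slot into this repo is my assumption.

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="output",
    evaluation_strategy="steps",  # evaluate periodically during training
    eval_steps=200,
    logging_steps=20,
    report_to="wandb",            # stream train/loss and eval/loss to wandb
)

trainer = Trainer(
    model=model,                  # placeholder: the LoRA-wrapped model
    args=args,
    train_dataset=train_data,     # placeholder: the 80% split
    eval_dataset=val_data,        # placeholder: without this, the 20% split is never touched
)
trainer.train()
```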