Closed Chevolier closed 1 year ago
@Chevolier, can you please clarify the program you are referring to? It would be helpful to share what you are running and the expected output. Thanks!
I mean the 3rd step, applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/. Specifically, I run multi-node/run_66b.sh on 2 nodes with 16 GPUs in total, and the model is bloomz-7b1. I can see the reward score in the standard output, but can I see the training process using TensorBoard?
Hi @Chevolier,
DeepSpeed has monitoring functionality built in and the monitor can be selected by specifying the corresponding configuration (TensorBoard, WandB, csv).
The documentation can be found here: https://www.deepspeed.ai/docs/config-json/#monitoring-module-tensorboard-wandb-csv
For TensorBoard, an example configuration may look like this:
"tensorboard": { "enabled": True, "output_path": "output/ds_logs/", "job_name": "train_bert" }
The configuration can be added to the get_train_ds_config utility function found here:
https://github.com/microsoft/DeepSpeedExamples/blob/dafeb2b3be3a085214faa2f59a8979c051424938/applications/DeepSpeed-Chat/training/utils/ds_utils.py#L32
This way, any model initialized with that config will have the monitor specified. Please let me know if you run into any problems with this method.
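For illustration, here is a minimal sketch of merging the TensorBoard section into a DeepSpeed config dict. The helper name add_tensorboard_monitor, the default job name, and the placeholder base config are assumptions made for this example; they are not part of DeepSpeed or DeepSpeed-Chat, and the real get_train_ds_config returns a larger dict.

```python
import copy
import json


def add_tensorboard_monitor(ds_config, output_path="output/ds_logs/", job_name="step3_rlhf"):
    """Return a copy of a DeepSpeed config dict with the TensorBoard monitor enabled.

    Hypothetical helper for illustration only; it is not a DeepSpeed API.
    """
    config = copy.deepcopy(ds_config)  # leave the caller's dict untouched
    config["tensorboard"] = {
        "enabled": True,
        "output_path": output_path,
        "job_name": job_name,
    }
    return config


# Stand-in for the dict returned by get_train_ds_config; values are placeholders.
base_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 3},
}

ds_config = add_tensorboard_monitor(base_config)
print(json.dumps(ds_config, indent=2))
```

Once training has written events, the logs can be viewed by pointing TensorBoard at the output path, e.g. tensorboard --logdir output/ds_logs/.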
Thanks, Lev
Hi @Chevolier,
Just wanted to let you know that we have a PR that adds various instrumentation across all the DS Chat steps, including TensorBoard logging (GH-624).
Feel free to give it a try to see if it works on your end.
Thanks, Lev
Does this program support TensorBoard? I could not find any TensorBoard logs.