microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0

Does this program support TensorBoard? #388

Closed Chevolier closed 1 year ago

Chevolier commented 1 year ago

Does this program support TensorBoard? I could not find any TensorBoard logs.

tjruwase commented 1 year ago

@Chevolier, can you please clarify the program you are referring to? It would be helpful to share what you are running and the expected output. Thanks!

Chevolier commented 1 year ago

I mean the third step, applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/. Specifically, I ran multi-node/run_66b.sh on 2 nodes with 16 GPUs in total, using bloomz-7b1 as the model. I can see the reward score in the standard output, but can I follow the training process in TensorBoard?

lekurile commented 1 year ago

Hi @Chevolier,

DeepSpeed has monitoring functionality built in, and you can select the monitor by specifying the corresponding configuration (TensorBoard, WandB, or CSV).

The documentation can be found here: https://www.deepspeed.ai/docs/config-json/#monitoring-module-tensorboard-wandb-csv

For TensorBoard, an example configuration may look like this:

"tensorboard": {
    "enabled": True,
    "output_path": "output/ds_logs/",
    "job_name": "train_bert"
}

The configuration can be added to the get_train_ds_config utility function found here: https://github.com/microsoft/DeepSpeedExamples/blob/dafeb2b3be3a085214faa2f59a8979c051424938/applications/DeepSpeed-Chat/training/utils/ds_utils.py#L32

This allows any model initialized with that config to have a monitor specified. Please let me know if you run into any problems with this method.
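As a rough illustration of adding the section, here is a minimal sketch that merges the "tensorboard" block into an existing config dict. The helper name with_tensorboard, the placeholder base_config, and the job name "step3_rlhf" are assumptions made for this example and are not part of DeepSpeed or DeepSpeed-Chat; only the three keys come from the configuration shown above.

```python
import copy
import json


def with_tensorboard(ds_config, output_path="output/ds_logs/", job_name="train_bert"):
    """Return a copy of ds_config with a "tensorboard" monitoring section added.

    Hypothetical helper for illustration only; not a DeepSpeed API.
    """
    config = copy.deepcopy(ds_config)  # leave the caller's dict untouched
    config["tensorboard"] = {
        "enabled": True,  # Python bool; json.dumps writes it as `true`
        "output_path": output_path,
        "job_name": job_name,
    }
    return config


# Placeholder standing in for the dict a config builder would return.
base_config = {"train_batch_size": 32}

# "step3_rlhf" is an arbitrary job name chosen for this example.
ds_config = with_tensorboard(base_config, job_name="step3_rlhf")
print(json.dumps(ds_config, indent=4))
```

Inside a Python dict the flag is written as True; if the same section goes into a JSON config file instead, it has to be the lowercase JSON literal true.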

Thanks, Lev

lekurile commented 1 year ago

Hi @Chevolier,

Just wanted to update you that we have a PR to add various instrumentation across all the DS Chat steps, including TensorBoard logging (GH-624).

Feel free to give it a try to see if it works on your end.
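One quick way to check whether anything was logged is to look for TensorBoard event files under the configured output path. The sketch below assumes the monitor writes standard event files (names starting with events.out.tfevents) somewhere under output/ds_logs/, the example path from the configuration above; find_event_files is a hypothetical helper, not part of DeepSpeed.

```python
from pathlib import Path


def find_event_files(log_root):
    """Return sorted paths of TensorBoard event files found under log_root.

    Hypothetical helper for illustration only; returns an empty list
    when log_root does not exist or is not a directory.
    """
    root = Path(log_root)
    if not root.is_dir():
        return []
    # TensorBoard writers name their files events.out.tfevents.<suffix>
    return sorted(p for p in root.rglob("events.out.tfevents*") if p.is_file())


# Assumed log location, taken from the example "output_path" value.
event_files = find_event_files("output/ds_logs/")
if event_files:
    for path in event_files:
        print(path)
else:
    print("No TensorBoard event files found.")
```

If files show up, pointing TensorBoard at the same directory (tensorboard --logdir output/ds_logs/) should display the logged values.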

Thanks, Lev