mosaicml / llm-foundry

LLM training code for Databricks foundation models
https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Apache License 2.0

Log exception on inactivity callback #1194

Status: Closed (jjanezhang closed this 4 months ago)

jjanezhang commented 4 months ago

Log exception on inactivity callback

Logs an exception on timeout so we can write a run event.
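As a rough illustration of the idea, and not the code in this PR, the sketch below shows an inactivity watchdog that logs an exception before invoking a timeout handler, so the failure is recorded even if the handler goes on to stop the process. The names `RunTimeoutError`, `InactivityWatchdog`, `heartbeat`, and `on_timeout` are hypothetical and use only the Python standard library; they are not llm-foundry or Composer APIs.

```python
import logging
import threading
from typing import Callable, Optional

log = logging.getLogger(__name__)


class RunTimeoutError(RuntimeError):
    """Hypothetical exception describing a run that exceeded its inactivity budget."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        super().__init__(f"Run timed out after {timeout_s} seconds of inactivity.")


class InactivityWatchdog:
    """Hypothetical watchdog: fires if heartbeat() is not called within timeout_s."""

    def __init__(self, timeout_s: float, on_timeout: Callable[[RunTimeoutError], None]) -> None:
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def _fire(self) -> None:
        exc = RunTimeoutError(self.timeout_s)
        # Log first, so the exception is recorded even if the handler kills the run.
        log.error("Inactivity timeout reached", exc_info=exc)
        self.on_timeout(exc)

    def heartbeat(self) -> None:
        """Record activity: cancel any pending timer and start a fresh one."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.timeout_s, self._fire)
            self._timer.daemon = True
            self._timer.start()

    def stop(self) -> None:
        """Cancel the pending timer, for example when the run finishes normally."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

In this sketch a training loop would call `heartbeat()` at the end of each batch and `stop()` on normal completion. The `on_timeout` handler is where a run event could be written from the logged exception; how that event reaches run metadata is outside the scope of the example.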

Testing

Irene tested a run, test-1b-5tv00z, that timed out, and I confirmed that the run timeout exception was written to metadata:

[Screenshot, 2024-05-10 2:24:15 PM; image not included]