google / maxtext

A simple, performant and scalable Jax LLM!
Apache License 2.0

Integrate goodput monitor #749

Closed by dipannita08 1 week ago

dipannita08 commented 2 weeks ago

This change adds the following:

Allows creating a monitor object that spins up a secondary "monitor & upload" thread, which queries the Goodput of the job using the ml-goodput-measurement pip package and writes a scalar metric to TB every interval period. Tested: