pytorch / torchchat

Run PyTorch LLMs locally on servers, desktop and mobile
BSD 3-Clause "New" or "Revised" License

[distributed] add stage metrics - total params per stage, total size and present it in a nicely formatted manner #1120

Closed: lessw2020 closed this 1 month ago

lessw2020 commented 1 month ago

This PR computes the total parameter count and total byte size for each stage, and prints both in a readable format:

[Image: Screenshot 2024-09-08 at 5:46:00 PM]

Four underlying functions support this: two gather the data (parameter count and byte size), and two format those values into a human-readable form.
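For readers without the diff at hand, here is a minimal sketch of what such a four-function split could look like. The function names (`count_params`, `size_in_bytes`, `format_count`, `format_bytes`), the unit thresholds, and the rounding are all hypothetical and are not taken from this PR. The code only assumes that a stage exposes `parameters()` and that each parameter has `numel()` and `element_size()`, which holds for a `torch.nn.Module` and its tensors.

```python
from typing import Any


def count_params(stage: Any) -> int:
    """Total number of parameter elements in a stage (hypothetical helper)."""
    return sum(p.numel() for p in stage.parameters())


def size_in_bytes(stage: Any) -> int:
    """Total parameter storage in bytes: elements times bytes per element."""
    return sum(p.numel() * p.element_size() for p in stage.parameters())


def format_count(n: int) -> str:
    """Format a count with K / M / B suffixes (decimal units)."""
    for divisor, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if n >= divisor:
            return f"{n / divisor:.2f}{suffix}"
    return str(n)


def format_bytes(n: int) -> str:
    """Format a byte count with KB / MB / GB suffixes (binary units)."""
    for divisor, suffix in ((1024**3, "GB"), (1024**2, "MB"), (1024, "KB")):
        if n >= divisor:
            return f"{n / divisor:.2f} {suffix}"
    return f"{n} B"
```

Keeping the data functions separate from the formatters means the raw integers stay available for comparisons across stages, while the strings are only produced at logging time.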

As a result, we can quickly monitor and verify that the stages are roughly balanced in both parameter count and size.
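One way to turn "roughly balanced" into a number is the ratio of the largest stage to the smallest. The sketch below is an illustration only: `stage_imbalance` is a hypothetical helper, not part of this PR, and what ratio counts as acceptable is left to the reader.

```python
from typing import Sequence


def stage_imbalance(per_stage_totals: Sequence[int]) -> float:
    """Ratio of the largest to the smallest stage total.

    1.0 means perfectly balanced; larger values mean more skew.
    Works for parameter counts or byte sizes alike.
    """
    if not per_stage_totals or min(per_stage_totals) <= 0:
        raise ValueError("need at least one stage, all with positive totals")
    return max(per_stage_totals) / min(per_stage_totals)
```

For example, two stages holding 200 and 100 parameters give a ratio of 2.0, meaning one stage carries twice the parameters of the other.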

pytorch-bot[bot] commented 1 month ago

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1120

Note: Links to docs will display an error until the docs builds have been completed.

:white_check_mark: No Failures

As of commit 88c594e5de84ebd82d429d7ba39b1627563fb6c1 with merge base 8b6aa07edf23af8610ffdee21f4603570f950619: :green_heart: Looks good so far! There are no failures yet. :green_heart:

This comment was automatically generated by Dr. CI and updates every 15 minutes.