pytorch / torchchat

Run PyTorch LLMs locally on servers, desktop and mobile
BSD 3-Clause "New" or "Revised" License

Remove if statement preventing tps stats from being printed when running generate with compile #1330

Closed: vmpuri closed this 1 month ago

vmpuri commented 1 month ago

TPS and other stats were being reported as NaN due to faulty logic.

I don't see any reason why compile should block these stats from being printed, or why there should be a separate TPS for JIT compilation.
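
The shape of the bug is easy to sketch. The snippet below is a hypothetical minimal reconstruction, not the actual generate.py diff; `record_stats`, `aggregate_metrics`, and `compile_enabled` are assumed names:

```python
import math

# Hypothetical reconstruction of the bug (assumed names; not the actual
# generate.py code). The old logic only recorded per-iteration stats when
# compilation was off, so the final average was taken over an empty list.

def record_stats(aggregate_metrics, tokens_sec, compile_enabled, fixed=True):
    if not fixed and compile_enabled:
        return  # old behavior: --compile runs were silently skipped
    aggregate_metrics.setdefault("tokens_per_sec", []).append(tokens_sec)

def average_tps(aggregate_metrics):
    samples = aggregate_metrics.get("tokens_per_sec", [])
    return sum(samples) / len(samples) if samples else math.nan

old, new = {}, {}
record_stats(old, 2.4146, compile_enabled=True, fixed=False)
record_stats(new, 2.4146, compile_enabled=True, fixed=True)
print(average_tps(old))  # nan: what users saw with --compile
print(average_tps(new))  # 2.4146: stats reported unconditionally
```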

python3 torchchat.py generate llama3.2-1b --compile --device cuda
Using device=cuda NVIDIA PG509-210
Loading model...
Time to load model: 1.19 seconds
-----------------------------------------------------------
Hello, my name is Sophia. I'm a huge fan of your work. I've been following your blog for a while now and I just wanted to say that your content is top-notch. I love how you share your passion for history, science, and culture with your audience.

As a young woman in my early twenties, I'm always looking for new and interesting things to read about. Your blog is the perfect place to learn something new and expand my knowledge on a subject that I'm really interested in. I've read a few of your posts on archaeology and I was really impressed with what you had to say about it.

I was born and raised in a small town, and I've always been fascinated by the history of my hometown. I have a lot of amazing memories of visiting the local historical society and attending events and exhibitions. Your blog has made me realize how much I want to learn more about the history of my town and the region. I've been thinking about studying history as a career,
just-in-time compilation time (incl run time): 8.3e+01 seconds
2024-10-24:17:19:37,884 INFO     [generate.py:1171] 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 199 tokens
Time for inference 1: 82.8278 sec total
Time to first token: 1.3995 sec with parallel prefill.

      Total throughput: 2.4146 tokens/sec, 0.4141 s/token
First token throughput: 0.7145 tokens/sec, 1.3995 s/token
 Next token throughput: 2.4439 tokens/sec, 0.4092 s/token
2024-10-24:17:19:37,885 INFO     [generate.py:1182] 
Bandwidth achieved: 7.24 GB/s
2024-10-24:17:19:37,885 INFO     [generate.py:1186] *** This first iteration will include cold start effects for dynamic import, hardware caches, JIT compilation. ***

========================================

      Average tokens/sec (total): 2.41
Average tokens/sec (first token): 0.71
 Average tokens/sec (next tokens): 2.44

Memory used: 2.83 GB
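
As a sanity check, the restored numbers hang together. The formulas below are inferred from the log output above; they are assumptions, not the actual generate.py code:

```python
# Back-of-the-envelope check of the stats above (assumed formulas).
total_sec = 82.8278        # "Time for inference 1"
first_token_sec = 1.3995   # "Time to first token"
generated_tokens = 199     # "Generated 199 tokens"

# Next-token throughput excludes the time spent on the first token.
print(generated_tokens / (total_sec - first_token_sec))  # ~2.4439 tokens/sec

# Total throughput appears to count the first token as well.
print((generated_tokens + 1) / total_sec)                # ~2.4146 tokens/sec

# Bandwidth ~ bytes of weights streamed per token times throughput:
# ~3 GB of weights (consistent with the 2.83 GB "Memory used") at
# ~2.41 tokens/sec matches the reported ~7.24 GB/s.
print(3.0 * 2.4146)                                      # ~7.24 GB/s
```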
pytorch-bot[bot] commented 1 month ago

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1330

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 52caa9c5387339d43d3d205c080a72647e53f2ce with merge base 7fe2c867cb02a115b91884655a2cbdd20dfe996a: 💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.