pytorch / torchchat

Run PyTorch LLMs locally on servers, desktop and mobile
BSD 3-Clause "New" or "Revised" License

[distributed] integrate chat tokenizers, and add llama3-8B model option #1110

Closed lessw2020 closed 2 weeks ago

lessw2020 commented 2 weeks ago

This PR: 1 - integrates the chat tokenizers by using the TokenizerArgs and _initialize_tokenizer functions from builder.py. With _build_chat_tokenizer() you can instantiate the same tokenizers that chat installs (rather than using the HF tokenizer).

example:

[rank0]:2024-09-05:15:57:26,835 INFO     [dist_run.py:83] using tokenizer = tokenizer.tiktoken.Tokenizer

and

[rank0]:2024-09-05:16:01:01,716 INFO     [dist_run.py:83] using tokenizer = sentencepiece.SentencePieceProcessor

2 - adds llama3-8B Instruct as a valid model option for distributed inference.

pytorch-bot[bot] commented 2 weeks ago

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1110

Note: Links to docs will display an error until the docs builds have been completed.

:white_check_mark: No Failures

As of commit 12d5a56f2242f90ad99e6242373015a290048f3d with merge base d58923e85de3fa84b05239f23056de913cd76b76 (image): :green_heart: Looks good so far! There are no failures yet. :green_heart:

This comment was automatically generated by Dr. CI and updates every 15 minutes.