MaxwelsDonc opened this issue 2 months ago (Open)
Hey @MaxwelsDonc. I'm unable to reproduce the error; the command `tune cp generation ./custom_quantization_generation_config.yaml` works for me. Could you ensure you have the latest version of torchtune installed?
Hey @MaxwelsDonc, we did a major refactoring over the last few days, migrating many things from `utils` to `training`. However, I do see `generate_next_token` in `utils`: https://github.com/pytorch/torchtune/blob/f1fbe1ac6c9a1def639465f8c3b628b9fe5b9b4b/torchtune/utils/_generation.py#L37
Can you please run `pip list` and share the torchtune version you are using, e.g. is it the stable release, the nightlies, or built from source?
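If it helps, a quick way to print the installed version is a short snippet like the sketch below (standard library only; `pip list | grep torchtune` works just as well):

```python
# Print the installed torchtune version.
from importlib.metadata import version

print(version("torchtune"))
```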
@felipemello1 Hi, I met the same error while walking through this tutorial, and I'm currently using the stable torchtune==0.2.1 version.
I found that torchtune==0.1.1 works fine. Maybe the refactoring has been applied to the stable channel since the 0.2.0 release?
I see. OK, that makes sense. Thanks for the details, and I am sorry that this is happening to you @piljoong-jeong @MaxwelsDonc. It seems that we need better testing on stable for our generation recipe + quantization.
I can see if I can make a patch for 0.2.1. Meanwhile, there are a few possible solutions (see the sketch after this list):

1. Install the nightlies:
   `pip install --pre torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cu121`
   `pip install --pre torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu --no-cache-dir`
2. Install only the torchao nightly:
   `pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cu121`
3. Change the import in your local copy of the recipe:
   `from torchtune.utils._generate import generate_next_token`
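To illustrate option 3, here is a minimal, hypothetical sketch of the top of a locally copied generate recipe after the change (adjust the private module path if your installed version names it differently, e.g. `_generation.py` as in the link above):

```python
# Workaround sketch for torchtune 0.2.1: import the function from the private
# module, since torchtune.utils does not re-export it in that release.
from torchtune import config, utils
from torchtune.utils._generate import generate_next_token

# ...the recipe body then calls generate_next_token(...) directly instead of
# utils.generate_next_token(...).
```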
We have a folder structure like this:
torchtune
|- utils
   |- __init__.py
   |- _generate.py
We import all of our public functions in `__init__.py`, so users can just do `from torchtune.utils import X` instead of `from torchtune.utils._generate import X`.
If you look at `__init__.py` on the main branch, you will see `generate_next_token` there.
However, if you switch to the 0.2.1 release branch, you will see that we do not import `generate_next_token` from the generation module, and that's the issue.
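For illustration, the re-export pattern described above looks roughly like this (a sketch, not the exact contents of either branch; the private module name differs between releases):

```python
# torchtune/utils/__init__.py (sketch): re-export public symbols from the
# private module so `from torchtune.utils import generate_next_token` works.
from torchtune.utils._generation import generate, generate_next_token

__all__ = [
    "generate",
    "generate_next_token",
]
```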
> the line `from torchtune import config, training, utils` also indicates that there is no `training` package or function.
This one is a bit more odd to me. It sounds like you are using the recipe from one torchtune version with the code from another. Up to 0.2.1 we did not import `training`; that import only exists on main.
I tested it on my end to confirm. Can you please make sure that the recipe and the installed torchtune version are aligned?
cc: @ebsmothers @joecummings
@felipemello1
Thanks for your reply. It's possible that I was using a different version of torchtune, because after recreating the environment the issue with the `training` package is now resolved. However, when I run generation with the quantized model via `tune run generate --config ./custom_quantization_generation_config.yaml`, I encounter a different problem:
RuntimeError: CUDA error: named symbol not found
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I'm not sure what is causing this error. I also tried generating tokens with the original (unquantized) model, and that works fine. Below are my environment details and the custom_quantization_generation_config.yaml file.
| Property | Value |
|---|---|
| PyTorch Version | 2.4.1+cu121 |
| torchtune | 0.2.1 |
| CUDA Available | True |
| GPU Name | Tesla V100-SXM2-32GB |
| nvcc | NVIDIA (R) Cuda compiler driver |
| Copyright | (c) 2005-2023 NVIDIA Corporation |
| Built on | Tue_Feb__7_19:32:13_PST_2023 |
| Cuda compilation tools | release 12.1, V12.1.66 |
| Build | cuda_12.1.r12.1/compiler.32415258_0 |
| NVIDIA-SMI | 535.54.03 |
| Driver Version | 535.54.03 |
| CUDA Version | 12.2 |
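(For reference, the PyTorch-side rows above can be collected with a short snippet like the sketch below; the nvcc and driver rows come from the `nvcc --version` and `nvidia-smi` CLIs.)

```python
# Gather the PyTorch/GPU details reported in the table above.
import torch

print("PyTorch Version:", torch.__version__)
print("CUDA Available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU Name:", torch.cuda.get_device_name(0))
```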
```yaml
# Config for running the InferenceRecipe in generate.py to generate output from an LLM
#
# To launch, run the following command from root torchtune directory:
#   tune run generate --config generation

# Model arguments
model:
  _component_: torchtune.models.llama3.llama3_8b

checkpointer:
  _component_: torchtune.utils.FullModelTorchTuneCheckpointer
  checkpoint_dir: Llama3-Gen
  checkpoint_files: [
    meta_model_0-4w.pt
  ]
  output_dir: Llama3-Gen
  model_type: LLAMA3

device: cuda
dtype: bf16

seed: 1234

# Tokenizer arguments
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: Meta-Llama-3-8B/original/tokenizer.model

# Generation arguments; defaults taken from gpt-fast
prompt: "Tell me a joke?"
instruct_template: null
chat_format: null
max_new_tokens: 300
temperature: 0.6 # 0.8 and 0.6 are popular values to try
top_k: 300

# It is recommended to set enable_kv_cache=False for long-context models like Llama3.1
enable_kv_cache: True

quantizer:
  _component_: torchtune.utils.quantization.Int4WeightOnlyQuantizer
  groupsize: 256
```
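As an aside, my understanding is that the `quantizer` section above gets turned into an object roughly like this (a sketch using `torchtune.config.instantiate` and omegaconf, not the recipe's exact code):

```python
# Sketch: how a _component_ entry in the config maps to a Python object.
from omegaconf import OmegaConf
from torchtune import config

quantizer_cfg = OmegaConf.create(
    {
        "_component_": "torchtune.utils.quantization.Int4WeightOnlyQuantizer",
        "groupsize": 256,
    }
)
quantizer = config.instantiate(quantizer_cfg)
print(type(quantizer).__name__)  # Int4WeightOnlyQuantizer
```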
When running the command `tune run generate ./custom_quantization_generation_config.yaml`, I encountered the following error: `AttributeError: module 'torchtune.utils' has no attribute 'generate_next_token'`. I checked the source code on GitHub and confirmed that there is indeed no `generate_next_token` function. Additionally, in `recipes/generate.py`, the line `from torchtune import config, training, utils` also indicates that there is no `training` package or function. Could you please explain why this might be happening?
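For completeness, the missing attribute can be checked outside the recipe with a couple of lines (a sketch, assuming torchtune 0.2.1 is installed):

```python
# On torchtune 0.2.1 this lookup fails, matching the recipe error above.
from torchtune import utils

print(hasattr(utils, "generate_next_token"))  # False on 0.2.1
```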
Below is the specific configuration from `custom_quantization_generation_config.yaml`: