pytorch / torchtune

PyTorch native finetuning library
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License

Issue: AttributeError when running tune run generate --config ./custom_quantization_generation_config.yaml #1488

Open MaxwelsDonc opened 2 months ago

MaxwelsDonc commented 2 months ago

When running the command tune run generate --config ./custom_quantization_generation_config.yaml, I encountered the following error:
AttributeError: module 'torchtune.utils' has no attribute 'generate_next_token'.

I checked the source code on GitHub and confirmed that there is indeed no generate_next_token function. Additionally, in recipes/generate.py, the line from torchtune import config, training, utils fails as well, since there is no training package or module.

Could you please explain why this might be happening?
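A quick way to confirm what the installed torchtune actually exposes (a minimal check, assuming torchtune was installed with pip):

# Report the installed torchtune version and whether torchtune.utils
# exposes generate_next_token in that version.
from importlib.metadata import version

import torchtune.utils as utils

print("torchtune version:", version("torchtune"))
print("has generate_next_token:", hasattr(utils, "generate_next_token"))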

Below is the specific configuration from custom_quantization_generation_config.yaml:

# Config for running the InferenceRecipe in generate.py to generate output from an LLM
#
# To launch, run the following command from root torchtune directory:
#    tune run generate --config generation

# Model arguments
model:
  _component_: torchtune.models.llama3.llama3_8b

checkpointer:
  _component_: torchtune.utils.FullModelTorchTuneCheckpointer
  checkpoint_dir: Llama3-Gen
  checkpoint_files: [
    meta_model_0-4w.pt
  ]
  output_dir: Llama3-Gen
  model_type: LLAMA3

device: cuda
dtype: bf16

seed: 1234

# Tokenizer arguments
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: Meta-Llama-3-8B/original/tokenizer.model

# Generation arguments; defaults taken from gpt-fast
prompt: "Tell me a joke?"
instruct_template: null
chat_format: null
max_new_tokens: 300
temperature: 0.6 # 0.8 and 0.6 are popular values to try
top_k: 300
# It is recommended to set enable_kv_cache=False for long-context models like Llama3.1
enable_kv_cache: True

quantizer:
  _component_: torchtune.utils.quantization.Int4WeightOnlyQuantizer
  groupsize: 256

SalmanMohammadi commented 2 months ago

Hey @MaxwelsDonc. I'm unable to reproduce the error as the command tune cp generation ./custom_quantization_generation_config.yaml is working for me. Could you ensure you have the latest version of torchtune installed?
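For a standard pip install, upgrading to the latest stable release would be something like:

    pip install --upgrade torchtune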

felipemello1 commented 2 months ago

Hey @MaxwelsDonc, we did a major refactoring over the last few days, migrating many things from utils to training. However, I do see generate_next_token in utils: https://github.com/pytorch/torchtune/blob/f1fbe1ac6c9a1def639465f8c3b628b9fe5b9b4b/torchtune/utils/_generation.py#L37

Can you please run "pip list" and share the torchtune version you are using, e.g. is it the stable release, the nightlies, or built from source?

piljoong-jeong commented 2 months ago

@felipemello1 Hi, I ran into the same error while walking through this tutorial, and I'm currently using the stable torchtune==0.2.1 version.

I found that torchtune==0.1.1 works fine. Maybe the refactoring has been landing in the stable channel since the 0.2.0 release?

felipemello1 commented 2 months ago

I see. Ok, that makes sense. Thanks for the details, and I am sorry that this is happening to you @piljoong-jeong @MaxwelsDonc. It seems that we need better testing on stable for our generation recipe + quantization.

TLDR:

I can see if I can make a patch to 0.2.1. Meanwhile, there are five solutions:

  1. Don't quantize. This only happens if quantization is enabled: https://github.com/pytorch/torchtune/blob/83557aa9ec1bda06f41524b5fdf09d5dac9a3829/recipes/generate.py#L148C23-L148C42
  2. Install the nightly version (recommended):
    pip install --pre torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cu121
    pip install --pre torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu --no-cache-dir
    pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cu121
  3. Wait for the next release (which should happen soon)
  4. Change torchtune/utils/__init__.py to import generate_next_token there
  5. Change the generate recipe to import directly from the private module: from torchtune.utils._generation import generate_next_token

Explanation:

We have a folder structure like this:

torchtune
    |- utils
        |- __init__.py
        |- _generation.py

We import all of our public functions in __init__.py so users can just do from torchtune.utils import X instead of from torchtune.utils._generation import X.

If you go to __init__.py on the main branch, you will see generate_next_token imported there.

However, if you switch to the 0.2.1 release branch, you will see that we do not import generate_next_token from _generation there, and that's the issue.
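Concretely, the difference boils down to one re-export (a simplified sketch of torchtune/utils/__init__.py, not the actual file contents):

# Simplified sketch of torchtune/utils/__init__.py on main: the helper is re-exported
# alongside generate, so torchtune.utils.generate_next_token works.
from torchtune.utils._generation import generate, generate_next_token
# In the 0.2.1 release this re-export is missing, which is why the recipe's call to
# torchtune.utils.generate_next_token raises the AttributeError above.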

Importing "training" package

the line from torchtune import config, training, utils fails as well, since there is no training package or module.

This one is a bit odder to me. It sounds like you are using the recipe from one torchtune version with the code from another. Up to 0.2.1, we didn't import training; that only happens on main.

I tested it on my end to confirm. Can you please make sure that the recipe and the installed torchtune version are aligned?
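One way to realign them (assuming the standard tune CLI) is to re-copy the recipe and config from the installed package, since tune cp always copies the versions that ship with the installed torchtune, e.g.:

    tune cp generate ./generate.py
    tune cp generation ./generation.yaml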

cc: @ebsmothers @joecummings

MaxwelsDonc commented 2 months ago

@felipemello1 Thanks for your reply. It's possible that I was using a different version of torchtune, because after recreating the environment, the issue with the "training" package is now resolved. However, when I run the quantized model with tune run generate --config ./custom_quantization_generation_config.yaml, I encounter a different problem:

RuntimeError: CUDA error: named symbol not found
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I'm not sure what is causing this error. I also tried generating tokens with the original model, and it works fine. Below are my environment details and the custom_quantization_generation_config.yaml file.

Property          Value
PyTorch Version   2.4.1+cu121
torchtune         0.2.1
CUDA Available    True
GPU Name          Tesla V100-SXM2-32GB
nvcc              Cuda compilation tools, release 12.1, V12.1.66 (Build cuda_12.1.r12.1/compiler.32415258_0)
NVIDIA-SMI        535.54.03
Driver Version    535.54.03
CUDA Version      12.2

# Config for running the InferenceRecipe in generate.py to generate output from an LLM
#
# To launch, run the following command from root torchtune directory:
#    tune run generate --config generation

# Model arguments
model:
  _component_: torchtune.models.llama3.llama3_8b

checkpointer:
  _component_: torchtune.utils.FullModelTorchTuneCheckpointer
  checkpoint_dir: Llama3-Gen
  checkpoint_files: [
    meta_model_0-4w.pt
  ]
  output_dir: Llama3-Gen
  model_type: LLAMA3

device: cuda
dtype: bf16

seed: 1234

# Tokenizer arguments
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: Meta-Llama-3-8B/original/tokenizer.model

# Generation arguments; defaults taken from gpt-fast
prompt: "Tell me a joke?"
instruct_template: null
chat_format: null
max_new_tokens: 300
temperature: 0.6 # 0.8 and 0.6 are popular values to try
top_k: 300
# It is recommended to set enable_kv_cache=False for long-context models like Llama3.1
enable_kv_cache: True

quantizer:
  _component_: torchtune.utils.quantization.Int4WeightOnlyQuantizer
  groupsize: 256