mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Question] Speculative Decoding Mode #2710

Closed bethalianovike closed 3 months ago

bethalianovike commented 3 months ago

❓ General Questions

How do I enable the eagle and medusa modes for an LLM? I tried running the "convert_weight", "gen_config", and "compile" steps of MLC-LLM with --model-type "eagle" or "medusa" added on the command line, but the convert_weight step fails with the error messages below. Could someone please give me some tips on how to run speculative decoding with MLC-LLM? Thank you in advance!

For "eagle" mode:

File "/home/mlc-llm/python/mlc_llm/interface/convert_weight.py", line 122, in _param_generator
    loader = LOADER[args.source_format](
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mlc-llm/python/mlc_llm/loader/huggingface_loader.py", line 99, in __init__
    check_parameter_usage(extern_param_map, set(self.torch_to_path.keys()))
  File "/home/mlc-llm/python/mlc_llm/loader/utils.py", line 33, in check_parameter_usage
    raise ValueError(

For "medusa" mode:

  File "/home/mlc-llm/python/mlc_llm/interface/convert_weight.py", line 59, in _convert_args
    model_config = args.model.config.from_file(args.config)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mlc-llm/python/mlc_llm/support/config.py", line 71, in from_file
    return cls.from_dict(json.load(in_file))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mlc-llm/python/mlc_llm/support/config.py", line 51, in from_dict
    return cls(**fields, kwargs=kwargs)  # type: ignore[call-arg]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: MedusaConfig.__init__() missing 2 required positional arguments: 'medusa_num_heads' and 'medusa_num_layers'
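
My guess is that --model-type medusa expects a Medusa head checkpoint whose config.json actually defines medusa_num_heads and medusa_num_layers, rather than a base-model config. Something along these lines is what I would try, although the repo name, paths, and conv template here are only my guesses:

mlc_llm convert_weight ./dist/medusa-vicuna-7b-v1.3 --quantization q4f16_1 --model-type medusa -o dist/medusa-vicuna-7b-v1.3-q4f16_1
mlc_llm gen_config ./dist/medusa-vicuna-7b-v1.3 --quantization q4f16_1 --model-type medusa --conv-template vicuna_v1.1 -o dist/medusa-vicuna-7b-v1.3-q4f16_1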
sunzj commented 3 months ago

As for eagle, it seems you need the EAGLE draft model weights from https://huggingface.co/yuhuili/EAGLE-llama2-chat-7B/tree/main, then run something like:

mlc_llm convert_weight ./dist/EAGLE-llama2-chat-7B --quantization q4f16_1 -o dist/EAGLE-llama2-chat-7B-q4f16 --model-type "eagle"
mlc_llm gen_config ./dist/EAGLE-llama2-chat-7B --quantization q4f16_1 -o dist/EAGLE-llama2-chat-7B-q4f16 --model-type eagle --conv-template llama-2
mlc_llm compile ./dist/EAGLE-llama2-chat-7B-q4f16/mlc-chat-config.json --device opencl -o dist/libs/EAGLE-llama2-chat-7B-q4f16.so

However, after generating the library and model, I tried the following command and got no response to my requests. Without speculative mode it works just fine.

mlc_llm serve dist/Llama-2-7b-chat-hf-q4f16_1/params --model-lib dist/libs/Llama-2-7b-chat-hf-q4f16_1.so --mode local --additional-models dist/EAGLE-llama2-chat-7B-q4f16,dist/libs/EAGLE-llama2-chat-7B-q4f16.so --speculative-mode eagle

bethalianovike commented 3 months ago

Thank you @sunzj! Yes, I am also stuck on that step... Have you already tried to run mlc_llm chat on that EAGLE-llama2-chat-7B? When I try, it gives me a tokenizer error message. Do we need to copy the tokenizer files from Llama-2-7b (since https://huggingface.co/yuhuili/EAGLE-llama2-chat-7B/tree/main doesn't include any tokenizer files)?

Error message when running mlc_llm chat:

[2024-08-01 09:15:47] INFO engine_base.py:143: Using library model: ../dist/libs/EAGLE-llama2-chat-7B-q4f16_1-MLC_SLM_gpu_1_cuda.so
[09:15:47] /home/mlc-llm/cpp/tokenizers/tokenizers.cc:202: Warning: Tokenizer info is not detected as tokenizer.json is not found. The default tokenizer info will be used.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/mlc-llm/python/mlc_llm/__main__.py", line 64, in <module>
    main()
  File "/home/mlc-llm/python/mlc_llm/__main__.py", line 45, in main
    cli.main(sys.argv[2:])
  File "/home/mlc-llm/python/mlc_llm/cli/chat.py", line 36, in main
    chat(
  File "/home/mlc-llm/python/mlc_llm/interface/chat.py", line 282, in chat
    JSONFFIEngine(
  File "/home/mlc-llm/python/mlc_llm/json_ffi/engine.py", line 255, in __init__
    self.tokenizer = Tokenizer(model_args[0][0])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mlc-llm/python/mlc_llm/tokenizers/tokenizers.py", line 64, in __init__
    self.__init_handle_by_constructor__(
  File "/home/mlc-llm/3rdparty/tvm/python/tvm/_ffi/_ctypes/object.py", line 145, in __init_handle_by_constructor__
    handle = __init_by_constructor__(fconstructor, args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mlc-llm/3rdparty/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 262, in __init_handle_by_constructor__
    raise_last_ffi_error()
  File "/home/mlc-llm/3rdparty/tvm/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "/home/mlc-llm/cpp/tokenizers/tokenizers.cc", line 459, in operator()
    return Tokenizer::FromPath(path);
                    ^^^^^^^^^^^^^^^^^^
  File "/home/mlc-llm/cpp/tokenizers/tokenizers.cc", line 191, in mlc::llm::Tokenizer::FromPath(tvm::runtime::String const&, std::optional<mlc::llm::TokenizerInfo>)
    LOG(FATAL) << "Cannot find any tokenizer under: " << _path;
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
tvm._ffi.base.TVMError: Traceback (most recent call last):
  1: operator()
        at /home/mlc-llm/cpp/tokenizers/tokenizers.cc:459
  0: mlc::llm::Tokenizer::FromPath(tvm::runtime::String const&, std::optional<mlc::llm::TokenizerInfo>)
        at /home/mlc-llm/cpp/tokenizers/tokenizers.cc:191
  File "/home/mlc-llm/cpp/tokenizers/tokenizers.cc", line 191
TVMError: Cannot find any tokenizer under: ../dist/EAGLE-llama2-chat-7B-q4f16_1-MLC_SLM_gpu_1
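
If copying is indeed the fix, I assume it would just be the usual Hugging Face tokenizer files from the base chat model dropped into the converted EAGLE directory, something like this (replace /path/to/Llama-2-7b-chat-hf with wherever the base checkpoint lives; the file names are the standard HF ones):

cp /path/to/Llama-2-7b-chat-hf/tokenizer.model \
   /path/to/Llama-2-7b-chat-hf/tokenizer.json \
   /path/to/Llama-2-7b-chat-hf/tokenizer_config.json \
   ../dist/EAGLE-llama2-chat-7B-q4f16_1-MLC_SLM_gpu_1/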
sunzj commented 3 months ago

@bethalianovike I haven't tried running mlc_llm chat. As far as I know, the EAGLE model is only a forward (draft) layer, so it can't be used for chat on its own.

bethalianovike commented 3 months ago

@sunzj Got it, thanks! Actually, looking at the GitHub repo, there is another script we can use to run speculative decoding: https://github.com/mlc-ai/mlc-llm/blob/main/tests/python/serve/test_serve_engine_spec.py. It includes test code for both the small_draft and eagle modes. I succeeded in running the small_draft method with that code.
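
In case it helps, I just invoked it with pytest from the repo root; the model paths and libraries referenced inside that script have to exist locally, so they may need adjusting first:

pytest tests/python/serve/test_serve_engine_spec.py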

sunzj commented 3 months ago

@bethalianovike Hey, try changing the mode to server (--mode server), e.g.:

mlc_llm serve dist/Llama-2-7b-chat-hf-q4f16_1/params --model-lib dist/libs/Llama-2-7b-chat-hf-q4f16_1.so --mode server --additional-models dist/EAGLE-llama2-chat-7B-q4f16,dist/libs/EAGLE-llama2-chat-7B-q4f16.so --speculative-mode eagle

Actually, the server mode isn't the root cause: speculative mode needs max_num_sequence larger than spec_draft_length + 1. The default spec_draft_length is 4, so the minimum max_num_sequence is 6. Try:

mlc_llm serve dist/Llama-2-7b-chat-hf-q4f16_1/params --model-lib dist/libs/Llama-2-7b-chat-hf-q4f16_1.so --mode server --additional-models dist/EAGLE-llama2-chat-7B-q4f16,dist/libs/EAGLE-llama2-chat-7B-q4f16.so --speculative-mode eagle --device opencl --overrides max_num_sequence=6
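
If you would rather shrink the draft length than raise max_num_sequence, I think the same --overrides flag can carry both fields separated by a semicolon (this assumes spec_draft_length is exposed through --overrides in your build, which I haven't double-checked), e.g.:

mlc_llm serve dist/Llama-2-7b-chat-hf-q4f16_1/params --model-lib dist/libs/Llama-2-7b-chat-hf-q4f16_1.so --mode server --additional-models dist/EAGLE-llama2-chat-7B-q4f16,dist/libs/EAGLE-llama2-chat-7B-q4f16.so --speculative-mode eagle --device opencl --overrides "spec_draft_length=3;max_num_sequence=5"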

bethalianovike commented 3 months ago

@sunzj It runs perfectly now, thanks! Can we get the decode time or decode rate for this speculative decoding run?

sunzj commented 3 months ago

@bethalianovike Try curl http://127.0.0.1:8000/metrics. How do your results look? In my tests, eagle does not seem to improve decoding speed, but that could be due to my device: I'm not using an NVIDIA GPU, so it still needs tuning.
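
To get meaningful decode numbers out of /metrics, I first push some traffic through the OpenAI-style endpoint the server exposes, roughly like this (the model field should match the model path you passed to mlc_llm serve):

curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "dist/Llama-2-7b-chat-hf-q4f16_1/params", "max_tokens": 128, "messages": [{"role": "user", "content": "Tell me about speculative decoding."}]}'

Then read /metrics again, with and without --speculative-mode eagle, and compare.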

MrRace commented 3 months ago

@sunzj Can the speculative decoding mode be used on Android?

bethalianovike commented 3 months ago

@sunzj Yes, that works, thanks! On my device (NVIDIA GeForce RTX 4090), the decode time does seem to improve, but the answer without speculative decoding does not match the one with speculative decoding... Is your result like this as well?

Here's my decoding result with the main model Llama-2-7b-chat-hf-q0f16:

sunzj commented 3 months ago

@MrRace I am not sure whether Android can be set to speculative mode. As far as I have verified, local mode can also support speculative decoding; just set max_num_sequence to 6 or larger, e.g.:

mlc_llm serve dist/Llama-2-7b-chat-hf-q4f16_1/params --model-lib dist/libs/Llama-2-7b-chat-hf-q4f16_1.so --mode server --additional-models dist/EAGLE-llama2-chat-7B-q4f16,dist/libs/EAGLE-llama2-chat-7B-q4f16.so --speculative-mode eagle --device opencl --overrides max_num_sequence=6

sunzj commented 3 months ago

@bethalianovike It may not be caused by speculative decoding; an LLM won't necessarily output the same result even if you give it the same prompt. Check the temperature parameter: https://www.iguazio.com/glossary/llm-temperature/
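
For a more apples-to-apples check, you can also pin sampling in the request itself by setting temperature to 0 in the chat completion body. A sketch (the model field should match whatever you passed to mlc_llm serve):

curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "dist/Llama-2-7b-chat-hf-q4f16_1/params", "temperature": 0, "messages": [{"role": "user", "content": "Hello"}]}'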