sgl-project / sglang

SGLang is yet another fast serving framework for large language models and vision language models.
Apache License 2.0

Add SGLang usage examples #166

Open · Ying1123 opened this issue 5 months ago

Ying1123 commented 5 months ago

List some good use cases of SGLang here:

psych0v0yager commented 5 months ago

If I understand this paper correctly, we need to define a large list of SGL functions that each represent a different "thinking technique" (logical analysis, identification, plan and solve, chain of thought, tree of thought, etc.).

Then we need the LLM to act as the executor that selects which function to call using sgl.select. The function is selected and the args are prompted afterwards.

Example (rough pseudocode):


import sglang as sgl

@sgl.function
def chain_of_thought(s, question):
    # Stuff goes here

@sgl.function
def logical_analysis(s, question):
    # Stuff goes here

# ...

# Map the function names to descriptions here
# {function_name: function_description}

# Map the function names to functions here
# {function_name: function}

# List of functions
# [chain_of_thought, logical_analysis, ...]

# List of names
# ["chain_of_thought", "logical_analysis", ...]

s += system_prompt + question
s += sgl.select(list_of_function_names, name="function")
s += "(" + sgl.gen(name="args", stop=")")
s += function_map[s["function"]](s["args"])

I am not sure about implementing step 3 (image attached).

A while ago I implemented Chain of Thought using structured generation in guidance. I needed to use the following format:

Question:
Step 1:
Additional Turns: (Y or N)
Step 2:
Additional Turns: (Y or N)
...

The performance was underwhelming, but maybe something similar could be used to chain the adapted modules.

Unless it's just done naively using a while loop.
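
In sglang, the naive while-loop version of that Y/N chaining might look something like the sketch below (untested; the prompt format, the Y/N choices constraint, and the localhost:30000 endpoint used later in this thread are my own assumptions):

import sglang as sgl

@sgl.function
def step_by_step(s, question, max_turns=8):
    s += "Question: " + question + "\n"
    turn = 1
    while turn <= max_turns:
        # Generate one reasoning step, stopping at the end of the line.
        s += f"Step {turn}: " + sgl.gen(f"step_{turn}", stop="\n") + "\n"
        # Constrain the model to a Y/N decision on whether to keep going.
        s += "Additional Turns: " + sgl.gen(f"more_{turn}", choices=["Y", "N"]) + "\n"
        if s[f"more_{turn}"] == "N":
            break
        turn += 1
    s += "Final Answer: " + sgl.gen("final_answer", stop="\n")

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
print(step_by_step.run(question="What is 17 * 24?")["final_answer"])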

I would like to try implementing this paper using sglang. Let me know your thoughts.

Ying1123 commented 5 months ago

@psych0v0yager Awesome! I think your high-level understanding is correct.

I suggest trying open-weight models, since SGLang is currently mostly optimized for local models. Once you switch to open models, you can take full advantage of constrained decoding features such as select and regex in sglang. Thanks to efficient batching and other runtime optimizations, you should see a noticeable performance improvement when you switch the backend to sglang. Please keep us updated on your progress!
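
As a rough illustration, regex-constrained decoding and batched execution against a local runtime could look something like the sketch below (the regex pattern, the questions, and the exact gen/run_batch arguments are illustrative and may need adjusting to your sglang version):

import sglang as sgl

@sgl.function
def answer_with_number(s, question):
    s += "Question: " + question + "\nAnswer: "
    # Constrain the answer to digits only via regex-constrained decoding.
    s += sgl.gen("answer", regex=r"[0-9]+")

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

# Batched execution lets the runtime exploit efficient batching across requests.
states = answer_with_number.run_batch(
    [
        {"question": "How many legs does a spider have?"},
        {"question": "How many days are in a leap year?"},
    ],
    progress_bar=True,
)
for st in states:
    print(st["answer"])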

The OpenAI backend is still under development, so its functionality has not been fully tested. Please don't hesitate to submit issues if you run into any problems.

psych0v0yager commented 5 months ago

I have a naive implementation below. The code was influenced by this project (https://github.com/catid/self-discover) and still has a ways to go in terms of additional improvements.

from sglang import function, system, user, assistant, gen, set_default_backend, RuntimeEndpoint, select

reasoning_modules = [
    "1. How could I devise an experiment to help solve that problem?",
    "2. Make a list of ideas for solving this problem, and apply them one by one to the problem to see if any progress can be made.",
    # "3. How could I measure progress on this problem?",
    "4. How can I simplify the problem so that it is easier to solve?",
    "5. What are the key assumptions underlying this problem?",
    "6. What are the potential risks and drawbacks of each solution?",
    "7. What are the alternative perspectives or viewpoints on this problem?",
    "8. What are the long-term implications of this problem and its solutions?",
    "9. How can I break down this problem into smaller, more manageable parts?",
    "10. Critical Thinking: This style involves analyzing the problem from different perspectives, questioning assumptions, and evaluating the evidence or information available. It focuses on logical reasoning, evidence-based decision-making, and identifying potential biases or flaws in thinking.",
    "11. Try creative thinking, generate innovative and out-of-the-box ideas to solve the problem. Explore unconventional solutions, thinking beyond traditional boundaries, and encouraging imagination and originality.",
    # "12. Seek input and collaboration from others to solve the problem. Emphasize teamwork, open communication, and leveraging the diverse perspectives and expertise of a group to come up with effective solutions.",
    "13. Use systems thinking: Consider the problem as part of a larger system and understanding the interconnectedness of various elements. Focuses on identifying the underlying causes, feedback loops, and interdependencies that influence the problem, and developing holistic solutions that address the system as a whole.",
    "14. Use Risk Analysis: Evaluate potential risks, uncertainties, and tradeoffs associated with different solutions or approaches to a problem. Emphasize assessing the potential consequences and likelihood of success or failure, and making informed decisions based on a balanced analysis of risks and benefits.",
    # "15. Use Reflective Thinking: Step back from the problem, take the time for introspection and self-reflection. Examine personal biases, assumptions, and mental models that may influence problem-solving, and being open to learning from past experiences to improve future approaches.",
    "16. What is the core issue or problem that needs to be addressed?",
    "17. What are the underlying causes or factors contributing to the problem?",
    "18. Are there any potential solutions or strategies that have been tried before? If yes, what were the outcomes and lessons learned?",
    "19. What are the potential obstacles or challenges that might arise in solving this problem?",
    "20. Are there any relevant data or information that can provide insights into the problem? If yes, what data sources are available, and how can they be analyzed?",
    "21. Are there any stakeholders or individuals who are directly affected by the problem? What are their perspectives and needs?",
    "22. What resources (financial, human, technological, etc.) are needed to tackle the problem effectively?",
    "23. How can progress or success in solving the problem be measured or evaluated?",
    "24. What indicators or metrics can be used?",
    "25. Is the problem a technical or practical one that requires a specific expertise or skill set? Or is it more of a conceptual or theoretical problem?",
    "26. Does the problem involve a physical constraint, such as limited resources, infrastructure, or space?",
    "27. Is the problem related to human behavior, such as a social, cultural, or psychological issue?",
    "28. Does the problem involve decision-making or planning, where choices need to be made under uncertainty or with competing objectives?",
    "29. Is the problem an analytical one that requires data analysis, modeling, or optimization techniques?",
    "30. Is the problem a design challenge that requires creative solutions and innovation?",
    "31. Does the problem require addressing systemic or structural issues rather than just individual instances?",
    "32. Is the problem time-sensitive or urgent, requiring immediate attention and action?",
    "33. What kinds of solution typically are produced for this kind of problem specification?",
    "34. Given the problem specification and the current best solution, have a guess about other possible solutions."
    "35. Let’s imagine the current best solution is totally wrong, what other ways are there to think about the problem specification?"
    "36. What is the best way to modify this current best solution, given what you know about these kinds of problem specification?"
    "37. Ignoring the current best solution, create an entirely new solution to the problem."
    # "38. Let’s think step by step."
]

@function
def self_discover(s, question, num_modules, reasoning_modules, max_turns=10):

    # Fork the prompt state into three parallel branches
    selector, adaptor, implementor = s.fork(3)

    selector += f"Given the task: {question}, which of the following reasoning modules are relevant? Do not elaborate on why.\n"
    adaptor += "Without working out the full solution, adapt the following reasoning modules to be specific to our task:\n"
    implementor += "Without working out the full solution, create an actionable reasoning structure for the task using these adapted reasoning modules:\n"

    # Let the selector choose the reasoning modules
    selected_modules = []
    for module in range(num_modules):
        selector += "ASSISTANT:" + gen(choices = reasoning_modules, name=f"selection_{module + 1}") + " "
        selected_modules.append(selector[f"selection_{module + 1}"])

    # Let the adaptor modify the modules to fit the task
    adaptor += "USER: " + f"\nReasoning Modules: {selected_modules}\nOur task: {question}:"
    adapted_modules = []
    for module in range(len(selected_modules)):
        adaptor += "ASSISTANT: " + (f"Adapted Module {module + 1}:" + gen(stop=f"Adapted Module {module + 2}:", name = f"adapted_module{module + 1}"))
        adapted_modules.append(adaptor[f"adapted_module{module + 1}"])

    # Let the implementor implement the adapted modules
    implementor += "USER: " + (f"\nAdapted Modules:{adapted_modules}\nOur task: {question}") 
    implementor += "ASSISTANT: " + (gen(name="reasoning_structure", stop="</s>"))

    # Let the LLM finish the job
    reasoning_structure = implementor["reasoning_structure"]
    s += (f"Using the following reasoning structure: {reasoning_structure}")
    s += "USER: " + (f"Solve this task, providing your final answer: {question}")
    s += "ASSISTANT: " + (gen(name='process', stop="Final Answer:"))
    s += "ASSISTANT: " + ("Final Answer:" + gen(name="Final_Answer", stop="</s>"))

    return s

set_default_backend(RuntimeEndpoint("http://localhost:30000"))

state = self_discover.run(
    question="""This SVG path element <path d="M 55.57,80.69 L 57.38,65.80 M 57.38,65.80 L 48.90,57.46 M 48.90,57.46 L
45.58,47.78 M 45.58,47.78 L 53.25,36.07 L 66.29,48.90 L 78.69,61.09 L 55.57,80.69"/> draws a:
(A) circle (B) heptagon (C) hexagon (D) kite (E) line (F) octagon (G) pentagon(H) rectangle (I) sector (J) triangle""",
    num_modules=3,
    reasoning_modules=reasoning_modules,
)

for m in state.messages():
    print(m["role"], ":", m["content"])

print(state["Final_Answer"])

I implemented the same code in guidance and it did work, though further optimizations taking advantage of constrained generation would be beneficial.

# Imports
import guidance
from guidance import image
from guidance import user, assistant, system
from guidance import gen, select
from guidance import capture, Tool, regex

# Paths
path_miqu = "path/goes/here"

# Models
llama2 = guidance.models.LlamaCpp(path_miqu, n_gpu_layers=-1, n_ctx=8192)

@guidance
def self_discover(llm, question, num_modules, reasoning_modules, max_turns=10):

    # Split the LLM into 3 parts
    selector = llm + f"<s> [INST] Given the task: {question}, which of the following reasoning modules are relevant? Do not elaborate on why. [/INST]"
    adaptor = llm + f"<s> [INST] Without working out the full solution, adapt the following reasoning modules to be specific to our task:"# \n Reasoning Modules: [/INST]" #{selected_modules}\n\nOur task:\n{task_example}
    implementor = llm + f"<s> [INST] Without working out the full solution, create an actionable reasoning structure for the task using these adapted reasoning modules:"# [/INST]"

    # Let the selector choose the reasoning modules
    selected_modules = []
    for module in range(num_modules):
        selector += select(reasoning_modules, name=f"selection_{module + 1}") + " "
        selected_modules.append(selector[f"selection_{module + 1}"])

    # Let the adaptor modify the modules to fit the task
    adaptor += f"\nReasoning Modules: {selected_modules}\nOur task: {question}: [/INST]" 
    adapted_modules = []
    for module in range(len(selected_modules)):
        adaptor += f"Adapted Module {module + 1}:" + gen(stop=f"Adapted Module {module + 2}:", name = f"adapted_module{module + 1}")
        adapted_modules.append(adaptor[f"adapted_module{module + 1}"])

    # Let the implementor implement the adapted modules
    implementor += f"\nAdapted Modules:{adapted_modules}\nOur task: {question} [/INST]" + gen(name="reasoning_structure", stop="</s>")

    # Let the LLM finish the job
    reasoning_structure = implementor["reasoning_structure"]
    llm += f"<s> [INST] Using the following reasoning structure: {reasoning_structure}\n\nSolve this task, providing your final answer: {question} [/INST]" + gen(name='process', stop="Final Answer:")
    llm += "Final Answer:" + gen(name="Final_Answer", stop="</s>")

    return llm

reasoning_modules = [
    "1. How could I devise an experiment to help solve that problem?",
    "2. Make a list of ideas for solving this problem, and apply them one by one to the problem to see if any progress can be made.",
    #"3. How could I measure progress on this problem?",
    "4. How can I simplify the problem so that it is easier to solve?",
    "5. What are the key assumptions underlying this problem?",
    "6. What are the potential risks and drawbacks of each solution?",
    "7. What are the alternative perspectives or viewpoints on this problem?",
    "8. What are the long-term implications of this problem and its solutions?",
    "9. How can I break down this problem into smaller, more manageable parts?",
    "10. Critical Thinking: This style involves analyzing the problem from different perspectives, questioning assumptions, and evaluating the evidence or information available. It focuses on logical reasoning, evidence-based decision-making, and identifying potential biases or flaws in thinking.",
    "11. Try creative thinking, generate innovative and out-of-the-box ideas to solve the problem. Explore unconventional solutions, thinking beyond traditional boundaries, and encouraging imagination and originality.",
    #"12. Seek input and collaboration from others to solve the problem. Emphasize teamwork, open communication, and leveraging the diverse perspectives and expertise of a group to come up with effective solutions.",
    "13. Use systems thinking: Consider the problem as part of a larger system and understanding the interconnectedness of various elements. Focuses on identifying the underlying causes, feedback loops, and interdependencies that influence the problem, and developing holistic solutions that address the system as a whole.",
    "14. Use Risk Analysis: Evaluate potential risks, uncertainties, and tradeoffs associated with different solutions or approaches to a problem. Emphasize assessing the potential consequences and likelihood of success or failure, and making informed decisions based on a balanced analysis of risks and benefits.",
    #"15. Use Reflective Thinking: Step back from the problem, take the time for introspection and self-reflection. Examine personal biases, assumptions, and mental models that may influence problem-solving, and being open to learning from past experiences to improve future approaches.",
    "16. What is the core issue or problem that needs to be addressed?",
    "17. What are the underlying causes or factors contributing to the problem?",
    "18. Are there any potential solutions or strategies that have been tried before? If yes, what were the outcomes and lessons learned?",
    "19. What are the potential obstacles or challenges that might arise in solving this problem?",
    "20. Are there any relevant data or information that can provide insights into the problem? If yes, what data sources are available, and how can they be analyzed?",
    "21. Are there any stakeholders or individuals who are directly affected by the problem? What are their perspectives and needs?",
    "22. What resources (financial, human, technological, etc.) are needed to tackle the problem effectively?",
    "23. How can progress or success in solving the problem be measured or evaluated?",
    "24. What indicators or metrics can be used?",
    "25. Is the problem a technical or practical one that requires a specific expertise or skill set? Or is it more of a conceptual or theoretical problem?",
    "26. Does the problem involve a physical constraint, such as limited resources, infrastructure, or space?",
    "27. Is the problem related to human behavior, such as a social, cultural, or psychological issue?",
    "28. Does the problem involve decision-making or planning, where choices need to be made under uncertainty or with competing objectives?",
    "29. Is the problem an analytical one that requires data analysis, modeling, or optimization techniques?",
    "30. Is the problem a design challenge that requires creative solutions and innovation?",
    "31. Does the problem require addressing systemic or structural issues rather than just individual instances?",
    "32. Is the problem time-sensitive or urgent, requiring immediate attention and action?",
    "33. What kinds of solution typically are produced for this kind of problem specification?",
    "34. Given the problem specification and the current best solution, have a guess about other possible solutions."
    "35. Let’s imagine the current best solution is totally wrong, what other ways are there to think about the problem specification?"
    "36. What is the best way to modify this current best solution, given what you know about these kinds of problem specification?"
    "37. Ignoring the current best solution, create an entirely new solution to the problem."
    #"38. Let’s think step by step."
    "39. Let’s make a step by step plan and implement it with good notation and explanation."
]

llm = llama2
llm += self_discover(question="Lisa has 10 apples. She gives 3 apples to her friend and then buys 5 more apples from the store. How many apples does Lisa have now?", num_modules=3, reasoning_modules=reasoning_modules)

psych0v0yager commented 5 months ago

Here is my server code

python -m sglang.launch_server --model-path /path/bagel_yi_34_dpo_AWQ --port 30000 --tp-size 2 --max-prefill-num-token 8192

I have noticed a few things.

  1. In instances like my self_discover function, where s forks into 3 different branches, how can I view the content of those branches when calling the function?

  2. In both sglang and guidance, I find LLMs tend to "resist" the grammars forced upon them. For example, when given the physics question "How can you calculate the gravitational force exerted by the moon on a 1 kg object on the Earth's surface using Newton's law of universal gravitation? Please include the formula in your explanation and specify any assumptions made about the distances or masses involved.", I have had LLMs answer correctly with free generation (gen) but fail when I attempt to constrain them to answer in a step-by-step format.

  3. Is there a way to reduce the loaded context length? I would like to be able to load a Miqu GPTQ file with a trimmed context (8192). Furthermore, how is context managed in sglang? If my tokens hit 8192, how can I trim the earliest user/assistant messages to make it fit?

I think the solution for number 2 would be either fine-tuning on the precise format of the sglang/guidance function or providing a more in-depth few-shot example to prime the LLM to answer in a more structured way (see the sketch below).
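
Something like the following might work as a primer (an untested sketch; the worked example and the exact prompt wording are just illustrative):

import sglang as sgl

@sgl.function
def primed_steps(s, question):
    # One worked example shows the model the exact step-by-step format
    # before its own answer is constrained to that format.
    s += (
        "Question: A ball is dropped from 20 m. How long until it hits the ground?\n"
        "Step 1: Use h = (1/2) g t^2 with g = 9.8 m/s^2.\n"
        "Step 2: Solve t = sqrt(2h/g) = sqrt(40/9.8), which is about 2.0 s.\n"
        "Final Answer: about 2.0 seconds.\n\n"
    )
    s += "Question: " + question + "\n"
    for i in range(1, 4):
        s += f"Step {i}: " + sgl.gen(f"step_{i}", stop="\n") + "\n"
    s += "Final Answer: " + sgl.gen("final_answer", stop="\n")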

If you have any other improvements, please let me know

lucasavila00 commented 3 months ago


LmScript is a TypeScript/JavaScript client for SGLang, and a graphical user interface for SGLang programs written in markdown.

Feedback highly appreciated.

timothylimyl commented 1 month ago

I found that the most fundamental part, vanilla generation, is not in the examples.

Most users will use sglang for structured generation, but the endpoint will most probably be used for many different kinds of tasks, which include vanilla generation (no structure).

merrymercy commented 1 week ago

@timothylimyl You can call this endpoint for vanilla generation: https://github.com/sgl-project/sglang?tab=readme-ov-file#backend-sglang-runtime-srt
You can also just use something like prompt + sgl.gen("output") for vanilla generation.
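
For example, a minimal vanilla-generation sketch could look like this (the prompt text and sampling parameters are placeholders; the /generate payload follows the README linked above):

import requests
import sglang as sgl

# Option 1: call the runtime's native /generate endpoint directly.
response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {"max_new_tokens": 32, "temperature": 0},
    },
)
print(response.json())

# Option 2: a plain prompt + sgl.gen with no structure at all.
@sgl.function
def vanilla(s, prompt):
    s += prompt + sgl.gen("output", max_tokens=128)

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
print(vanilla.run(prompt="Write a one-sentence summary of SGLang.")["output"])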