huggingface / chat-ui

Open source codebase powering the HuggingChat app
https://huggingface.co/chat
Apache License 2.0
7.14k stars 1.03k forks source link

Phi-3 deployment issue #1139

Open gugarosa opened 3 months ago

gugarosa commented 3 months ago

Good afternoon everyone!

We know that Phi-3-mini-4k-instruct has been suffering from some gibberish outputs when used with HuggingChat and I think I have been finally able to track where the issue is coming from:

import requests
from transformers import AutoTokenizer

API_URL = "https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-4k-instruct"
headers = {"Authorization": ""}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

messages=[
        {"role": "user", "content": "1234 + 5678"},
        {"role": "assistant", "content": " The sum of 1234 and 5678 is 6912.\n\nHere's the calculation:\n\n  1234\n+ 5678\n______\n  6912\n\nSo, 1234 + 5678 equals 6912. "},
        {"role": "user", "content": "1234 + 5678"},
        {"role": "assistant", "content": " To solve this more complex problem, we'll add the two numbers in a columnar format, taking into account any carryovers that occur during addition:\n\n  1 2 3 4\n+ 5 6 7 8\n________\n  6 9 1 2\n\nHere's the step-by-step breakdown:\n\n1. Start by adding the digits in the ones place (4 + 8 = 12). Write down the 2 and carry over the 1 to the tens place.\n2. Next, add the digits in the tens place, along with the carryover (3 + 7 + 1 = 11). Write down the 1 and carry over the 1 to the hundreds place.\n3. Add the digits in the hundreds place, along with the carryover (1 + 5 + 1 = 7).\n4. Finally, add the digits in the thousands place (2 + 6 = 8).\n\nThe final result is 6912. "},
        {"role": "user", "content": "1234 + 5678"},
        {"role": "assistant", "content": " The sum of 1234 and 5678 is 6912.\n\nHere's the step-by-step breakdown:\n\n1. Start by adding the digits in the ones place (4 + 8 = 12). Write down the 2 and carry over the 1 to the tens place.\n2. Next, add the digits in the tens place, along with the carryover (3 + 7 + 1 = 11). Write down the 1 and carry over the 1 to the hundreds place.\n3. Add the digits in the hundreds place, along with the carryover (1 + 5 + 1 = 7).\n4. Finally, add the digits in the thousands place (2 + 6 = 8).\n\nThe final result is 6912. "},
        {"role": "user", "content": "1234 + 5678"},
        {"role": "assistant", "content": " The sum of 1234 and 5678 is 6912.\n\nHere's the step-by-step breakdown:\n\n1. Start by adding the digits in the ones place (4 + 8 = 12). Write down the 2 and carry over the 1 to the tens place.\n2. Next, add the digits in the tens place, along with the carryover (3 + 7 + 1 = 11). Write down the 1 and carry over the 1 to the hundreds place.\n3. Add the digits in the hundreds place, along with the carryover (1 + 5 + 1 = 7).\n4. Finally, add the digits in the thousands place (2 + 6 = 8).\n\nThe final result is 6912. \n\nIn this problem, there is no need for additional constraints, so the solution remains straightforward. However, if we were to introduce constraints or additional layers of complexity, we could explore potential scenarios such as:\n\n1. Addition involving multi-digit numbers with different bases (e.g., binary or hexadecimal).\n2. Addition with the incorporation of mathematical operations like multiplication or division.\n3. Addition with specific conditions like carrying out the addition mentally or using a particular method (e.g., the lattice method).\n\nFor example, if we were to incorporate constraints like carrying out the addition using only mental math, we would need to break down the steps further and rely on estimation and rounding techniques. This would require a more in-depth explanation of the mental math strategies involved. "},
        {"role": "user", "content": "1234 + 5678"},
        {"role": "assistant", "content": " Let's introduce a constraint where we add the numbers using the lattice method. This method involves drawing a grid and performing multiplication and addition in a step-by-step manner. Here's how it would look for our numbers:\n\n\n```\n\n   1   2   3   4\n\n+ 5   6   7   8\n\n_____________\n\n```\n\n\n1. Draw a grid with two columns and four rows, leaving space for the carried values.\n\n2. Fill in the grid with the digits from the numbers, placing the units digit of each number in the top row of the corresponding column.\n\n3. Multiply each digit in the top row by each digit in the side column (similar to multiplication with partial products).\n\n4. Add the partial products, carrying over as necessary.\n\n5. Read off the sum from left to right, combining the digits along each column.\n\n\nFor our numbers, the lattice method would look like this:\n\n\n```\n\n      1   2   3   4\n\n  ×    5   6   7   8\n\n_____________\n\n       1   2   3   4\n\n  + 10   12   16   24\n\n  + 50  10  15  20\n\n  + 60  12  21  24\n\n  +300 60  21  24\n\n_____________\n\n     6   9   1   2\n\n```\n\n\n4. Add the numbers in each column, carrying over as needed:\n\n\n- The first column (units place) adds up to 2 (4 + 8), so write down 2 and carry over 1.\n\n- The second column (tens place) adds up to 1 (1 + 1 + 1 + 1 + 1), so write down 1 and carry over 1.\n\n- The third column (hundreds place) adds up to 1 (3 + 5 + 6 + 1), so write down 1.\n\n- The fourth column (thousands place) adds up to 6 (2 + 5), so write down 6.\n\n\nThe final result, using the lattice method, is 6912, which matches our initial straightforward addition. The lattice method provides a visual and systematic approach to addition, especially useful for larger numbers or when learning multiplication and carrying over concepts. "},
        {"role": "user", "content": "1234 + 5678"},
        {"role": "assistant", "content": " To enhance the complexity of the addition problem, we could introduce constraints that require a deeper understanding of arithmetic operations or the application of mathematical concepts. Here are some possible constraints that could be added to the simple addition problem:\n\n\n1. Extra Constraint: Addition with carryover in multiple places.\n\n   - We could add a number with more significant digits, such as 12345 + 56789, which would involve more carryovers and a larger grid in the lattice method.\n\n\n2. Extra Constraint: Addition involving decimals.\n\n   - We could modify the numbers to include decimals, such as 12.34 + 56.78, requiring students to align the decimal points and carry over during the addition process.\n\n\n3. Extra Constraint: Use of modular arithmetic.\n\n   - We could ask students to find the remainder when the sum is divided by a specific number, such as 1234 + 5678 mod 10, challenging them to apply modular arithmetic concepts.\n\n\n4. Extra Constraint: Addition with fractions.\n\n   - We could introduce fractions, such as 123/4 + 567/8, requiring students to find a common denominator, add the numerators, and simplify the result.\n\n\n5. Extra Constraint: Addition with algebraic expressions.\n\n   - We could replace the numbers with algebraic expressions, such as (3x + 2) + (5x + 7), adding the coefficients of like terms and simplifying the expression.\n\n\nFor example, adding decimals would involve the following steps:\n\n\n1. Align the numbers by the decimal point.\n\n2. Perform addition as usual, ensuring that the decimal points remain aligned.\n\n\nFor the numbers 12.34 + 56.78, the addition would look like this:\n\n\n```\n\n  12.34\n\n+ 56.78\n\n________\n\n```\n\n\nAdding the decimals:\n\n\n- Start from the rightmost digit (after the decimal point).\n\n- Add 4 (from 12.34) and 8 (from 56.78) to get 12. Write down 2 and carry over 1.\n\n- Add 3 (from 12.34), 7 (from 56.78), and the carried over 1 to get 11. Write down 1 and carry over another 1.\n\n- Add 2 (from 12.34), 6 (from 56.78), and the carried over 1 to get 9.\n\n- Since there are no more digits to the right of the decimals, simply bring down the decimal point.\n\n\nThe final result is 68.12.\n\n\nWhen introducing constraints, it is important to ensure that students are equipped with the necessary knowledge and skills to handle the added complexity. This may involve teaching them the specific concepts and operations required to perform the calculations accurately and efficiently. "},
        {"role": "user", "content": "1234 + 5678"},
        {"role": "assistant", "content": " To perform the addition with the given numbers, we can follow these steps:\n\n\n1. Align the numbers vertically, with 1234 on top, and 5678 below:\n\n\n```\n\n  1234\n\n+ 5678\n\n```\n\n\n2. Add the units place: 4 + 8 = 12. Write down 2 and carry over 1.\n\n3. Add the tens place: 3 + 7 + 6 + 5 = 1 + 1 (carry over 1\n\n4. Add the hundreds place: 1 + 5 = 678\n\n```\n\n```\n\nThe final result: 691238\n\n```\n\nThus, the sum = 6912.\n\n```\n\n```\n\n```\n\n1234. "},
        {"role": "user", "content": "1234 + 5678"},
    ]

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)
inputs = tokenizer.apply_chat_template(messages, tokenize=False)

output = query({
    "inputs": inputs,
    "parameters": {
        "return_full_text": False,
    }
})
print(output)

If I run the Python request from above, you will see that some gibberish is generated, something like:

[{'generated_text': ' \n\n1234.\n\n1234.\n\n```\n\n1234.\n\n```\n\n1234.\n\n\n\n\n\n1234.\n\n\n1234.\n\n1234.\n\n1234.\n\n\n12341234\n\n1234\n\n1234\n\n1234\n\n12'}]

However, if I deploy a local instance of TGI, change the API_URL = "http://127.0.0.1:8080" and run the very same script, the generation starts to make sense:

[{'generated_text': 'To solve the addition problem, we can follow these steps:\n\n\n1. Write the numbers vertically, aligning the digits according to their place value:\n\n\n```\n\n  1234\n\n+ 5678\n\n```\n\n\n2. Start adding from the rightmost column (units place):\n\n\n- 4 (from 1234) + 8 (from 5678) = 1'}]

My suspicion is that the model that has been deployed to https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-4k-instruct, which is consumed by the HuggingChat uses an older version of code/tokenizer configuration. It was added on the release day, and we did some updates after that day.

Another possibility could be an issue with a previous version of flash-attn (if it is being used) and somehow crashing regarding the sliding_window? I remember some older versions had a problem where the window was not being "accurately" computed.

Could you please re-deploy the model or take a look in it?

Thanks for your attention and best regards, Gustavo.

gugarosa commented 3 months ago

For reference, I am starting the TGI server with the following:

model=microsoft/Phi-3-mini-4k-instruct
volume=$PWD/data

docker run --gpus all \
    --shm-size 1g \
    -p 8080:80 \
    -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest \
    --model-id $model \
    --trust-remote-code
gugarosa commented 3 months ago

@nsarrazin could you please take a look into this?

nsarrazin commented 3 months ago

Hi! thanks for digging into this, will report it internally and come back to you!