[BUG]Garbage GPTNeoX output when input text is long.

TopIdiot commented 1 year ago

GPTNeoX produces wrong outputs when the input text is long.

The output is similar to https://github.com/microsoft/DeepSpeed/issues/2293 , but it only occurs when the input text is long.

Platform: Nvidia A100 x 2

To Reproduce Steps to reproduce the behavior:

code:

from transformers import AutoConfig, AutoTokenizer, T5Tokenizer, pipeline
from transformers import GPTNeoXForCausalLM
import os
import torch
import time
import transformers
import deepspeed
from transformers import PreTrainedTokenizerFast

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

tokenizer_path="path_to_tokenizer/tokenizer.json"
model_path='path_to_model/20B'

tokenizer = PreTrainedTokenizerFast(
            tokenizer_file=tokenizer_path)
model = GPTNeoXForCausalLM.from_pretrained(model_path)

model = model.half()
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
generator.tokenizer.pad_token_id = 0

generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float16,
                                           replace_method='auto',
                                           replace_with_kernel_inject=True)
generator.device = torch.device("cuda", local_rank)

text = """Human: 你是谁

    Assistant: 我是 Assistant，一个 AI 语言模型，我可以回答很多领域的问题。

    Human: 继续

    Assistant: 这段代码定义了一个名为 Person 的结构体，其中包含两个成员：名字和年龄。然后，它定义了一个名为的函数，用于将 Person 对象序列化到输出流中。

    接下来，你可以使用下面的代码来解析 JSON 字符串：

        #include <iostream>
        #include <string>
        #include <boost/archive/text_oarchive.hpp>
        #include <boost/archive/text_iarchive.hpp>
        #include <boost/serialization/base_object.hpp>
        #include <boost/serialization/string.hpp>
        #include <boost/archive/binary_oarchive.hpp>
        #include <boost/archive/binary_iarchive.hpp>

        std::string read_json(const std::string& json_string)
        {
            std::stringstream json_

    Human: 继续

    Assistant: 又写下去吗?

    Human: 是的
    Assistant: 好的，这个代码使用了 C++ 的标准库来解析 JSON。首先，它定义了一个名为 read_json 的函数，用于从一个指定的 JSON 字符串中读取数据。然后，它使用了来将 JSON 字符串转换为一个字符串流，并使用了 函数来读取 JSON 数据。

    最后，它使用了来将序列化后的数据写入一个二进制文件中。

    这只是一个简单的示例，实际应用中的代码可能会更加复杂。你可以根据你的需求进行修改和扩展。

    Human: 你是谁

    Assistant: 我是 Assistant，一个AI 语言模型，我可以回答很多领域的问题。

    Human: 1+1等于多少

    Assistant: 在 C++ 中，1+1 等于 2。

    Human: 私家花园的注意事项

    Assistant: 私家花园的注意事项如下：

      1. 确保花园的土壤适合植物生长，并经常浇水。
      2. 选择适合当地气候的植物，避免过于潮湿或过于干燥的环境。
      3. 定期修剪树木，保持树木的美观和整洁。
      4. 确保花园内没有任何有害物质，如杀虫剂或化肥。
      5. 定期清理花园，保持整洁。

    注意事项：上述内容仅供参考，具体的注意事项可能因地区、环境等因素而异。

    Human: 继续

    Assistant: 如果你是第一次养花，可以参考以下步骤：

      1. 确定你想要养什么类型的植物，是绿植还是花卉。
      2. 选择适合你花园的植物，并确保它们适合当地气候。
      3. 选择合适的土壤，并确保土壤适合植物生长。
      4. 如果你是第一次养植物，请仔细阅读植物的养护手册，以便更好地照顾植物。
      5. 定期浇水，保证植物有足够的水分。
      6. 定期施肥，以保证植物生长良好。
      7. 定

    Human: 继续

    Assistant: 续，请确保花园的环境安全，避免被动物或其他动物损坏。

    Human: 继续
    Assistant: 如果你是第一次养花，可以参考以下步骤：

      1. 确定你想要养什么类型的植物，是绿植还是花卉。
      2. 选择适合你花园的植物，并确保它们适合当地气候。
      3. 选择合适的土壤，并确保土壤适合植物生长。
      4. 如果你是第一次养植物，请仔细阅读植物的养护手册，以便更好地照顾植物。
      5. 定期浇水，保证植物有足够的水分。
      6. 定期施肥，以保证植物生长良好。
      7. 定

    Human: 继续

    Assistant: 续，请确保花园的环境安全，避免被动物或其他动物损坏。
    Human: 继续
    Assistant: 如果你是第一次养花，可以参考以下步骤：

      1. 确定你想要养什么类型的植物，是绿植还是花卉。
      2. 选择适合你花园的植物，并确保它们适合当地气候。
      3. 选择合适的土壤，并确保土壤适合植物生长。
      4. 如果你是第一次养植物，请仔细阅读植物的养护手册，以便更好地照顾植物。
      5. 定期浇水，保证植物有足够的水分。
      6. 定期施肥，以保证植物生长良好。
      7. 定

    Human: 继续

    Assistant:"""
print(generator([text], do_sample=False, num_beams=1, max_new_tokens=128, min_new_tokens=128))

Download the GPTNeoX 20B checkpoint and tokenizer from EleutherAI/gpt-neox: https://github.com/EleutherAI/gpt-neox
Change tokenizer_path and model_path to your own paths.
Run this code with 2 GPUs: deepspeed --num_gpus=2 this_code.py
The output is:

You can replace the text with another long text (perhaps more than 1100 tokens after tokenization) and you will always get "BytePtrFromString".

cmikeh2 commented 1 year ago

Hi @TopIdiot, I believe this is an artifact of the default argument for max_output_tokens being 1024 tokens. You may increase this in the init_inference call by passing a larger maximum value (see here for more details). Please let me know if this does not fix your issue!

TopIdiot commented 1 year ago

Thanks a lot! I changed it to 2048 and that solved the problem!
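
For anyone landing here later, the change described in this thread can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the exact code that was run: the helper names (`required_token_budget`, `fits`, `build_init_inference_kwargs`) are hypothetical, the keyword is assumed to be spelled `max_out_tokens` in DeepSpeed's inference config (the comment above writes it as max_output_tokens), and the limit is assumed to cover the prompt plus the generated tokens. The figures come from the report: roughly 1100 prompt tokens and max_new_tokens=128.

```python
# Sketch only: the helper names here are hypothetical, not DeepSpeed APIs.

DEFAULT_TOKEN_LIMIT = 1024  # default value quoted in this thread


def required_token_budget(prompt_tokens: int, max_new_tokens: int) -> int:
    """Tokens the engine must hold, assuming the limit covers prompt + output."""
    return prompt_tokens + max_new_tokens


def fits(prompt_tokens: int, max_new_tokens: int,
         limit: int = DEFAULT_TOKEN_LIMIT) -> bool:
    """True if the prompt plus the requested new tokens stay within the limit."""
    return required_token_budget(prompt_tokens, max_new_tokens) <= limit


def build_init_inference_kwargs(world_size: int, token_limit: int) -> dict:
    """Keyword arguments for deepspeed.init_inference with a raised limit.

    Assumption: the keyword is spelled `max_out_tokens`.
    """
    return {
        "mp_size": world_size,
        "replace_method": "auto",
        "replace_with_kernel_inject": True,
        "max_out_tokens": token_limit,
    }


if __name__ == "__main__":
    # Budget of 1100 + 128 tokens checked against the default limit.
    print(fits(1100, 128))
    # The same budget checked against the raised limit of 2048.
    print(fits(1100, 128, limit=2048))
    print(build_init_inference_kwargs(world_size=2, token_limit=2048))
```

In the reproduction script above, the resulting dictionary would be passed as deepspeed.init_inference(generator.model, dtype=torch.float16, **build_init_inference_kwargs(world_size, 2048)).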

microsoft / DeepSpeed

[BUG]Garbage GPTNeoX output when input text is long. #2961