SaeedBai commented 1 day ago

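The code is identical to the Quick Start, with one change: the model is deployed locally. After running the Python file in a terminal, the response is "I have understood the provided dataframes information. What can I help you with?" My environment is restricted, so I can only deploy locally. Tutorials I found online run the same code and get a correct answer to the question.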

edwardzjl commented 1 day ago

Hello, could you share the code snippet you ran and its output?

SaeedBai commented 1 day ago

I ran the code directly in a Linux terminal. The output was:

The user wants to view the first five rows of the dataframe. We can use the df.head() method to do this.

Python code:

print(df.head())

The code I ran:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, LlamaTokenizerFast

# Using pandas to read some structured data
import pandas as pd
from io import StringIO
import torch

# single table
EXAMPLE_CSV_CONTENT = """
"Loss","Date","Score","Opponent","Record","Attendance"
"Hampton (14–12)","September 25","8–7","Padres","67–84","31,193"
"Speier (5–3)","September 26","3–1","Padres","67–85","30,711"
"Elarton (4–9)","September 22","3–1","@ Expos","65–83","9,707"
"Lundquist (0–1)","September 24","15–11","Padres","67–83","30,774"
"Hampton (13–11)","September 6","9–5","Dodgers","61–78","31,407"
"""

csv_file = StringIO(EXAMPLE_CSV_CONTENT)
df = pd.read_csv(csv_file)

model_path = "path/to/TableGPT2-7B"

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

example_prompt_template = """Given access to several pandas dataframes, write the Python code to answer the user's question.

/*
"{var_name}.head(5).to_string(index=False)" as follows:
{df_info}
*/

Question: {user_question}
"""
# English: "Which games have a record of 40 wins and 40 losses?"
question = "哪些比赛的战绩达到了40胜40负？"

prompt = example_prompt_template.format(
    var_name="df",
    df_info=df.head(5).to_string(index=False),
    user_question=question,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

SaeedBai commented 1 day ago

By the way, is there a WeChat group for discussing issues? That would be more convenient.

edwardzjl commented 1 day ago

The code above seems to differ somewhat from how tablegpt-agent is meant to be used. Please make sure you followed this document: https://tablegpt.github.io/tablegpt-agent/tutorials/chat-on-tabular-data/

zTaoplus commented 1 day ago

Hello, please run the following code in your environment and paste the output into this issue, so that we can better try to reproduce your problem and give you an answer:

```python
import platform
import subprocess
import sys


def get_os_info():
    return {
        "system": platform.system(),
        "node": platform.node(),
        "release": platform.release(),
        "version": platform.version(),
        "machine": platform.machine(),
        "processor": platform.processor(),
    }


def get_python_info():
    return {
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
        "compiler": platform.python_compiler(),
    }


def get_pip_list():
    # Use the current interpreter so the list matches the active environment
    result = subprocess.run(
        [sys.executable, "-m", "pip", "list"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode == 0:
        return result.stdout
    return f"Failed to get pip list: {result.stderr}"


def write_to_log_file(content, filename="env_output.log"):
    with open(filename, "w") as file:
        file.write(content)


def main():
    os_info = get_os_info()
    python_info = get_python_info()
    pip_list = get_pip_list()

    content = "Operating System Information:\n"
    for key, value in os_info.items():
        content += f"{key}: {value}\n"

    content += "\nPython Information:\n"
    for key, value in python_info.items():
        content += f"{key}: {value}\n"

    content += "\nPip List:\n"
    content += pip_list

    # stdout
    print(content)  # noqa: T201
    # file
    write_to_log_file(content)


if __name__ == "__main__":
    main()
```

zTaoplus commented 1 day ago

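Also, looking at your code:

model_path = "path/to/TableGPT2-7B" and model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto") do not refer to the same variable (model_path != model_name).

So it is quite likely that model_name was assigned some value other than TableGPT2-7B before it was passed to AutoModelForCausalLM, which means the model that was actually loaded is not the TableGPT2-7B model we need.

Please check your code again, rerun it, and see whether the response is as expected.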
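One way to rule this out, as a minimal sketch: keep a single variable for the checkpoint location and inspect its config.json before loading anything. The helper name describe_checkpoint is hypothetical, and the sketch assumes a local Hugging Face style checkpoint directory that contains a config.json. The from_pretrained calls are left as comments so the snippet runs without a GPU or the model weights.

```python
import json
from pathlib import Path


def describe_checkpoint(model_path: str) -> dict:
    """Return a few identifying fields from a local checkpoint's config.json.

    Hypothetical helper for illustration only: it reads the config file and
    does not load any weights.
    """
    config_file = Path(model_path) / "config.json"
    if not config_file.is_file():
        raise FileNotFoundError(f"No config.json found under {model_path!r}")
    config = json.loads(config_file.read_text(encoding="utf-8"))
    return {
        "path": str(Path(model_path).resolve()),
        "model_type": config.get("model_type"),
        "architectures": config.get("architectures"),
    }


# Usage (placeholder path taken from the issue; point it at your local copy):
#   model_path = "path/to/TableGPT2-7B"
#   print(describe_checkpoint(model_path))
#
# Then pass the same variable to both loaders:
#   model = AutoModelForCausalLM.from_pretrained(
#       model_path, torch_dtype=torch.bfloat16, device_map="auto"
#   )
#   tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
```

If the printed path or architecture is not what you expect for TableGPT2-7B, the model and tokenizer are probably being loaded from different places.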

zTaoplus commented 1 day ago

I ran your code after making model_name and model_path consistent. The result on a Tesla-V100-PCIE-32GB is shown below, and it is as expected.

tablegpt / tablegpt-agent

Running the QuickStart code from Hugging Face, the response is "I have understood the provided dataframes information. What can I help you with?" #127
