langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

The Markdown multi-table rendering in the chat interface often gets misaligned or fails to render #7974

Closed by GeorgeGalway 1 month ago

GeorgeGalway commented 2 months ago

Dify version

0.6.7~0.7.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

image

✔️ Expected Behavior

Each table should render correctly and stay visually distinct from the others.

❌ Actual Behavior

Incorrect rendering or failure to render properly.

dosubot[bot] commented 2 months ago

@GeorgeGalway Hello there! I'm here to help you with any bugs, questions, or contributions you may have for the repository. Let me know how I can assist you.

I found a similar closed issue that might be relevant to your problem: Deepseek model Reply Duplicate. The suggested solution for issues in Dify v0.6.7 includes providing detailed information such as expected behavior, actual behavior, steps to reproduce, error messages or logs, and screenshots or screen recordings.

Regarding the Markdown rendering engine, it handles multi-table structures by defining specific states and tokens for table headers and table bodies. It uses a tokenizer to identify and process different parts of the table structure. Here is a relevant part of the code:

tokenizer: {
    root: [
        [/^\s*\|/, "@rematch", "@table_header"],
        // other rules...
    ],
    table_header: [
        { include: "@table_common" },
        [/[^\|]+/, "keyword.table.header"]
    ],
    table_body: [
        { include: "@table_common" },
        { include: "@linecontent" }
    ],
    table_common: [
        [/\s*[\-:]+\s*/, { token: "keyword", switchTo: "table_body" }],
        [/^\s*\|/, "keyword.table.left"],
        [/^\s*[^\|]/, "@rematch", "@pop"],
        [/^\s*$/, "@rematch", "@pop"],
        [/\|/, { cases: { "@eos": "keyword.table.right", "@default": "keyword.table.middle" } }]
    ],
    // other states...
}

This code defines how the tokenizer should handle the beginning of a table (@table_header), the common elements within a table (@table_common), and the body of the table (@table_body). The tokenizer uses regular expressions to match table syntax and assigns appropriate tokens to different parts of the table [1].

Additionally, specific CSS styles are applied to tables within the Markdown rendering process to ensure proper spacing, padding, and border styles. Here are the relevant styles:

.markdown-body table {
  border-spacing: 0;
  border-collapse: collapse;
  display: block;
  width: max-content;
  max-width: 100%;
  overflow: auto;
}

.markdown-body table th {
  font-weight: var(--base-text-weight-semibold, 600);
  white-space: nowrap;
}

.markdown-body table th,
.markdown-body table td {
  padding: 6px 13px;
  border: 1px solid var(--color-border-default);
}

.markdown-body table tr {
  background-color: var(--color-canvas-default);
  border-top: 1px solid var(--color-border-muted);
}

.markdown-body table tr:nth-child(2n) {
  background-color: var(--color-canvas-subtle);
}

.markdown-body table img {
  background-color: transparent;
}

These styles ensure that tables are displayed correctly with proper alignment and readability [2].

If you continue to experience issues, providing detailed information as suggested in the similar issue might help in diagnosing and resolving the problem.

crazywoola commented 2 months ago

Can you provide the texts?

GeorgeGalway commented 1 month ago

> Can you provide the texts?

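{ "text": "### 1.1.1 Page description:\n\nPage name: User Information Configuration page\n\nPurpose: Displays and manages detailed user information in the system, including viewing, editing, and filtering user data.\n\nPage entry: Main menu > User Management > User Information Configuration\n\nInitial page: Shows all user data by default, sorted by registration date in descending order.\n\nPagination: 20 records per page, with support for jumping to a specific page number.\n\nStyle and layout: Amount fields (such as account balance) need thousands separators, two decimal places, and right alignment. Other fields adapt their width to the content.\n\n### 1.1.2 List fields:\n\n| Column header | Description | Example |\n|------------------------|------------------------|-------------------------|\n| User ID | Unique identifier generated by the system | 123456 |\n| Username | User's login name | johndoe |\n| Password hash | Encrypted password | $2a$10$... |\n| Email | User's email address | johndoe@example.com |\n| Phone number | User's contact phone number | +8613800138000 |\n| Registration date | Time the user registered | 2023-01-01 |\n| Last login time | Time of the last login | 2023-10-01 12:00 |\n| User status | Current status (enabled/disabled) | Enabled |\n| User role | Assigned role permissions | Administrator |\n| ID type | ID card, passport, and other types | ID card |\n| ID number | Number of the corresponding ID | 110101199003076512 |\n| Bank | Bank name | Bank of China |\n| Bank card number | Bank account number |- |- |- |- |- |- |- |- |- |- |- |- |- |\n\n### 1.1.3 Filtering and sorting:\n\nFilter conditions:\n\n| Filter \t\t| Default value \t\t| Filter control type \t\t| Input restriction \t\t|-Related constraint \t\t\t|\n|-|-|-|-|-|\nRegistration date \t\t\tEmpty \t\tDate picker \t\tNot later than the current date \t None \nUser status \t\t All \t Dropdown \t Enabled/Disabled None \nUser role \t\t All \t Multi-select dropdown - - \nGender \t\t All \t Radio buttons Male/Female/Other None \nPhone verification status \t All \t Dropdown \t Verified/Unverified None \n\nDefault sort order: by registration date, descending.\n\n### 1.1.4 Interactions and operations:\n\nDescribes which operations users can perform on the list data, such as select, delete, and edit.\n\n#### Edit user information:\n\nAllows administrators to edit basic user information such as username and email.\n\nForm field description:\n\n- Username: text input, default value: johndoe, input restriction: up to 50 characters, required: yes\n- Email: text input, default value: johndoe@example.com, input restriction: email format validation, required: yes\n- Phone number: text input, default value: +8613800138000, input restriction: phone number format validation, required: no\n\nValidation rules:\n- The email must follow the standard email format.\n- The phone number must follow the international phone number format.\n\nError handling:\nError messages are shown in red text below the corresponding field, for example “Please enter a valid email address”.\n\n#### Delete user:\n\nAllows administrators to delete user records that are no longer needed. The operation must be confirmed before deletion to prevent accidental removal.\n\n### 1.1.5 Performance requirements:\n\nWhen handling large amounts of data, the list must load within 3 seconds and support fast page switching and real-time updates of filter results.", "usage": { "prompt_tokens": 1402, "prompt_unit_price": "2.50", "prompt_price_unit": "0.000001", "prompt_price": "0.0035050", "completion_tokens": 863, "completion_unit_price": "10.00", "completion_price_unit": "0.000001", "completion_price": "0.0086300", "total_tokens": 2265, "total_price": "0.0121350", "currency": "USD", "latency": 13.259290382266045 }, "finish_reason": "stop" }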

image THX

GeorgeGalway commented 1 month ago

> Can you provide the texts?

Could you help me? This bug is already affecting the output of our large documents, and for now our only workaround is to switch to FastGPT temporarily.

GeorgeGalway commented 1 month ago

> Can you provide the texts?

I tested it and found that the issue is quite severe with the GPT-4o model. It's slightly better with GPT-4o-0806, but serialization issues still occur.

2019YKL commented 1 month ago

I'm having the same problem. I'm sure there's nothing wrong with my format.

crazywoola commented 1 month ago

> I'm having the same problem. I'm sure there's nothing wrong with my format.

Can you provide the texts as well?

2019YKL commented 1 month ago

> I'm having the same problem. I'm sure there's nothing wrong with my format.
>
> Can you provide the texts as well?

| GPT-4o | Llama-70b | DeepSeek |
|-----|-------|---------|
|{{#llm.text#}}|{{#17261508557390.text#}}|{{#17261524414940.text#}}|

iamjoel commented 1 month ago

> I'm having the same problem. I'm sure there's nothing wrong with my format.
>
> Can you provide the texts as well?
>
> | GPT-4o | Llama-70b | DeepSeek |
> |-----|-------|---------|
> |{{#llm.text#}}|{{#17261508557390.text#}}|{{#17261524414940.text#}}|

I can't reproduce.

image

GeorgeGalway commented 1 month ago

> I'm having the same problem. I'm sure there's nothing wrong with my format.
>
> Can you provide the texts as well?
>
> | GPT-4o | Llama-70b | DeepSeek |
> |-----|-------|---------|
> |{{#llm.text#}}|{{#17261508557390.text#}}|{{#17261524414940.text#}}|
>
> I can't reproduce.
>
> image

Please use the GPT-4o model and generate tables with more rows. Ideally a single response should contain multiple tables, since that makes the issue more likely to appear.

iamjoel commented 1 month ago

Could you provide only the Markdown text, without escape sequences such as \n, and wrap it in ```? For example:

| GPT-4o | Llama-70b | DeepSeek |
|-----|-------|---------|
|{{#llm.text#}}|{{#17261508557390.text#}}|{{#17261524414940.text#}}|

That way I can tell whether the issue is caused by the Markdown parser or by malformed Markdown. @GeorgeGalway
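For reference, here is a minimal Python sketch of one way to turn the JSON payload posted earlier into raw Markdown wrapped in a code fence. It assumes the payload is valid JSON with a top-level "text" field, as in the sample above. The function name `payload_to_fenced_markdown` is hypothetical and not part of Dify.

```python
import json
import re


def payload_to_fenced_markdown(payload: str) -> str:
    """Decode a JSON payload and wrap its "text" field in a code fence."""
    # json.loads converts escape sequences such as \n and \t into real characters.
    text = json.loads(payload)["text"]
    # Use a fence longer than any backtick run in the text so the fence cannot close early.
    longest_run = max((len(run) for run in re.findall(r"`+", text)), default=0)
    fence = "`" * max(3, longest_run + 1)
    return f"{fence}\n{text}\n{fence}"


if __name__ == "__main__":
    # Hypothetical payload with the same shape as the one in this thread.
    example_payload = json.dumps(
        {"text": "| a | b |\n|-|-|\n| 1 | 2 |", "finish_reason": "stop"}
    )
    print(payload_to_fenced_markdown(example_payload))
```

Pasting the printed result into a comment shows the exact characters the model produced, which makes it easier to separate parser problems from malformed input.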

iamjoel commented 1 month ago

I used the GPT-4o model and generated tables with more rows, but I still can't reproduce it.

image

GeorgeGalway commented 1 month ago

> I used the GPT-4o model and generated tables with more rows, but I still can't reproduce it.
>
> image

I will send you a DSL file. Could you try it out and choose '列表需求' ("list requirements")? The prompt is: '请设计一套用户登录的表单,表单至少10个字段' ("Please design a user login form with at least 10 fields").

image DSL.zip

iamjoel commented 1 month ago

Ok, I'll try it later.

iamjoel commented 1 month ago

I can reproduce it with your DSL. The problem is caused by the LLM outputting a malformed Markdown table.
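A rough way to check LLM output for this kind of problem is a small lint pass over the Markdown, sketched below in Python. It is a heuristic and not a full GitHub Flavored Markdown parser: it assumes tables use pipe separators, ignores escaped pipes and code spans, and only flags body rows whose cell count differs from the header or that contain no pipe at all. As I understand the GFM table rules, such rows are the ones most likely to render with dropped or merged cells. The names `split_row`, `is_delimiter_row`, and `lint_tables` are hypothetical.

```python
import re

# One delimiter cell: dashes with optional alignment colons, e.g. "---", ":-", "-:".
DELIM_CELL = re.compile(r"^:?-+:?$")


def split_row(line: str) -> list[str]:
    """Split a table line into cells, dropping the optional outer pipes."""
    s = line.strip()
    if s.startswith("|"):
        s = s[1:]
    if s.endswith("|"):
        s = s[:-1]
    return [cell.strip() for cell in s.split("|")]


def is_delimiter_row(line: str) -> bool:
    """Return True if every cell of the line looks like a delimiter cell."""
    return all(DELIM_CELL.match(cell) for cell in split_row(line))


def lint_tables(markdown: str) -> list[str]:
    """Report table rows that do not match the width of their header row."""
    problems = []
    lines = markdown.split("\n")
    i = 0
    while i + 1 < len(lines):
        header, delim = lines[i], lines[i + 1]
        if "|" in header and is_delimiter_row(delim):
            width = len(split_row(header))
            j = i + 2
            # Body rows run until the first blank line.
            while j < len(lines) and lines[j].strip():
                if "|" not in lines[j]:
                    problems.append(f"line {j + 1}: row has no pipe separators")
                else:
                    found = len(split_row(lines[j]))
                    if found != width:
                        problems.append(
                            f"line {j + 1}: expected {width} cells, found {found}"
                        )
                j += 1
            i = j
        else:
            i += 1
    return problems
```

Running `lint_tables` on the "text" field of the payload above should flag the final row of the first table, which carries a run of extra `|-` cells, and the tab-separated rows under the second table header.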

image

GeorgeGalway commented 1 month ago

> I can reproduce it with your DSL. The problem is caused by the LLM outputting a malformed Markdown table.
>
> image

With the same model and the same prompt, FastGPT just works better. Anyway, thank you, I'll look for the reason myself.