adhishthite opened 4 months ago
If you still have access, could you send me the raw conversation that shows this behaviour? There's a download button next to user messages in the UI.
OK, I think I can explain this one and offer an improvement.
Code blocks in Markdown can be either fenced (opened and closed with ``` lines, optionally with a language tag such as ```html) or indented by 4 spaces.
The issue arises when the LLM responds with a code block that is both fenced AND indented.
In this case I think the correct behaviour is to show a code block, with the fences displayed as part of the code. VSCode and https://markdownlivepreview.com/ do this.
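To make the "fenced AND indented" case concrete, here is a simplified sketch of the CommonMark rule at top level (outside any list): a line indented by four or more spaces starts an indented code block even if its text begins with ```, so the fence characters become part of the code. `classifyLine` is a hypothetical helper, and the sketch assumes spaces only (no tabs) and ignores the rule that an indented code block cannot interrupt a paragraph.

```typescript
type BlockStart = "indented-code" | "fence" | "other";

// Simplified top-level classification of a single line (spaces only, no lists).
function classifyLine(line: string): BlockStart {
  const indent = line.search(/\S|$/); // index of first non-space char, or line length
  const rest = line.slice(indent);
  if (rest === "") return "other"; // blank lines never start a block
  if (indent >= 4) return "indented-code"; // wins even when `rest` starts with ```
  if (/^(`{3,}|~{3,})/.test(rest)) return "fence"; // fences allow 0 to 3 spaces of indent
  return "other";
}

console.log(classifyLine("```html"));     // fence
console.log(classifyLine("   ```html"));  // fence
console.log(classifyLine("    ```html")); // indented-code
```

That last case is the one where VS Code and markdownlivepreview.com show the fences as part of the code.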
What seems to be happening in Chat-UI:
- `<CodeBlock>` isn't used.
- marked emits plain `<pre>` and `<code>` tags, so the styling looks similar to a correctly rendered code block, but the escaped `&lt;` goes through as-is. Note that the Copy to Clipboard button is missing because the block hasn't been rendered by CodeBlock.

In looking at this, I've bumped the marked library to 13.0.3 and then 14.0, to see whether this fix: https://github.com/markedjs/marked/pull/3264 would make a difference (it doesn't). This changes the interface a little, but it's fairly easy to update.
In the meantime, adding this to ChatMessage turns the literal `&lt;` back into `<` and renders the fences, although it doesn't get the syntax highlighting and Copy to Clipboard button that `<CodeBlock>` provides:
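```ts
// Workaround in ChatMessage (assumes `renderer` and `sanitizeMd` are already in scope).
// With marked >= 13 the `code` hook receives the code token; `raw` is the block's
// original source text, so any fences in it are shown as written.
renderer.code = (code) => {
	return `<pre><code>${sanitizeMd(code.raw)}</code></pre>`;
};
```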
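The thread doesn't pin down exactly why `&lt;` shows up literally. One plausible mechanism, stated here as an assumption rather than a confirmed diagnosis, is double escaping: if the message text is escaped once before markdown parsing and marked escapes the code block's contents again, `<` ends up as `&amp;lt;`, which the browser displays as the text `&lt;`. Interpolating `code.raw` directly, as the workaround does, would skip the second escape. `escapeHtml` and `displayedText` below are hypothetical stand-ins, not chat-ui's `sanitizeMd` or marked internals.

```typescript
// Hypothetical HTML escaper standing in for any escaping step in the pipeline.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;") // run first so the entities added below aren't re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Approximates what a browser shows for an HTML text node: decode entities once.
function displayedText(html: string): string {
  return html
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
}

const source = "<html>";
const escapedOnce = escapeHtml(source);       // "&lt;html&gt;"
const escapedTwice = escapeHtml(escapedOnce); // "&amp;lt;html&amp;gt;"

console.log(displayedText(escapedOnce));  // <html>
console.log(displayedText(escapedTwice)); // &lt;html&gt;
```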
@nsarrazin, let me know if you want this as a PR. I think it improves the behaviour but isn't a proper fix. In the meantime, I'm going to tweak my system prompt to advise the LLM against indenting code blocks. Ultimately, I suspect this is a defect in the marked library.
(Screenshots: before the change, and after adding the render hook.)
I'm leaving the previous comment up, but after more testing I'm not 100% confident that indentation is the only cause. In the output below, `<CodeBlock>` is used from list entry 10 onward:
```json
"updates": [
  {
    "type": "status",
    "status": "started"
  },
  {
    "type": "title",
    "title": "💻 HTML tags"
  },
  {
    "type": "finalAnswer",
    "text": "Certainly! Here are some common HTML tags with a brief description:\n\n1. **`<html>`**: The root element of an HTML page. It encloses all the content of the page.\n\n ```html\n <html>\n </html>\n ```\n\n2. **`<head>`**: Contains meta-information about the HTML document, like title and links to scripts and stylesheets.\n\n ```html\n <head>\n </head>\n ```\n\n3. **`<title>`**: Defines the title of the document, shown in the browser's title bar or tab.\n\n ```html\n <title>Page Title</title>\n ```\n\n4. **`<body>`**: Contains the content of the document, such as text, images, and hyperlinks.\n\n ```html\n <body>\n </body>\n ```\n\n5. **`<h1>` to `<h6>`**: Header tags, `<h1>` is the highest level and `<h6>` is the lowest. They define headings.\n\n ```html\n <h1>Main Heading</h1>\n ```\n\n6. **`<p>`**: Represents a paragraph of text.\n\n ```html\n <p>This is a paragraph.</p>\n ```\n\n7. **`<a>`**: The anchor tag, used to create hyperlinks.\n\n ```html\n <a href=\"https://www.example.com\">Visit Example</a>\n ```\n\n8. **`<img>`**: Embeds an image in the document. It should include the `src` (source) attribute and often an `alt` (alternative text) attribute.\n\n ```html\n <img src=\"image.jpg\" alt=\"Description of image\">\n ```\n\n9. **`<ul>` / `<ol>`**: Unordered (bulleted) and ordered (numbered) list containers, respectively.\n\n ```html\n <ul>\n <li>List item 1</li>\n <li>List item 2</li>\n </ul>\n ```\n\n ```html\n <ol>\n <li>First item</li>\n <li>Second item</li>\n </ol>\n ```\n\n10. **`<li>`**: Represents a list item, used within `<ul>` or `<ol>`.\n\n ```html\n <li>A list item</li>\n ```\n\n11. **`<div>`**: A generic container for content, often used for styling or layout purposes.\n\n ```html\n <div>This is a division.</div>\n ```\n\n12. **`<span>`**: A generic inline container, typically used to apply styles or scripts.\n\n ```html\n <span style=\"color:blue\">This is a blue text.</span>\n ```\n\n13. **`<input>`**: Represents an input field in a form, where data can be entered.\n\n ```html\n <input type=\"text\" name=\"username\">\n ```\n\n14. **`<button>`**: Represents a clickable button.\n\n ```html\n <button>Click me</button>\n ```\n\nRemember, these are just foundational tags, and HTML supports many more elements you can learn about as you build more complex pages.",
    "interrupted": false,
    "usage": {
      "input_tokens": 88,
      "output_tokens": 691
    }
  }
],
```
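One possible explanation for the "from entry 10 onward" pattern, assuming the model indents every ```html fence by three spaces (the captured JSON has its indentation collapsed, so this is a guess): under CommonMark's list-item rules, which marked broadly follows, an item's content starts after the marker plus the spaces that follow it. For `1. ` to `9. ` that is column 3, so a fence indented three spaces stays inside the item and is rendered through marked's HTML. For `10. ` it is column 4, so the same fence falls outside the item, ends the list, and becomes a top-level code block that `<CodeBlock>` picks up. The helpers below are hypothetical and ignore tabs and lazy continuation lines.

```typescript
// Column where a list item's content begins, or null if `line` opens no item.
// Simplified CommonMark rule: indent + marker width + 1 to 4 following spaces.
function listItemContentColumn(line: string): number | null {
  const m = /^( {0,3})(\d{1,9}[.)]|[-+*])( +)/.exec(line);
  if (m === null) return null;
  const spaces = m[3].length;
  // With 5 or more spaces after the marker, content counts as starting after one space.
  return m[1].length + m[2].length + (spaces <= 4 ? spaces : 1);
}

// After a blank line, a line continues the item only if indented to the content column.
function staysInItem(lineIndent: number, contentColumn: number): boolean {
  return lineIndent >= contentColumn;
}

const fenceIndent = 3; // assumption: how far the model indents each ```html fence
for (const item of ["9. **`<ul>` / `<ol>`**", "10. **`<li>`**"]) {
  const col = listItemContentColumn(item);
  if (col !== null) {
    console.log(item, col, staysInItem(fenceIndent, col));
  }
}
```

Under that assumption, item 9 reports column 3 (fence stays in the list) and item 10 reports column 4 (fence escapes the list).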
Here is a snippet that shows the issue:
Code blocks inside lists are handled differently; asking the LLM in Chat-UI to repeat all or part of the snippet verbatim reproduces the behaviour.
The GFM spec suggests inserting a blank HTML comment (`<!-- -->`) to separate a list from a following indented code block: https://github.github.com/gfm/#example-288
````text
## Inside a List
- This is a test (normal fences)
```html
<foo />
This is another test (indented block)
<bar />
This is a further test (indents and fences)
<foo />
<bar />
Test complete
This is a test (normal fences)
<foo />
This is another test (indented block)
<foo />
<bar />
This is another test (indents and fences)
```
<foo />
<bar />
```
Test complete
````
Final update on this for the moment: the issue also occurs when code blocks are children of lists. The whole list then goes through `parse(token.raw)`, which renders the child code block as HTML, instead of the code block being caught by the `type === "code"` clause here:
I can't see an obvious quick way to fix this.
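A rough model of the dispatch described above, with hypothetical names (`Tok`, `route`, `hasNestedCode`) and a token shape only loosely based on marked's lexer output: just tokens whose top-level `type` is `"code"` are routed to `<CodeBlock>`, so a list containing a code block is rendered as HTML as a whole. `hasNestedCode` shows how such a list could at least be detected; it is not a fix.

```typescript
// Minimal token shape, loosely modelled on marked's lexer output (an assumption;
// real marked tokens carry more fields).
interface Tok {
  type: string;
  raw: string;
  tokens?: Tok[]; // child tokens
  items?: Tok[];  // list items (only on "list" tokens)
}

type Route = "CodeBlock" | "html";

// Hypothetical dispatcher: only top-level "code" tokens reach the component.
function route(token: Tok): Route {
  return token.type === "code" ? "CodeBlock" : "html";
}

// Recursively looks for code tokens hidden inside lists or other containers.
function hasNestedCode(token: Tok): boolean {
  const children = (token.tokens ?? []).concat(token.items ?? []);
  return children.some((c) => c.type === "code" || hasNestedCode(c));
}

const list: Tok = {
  type: "list",
  raw: "1. Loop:\n   ```cpp\n   delete p;\n   ```\n",
  items: [
    {
      type: "list_item",
      raw: "1. Loop:\n   ```cpp\n   delete p;\n   ```\n",
      tokens: [
        { type: "text", raw: "Loop:\n" },
        { type: "code", raw: "```cpp\ndelete p;\n```" },
      ],
    },
  ],
};

console.log(route(list), hasNestedCode(list)); // html true
```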
Getting this issue with Qwen2.5-Coder-32B-Instruct:
The raw markdown looks like:
````markdown
### Explanation of the Code
1. **Loop through each `char*` and delete it:**
   ```cpp
   for (size_t i = 0; i < count; i++) {
       delete suggestions[i];
       suggestions[i] = 0;
   }
   ```
````
It seems the code block produced by Qwen is indented, which is usually uncommon but seems more frequent with this particular model.
It's because the code block is a child of a bulleted/numbered list. In that case it isn't rendered with the CodeBlock component but with marked's HTML output.
On Tue, 12 Nov 2024, 09:19 Rotem Dan wrote:
> Getting this issue with Qwen2.5-Coder-32B-Instruct:
My last reply wasn't helpful. There are two separate issues:
1. Code blocks that are children of lists don't get rendered via the CodeBlock component.
2. Those code blocks render `<` symbols incorrectly.
I can open a PR for the second issue (I fixed it in my fork but held off because it's not a "complete" fix).
Adding this to ChatMessage fixes the `<` characters:
```ts
renderer.code = (code) => {
	return `<pre><code>${sanitizeMd(code.raw)}</code></pre>`;
};
```
@nsarrazin Whenever I ask chat-ui to explain or generate code, the `<` character doesn't get rendered correctly. Can you please take a look?