huggingface / chat-ui

Open source codebase powering the HuggingChat app
https://huggingface.co/chat
Apache License 2.0

Unsanitized "$lt;" still occurs #533

Open: mckbrchill opened this issue 1 year ago

mckbrchill commented 1 year ago

I noticed that "$lt;" still shows up in code blocks.


It seems to happen when the code block sits inside a list token produced by the marked library. The token then falls into the else branch shown here, because token.type === "list" and the code block is nested inside it:

{#if token.type === "code"}
    <CodeBlock lang={token.lang} code={unsanitizeMd(token.text)} />
{:else}
    <!-- eslint-disable-next-line svelte/no-at-html-tags -->
    {@html marked(token.raw, options)}
{/if}
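To make the failure concrete, here is a small TypeScript model of what the component sees. `Token` is a simplified, hypothetical stand-in for marked's token objects (as described in this thread, a list keeps its children in `items` and a list item keeps its children in `tokens`), `renderTopLevel` mimics the `{#if}`/`{:else}` branch with plain strings, and `unsanitizeMd` is assumed to reverse only the `<` to `&lt;` replacement. The point it illustrates: a top-level code token is cleaned, while the same code inside a list is passed on as raw markdown that still contains the entity.

```typescript
// Simplified, hypothetical model of marked's tokens: only the fields that
// matter for this issue are kept (marked's real token types carry more).
interface Token {
  type: string;
  raw: string;
  text?: string;
  lang?: string;
  items?: Token[]; // children of a "list" token
  tokens?: Token[]; // children of a "list_item" token
}

// Assumption: the sanitizer turned every "<" into "&lt;", and unsanitizeMd
// reverses exactly that replacement.
function unsanitizeMd(md: string): string {
  return md.replace(/&lt;/g, "<");
}

// Mirrors the Svelte branch: only a top-level code token is unsanitized.
// Any other token (including a list that contains code) is passed on as
// raw markdown, entity and all.
function renderTopLevel(tokens: Token[]): string[] {
  return tokens.map((token) =>
    token.type === "code"
      ? `CodeBlock[${token.lang}] ${unsanitizeMd(token.text ?? "")}`
      : `html ${token.raw}`
  );
}

// Fences use ~~~ here only to keep the example readable.
const code = "if (a &lt; b) return;";
const tokens: Token[] = [
  { type: "code", raw: "~~~ts\n" + code + "\n~~~", lang: "ts", text: code },
  {
    type: "list",
    raw: "- Check it:\n  ~~~ts\n  " + code + "\n  ~~~",
    items: [
      {
        type: "list_item",
        raw: "- Check it:\n  ~~~ts\n  " + code + "\n  ~~~",
        tokens: [
          { type: "text", raw: "Check it:", text: "Check it:" },
          { type: "code", raw: "~~~ts\n" + code + "\n~~~", lang: "ts", text: code },
        ],
      },
    ],
  },
];

for (const line of renderTopLevel(tokens)) console.log(line);
```

The first line printed contains `if (a < b) return;`, while the second still carries `&lt;`, because the nested code token is never inspected.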
mckbrchill commented 1 year ago

Yep, it's because the code block is nested inside a list item. Moreover, in my case the list item also has subtokens (item.tokens) and the code token is one of them, so unsanitizeMd never touches it. For the same reason, the code is rendered as default markdown, without language highlighting.

Update: it seems that a generic depth-first traversal, through token.items for lists and token.tokens for list items (and for any other tokens that have them), could fix the problem, but I don't know how to implement it properly.
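A minimal sketch of the traversal described above, in TypeScript. `Token` is a trimmed-down, hypothetical stand-in for marked's token objects, `unsanitizeCodeTokens` is an illustrative name, and `unsanitizeMd` is assumed to reverse only the `<` to `&lt;` replacement. Two caveats: mutating `text` alone does not help while the else branch re-renders from `token.raw`, so the output would have to be produced from the (cleaned) tokens themselves; and marked documents a `walkTokens` hook that visits nested tokens, which may be worth checking as an alternative place for this logic.

```typescript
// Simplified, hypothetical model of marked's tokens (only the fields that
// matter here; marked's real token types carry more).
interface Token {
  type: string;
  raw: string;
  text?: string;
  lang?: string;
  items?: Token[]; // children of a "list" token
  tokens?: Token[]; // children of a "list_item" and some other tokens
}

// Assumption: unsanitizeMd only reverses the "<" -> "&lt;" replacement.
function unsanitizeMd(md: string): string {
  return md.replace(/&lt;/g, "<");
}

// Depth-first walk: visits every token, descends into `items` (lists) and
// `tokens` (list items and others), and unsanitizes each code token's text
// in place, however deeply it is nested. Non-code tokens are left alone.
function unsanitizeCodeTokens(tokens: Token[]): void {
  for (const token of tokens) {
    if (token.type === "code" && token.text !== undefined) {
      token.text = unsanitizeMd(token.text);
    }
    if (token.items) unsanitizeCodeTokens(token.items);
    if (token.tokens) unsanitizeCodeTokens(token.tokens);
  }
}

// Usage: a code block inside a list item, plus one inside a nested list.
// `raw` is left empty for brevity.
const doc: Token[] = [
  {
    type: "list",
    raw: "",
    items: [
      {
        type: "list_item",
        raw: "",
        tokens: [
          { type: "text", raw: "", text: "Compare &lt; here" },
          { type: "code", raw: "", lang: "ts", text: "if (a &lt; b) return;" },
          {
            type: "list",
            raw: "",
            items: [
              {
                type: "list_item",
                raw: "",
                tokens: [{ type: "code", raw: "", lang: "py", text: "x &lt; y" }],
              },
            ],
          },
        ],
      },
    ],
  },
];

unsanitizeCodeTokens(doc);
```

After the call, both code tokens read `if (a < b) return;` and `x < y`, while the plain text token is unchanged.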

nsarrazin commented 1 year ago

Thanks for reporting this! I guess we do need some kind of recursive traversal of the tokens; not sure yet how that would work 🤔
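One possible shape for that recursion is to recurse while rendering rather than only while cleaning: lists and list items render their children, so a code token at any depth reaches the code-block path and keeps its `lang` for highlighting. The sketch below models this with strings; `renderCodeBlock` and `renderMarkdown` are hypothetical stand-ins for the `CodeBlock` component and the `marked(...)` call, the token model is simplified, and `unsanitizeMd` is assumed to reverse only `<` to `&lt;`. In a Svelte 3/4 component the same idea could be expressed with a component that renders itself via `<svelte:self>`. The sketch ignores details such as ordered-list start numbers, task items, and loose vs tight lists.

```typescript
// Simplified, hypothetical model of marked's tokens (fields trimmed).
interface Token {
  type: string;
  raw: string;
  text?: string;
  lang?: string;
  ordered?: boolean; // on "list" tokens
  items?: Token[]; // children of a "list" token
  tokens?: Token[]; // children of a "list_item" token
}

// Assumption: unsanitizeMd only reverses the "<" -> "&lt;" replacement.
function unsanitizeMd(md: string): string {
  return md.replace(/&lt;/g, "<");
}

// Hypothetical stand-ins for the two render paths in the Svelte component.
function renderCodeBlock(lang: string, code: string): string {
  return `<CodeBlock lang="${lang}">${code}</CodeBlock>`;
}
function renderMarkdown(raw: string): string {
  return `<md>${raw}</md>`;
}

// Recursive renderer: code tokens go to the code-block path at any depth;
// lists and list items recurse into their children instead of sending
// their whole raw markdown to the markdown renderer.
function render(token: Token): string {
  if (token.type === "code") {
    return renderCodeBlock(token.lang ?? "", unsanitizeMd(token.text ?? ""));
  }
  if (token.type === "list" && token.items) {
    const tag = token.ordered ? "ol" : "ul";
    return `<${tag}>${token.items.map(render).join("")}</${tag}>`;
  }
  if (token.type === "list_item" && token.tokens) {
    return `<li>${token.tokens.map(render).join("")}</li>`;
  }
  return renderMarkdown(token.raw);
}

// Usage: a list item holding some text and a TypeScript code block.
const list: Token = {
  type: "list",
  raw: "- Run:\n  ~~~ts\n  if (a &lt; b) return;\n  ~~~",
  ordered: false,
  items: [
    {
      type: "list_item",
      raw: "- Run:\n  ~~~ts\n  if (a &lt; b) return;\n  ~~~",
      tokens: [
        { type: "text", raw: "Run:", text: "Run:" },
        { type: "code", raw: "~~~ts\nif (a &lt; b) return;\n~~~", lang: "ts", text: "if (a &lt; b) return;" },
      ],
    },
  ],
};

console.log(render(list));
// prints: <ul><li><md>Run:</md><CodeBlock lang="ts">if (a < b) return;</CodeBlock></li></ul>
```

Compared with cleaning tokens in place, this approach also fixes the missing highlighting, since the nested code's `lang` reaches the code-block renderer instead of being flattened into generic markdown.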