josephrocca / OpenCharacters

Simple little web interface for creating characters and chatting with them. It's basically a single HTML file - no server. Share characters using a link (character data is stored within the URL itself). All chat data is stored in your browser using IndexedDB. Currently supports OpenAI APIs and ~any Hugging Face model.
https://josephrocca.github.io/OpenCharacters
MIT License

Things go wonky when HTML is rendered. Escape < and > #22

Open expecttheunusual opened 1 year ago

expecttheunusual commented 1 year ago

Title.

Ask it to do anything in HTML and the thing shuffles into madness. Will you fix it so the output and input are properly escaped?

Thanks

josephrocca commented 1 year ago

Hmm, this may be hard to get around depending on specifically what you're referring to - any chance you could record a video, or give an example prompt? I did just fix a problem with code blocks - so they display more consistently/nicely during streaming. I may be able to do something similar with what you're talking about.

kickahaota commented 1 year ago

This seemed interesting, so I had a look.

Normally, when you ask the LLM to write HTML for you, it will enclose the result in a ```html markdown code block, like this:

[USER]: Provide HTML for a simple web page which displays "Sample Page" as its title and "Hello World" as its body text.

[AI]: Sure! Here's a simple HTML document that displays "Sample Page" as its title and "Hello World" as its body text:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Sample Page</title>
</head>
<body>
  <h1>Hello World</h1>
</body>
</html>
```

That causes the HTML to be shown very nicely in the response:

<!DOCTYPE html>
<html>
<head>
  <title>Sample Page</title>
</head>
<body>
  <h1>Hello World</h1>
</body>
</html>

But if the LLM uses HTML in a response without marking it in this way, then the HTML gets rendered as if it were part of your UI, which naturally messes things up. I had to try very, very hard to get the LLM to make this mistake; but it can also happen if someone imports a conversation that uses the less-than and greater-than characters without markdown.

In most web projects that use arbitrary text, the right thing to do is to sanitize the text by replacing < with &lt;, & with &amp;, and so on. But it's trickier here, because you don't want to sanitize text that's already properly enclosed in a markdown code block. So if this were my project, I'd be inclined to mark it won't-fix.
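For anyone who wants to experiment anyway, here is a rough sketch of what "escape everything except fenced code" could look like. This is not how OpenCharacters works; `escapeHtml` and `escapeOutsideCodeFences` are hypothetical names made up for illustration, and the sketch assumes a message is plain text in which code blocks are delimited by triple backticks.

```javascript
// Three backticks, built at runtime so this snippet can itself sit in a fence.
const FENCE = "`".repeat(3);

// Escape the characters HTML treats specially.
// "&" must go first; otherwise the "&" in "&lt;" would be escaped a second time.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: escape a message but leave closed fenced blocks untouched.
function escapeOutsideCodeFences(message) {
  // Non-greedy match of one complete fenced block (opening fence to next fence).
  const fencedBlock = new RegExp("(" + FENCE + "[\\s\\S]*?" + FENCE + ")", "g");
  // split() with a capture group keeps the matched blocks in the result,
  // so fenced blocks land at odd indices and ordinary text at even indices.
  return message
    .split(fencedBlock)
    .map((part, i) => (i % 2 === 1 ? part : escapeHtml(part)))
    .join("");
}

const sample = "Try <b>this</b>:\n" + FENCE + "html\n<h1>Hello World</h1>\n" + FENCE;
console.log(escapeOutsideCodeFences(sample));
```

The tags outside the fence come out as `&lt;b&gt;this&lt;/b&gt;`, while the fenced `<h1>` line is passed through for the markdown renderer to handle. The sketch also shows why this is trickier than it sounds: a fence that has not closed yet (for example mid-stream) gets escaped along with everything else, inline code in single backticks is not recognized, and escaping `>` turns a markdown blockquote marker into a literal `&gt;`.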