niochat / nio

💬 Nio is an upcoming matrix client for iOS.
https://niochat.github.io/
Mozilla Public License 2.0

Message parsing: Replace Markdown with HTML #281

Open kiliankoe opened 3 years ago

kiliankoe commented 3 years ago

As we recently learned in the chat, Matrix message bodies cannot be interpreted directly as Markdown. Any Markdown in a body is purely coincidental, just as it would be if users typed BBCode directly.

See this message content for example.

"content": {
  "msgtype": "m.text",
  "body": "*test*",
  "format": "org.matrix.custom.html",
  "formatted_body": "<em>test</em>"
}

The body contains Markdown formatting, but this cannot be used directly. Instead we have to check the format, which will likely be either absent (for plaintext) or org.matrix.custom.html, like here. In the latter case we can interpret the formatted_body as HTML and render that.
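A minimal sketch of that check, assuming the content is decoded with Codable. The names MessageContent, Renderable and renderable(for:) are made up for illustration and are not part of Nio or any Matrix SDK:

```swift
import Foundation

// Hypothetical model of the "content" object shown above.
struct MessageContent: Decodable {
  let msgtype: String
  let body: String
  let format: String?         // absent for plaintext messages
  let formattedBody: String?  // only meaningful together with `format`

  enum CodingKeys: String, CodingKey {
    case msgtype, body, format
    case formattedBody = "formatted_body"
  }
}

// What the renderer should work with (hypothetical type).
enum Renderable: Equatable {
  case html(String)
  case plain(String)
}

// Use formatted_body only when the format says it is HTML;
// otherwise fall back to the plain body and do NOT parse it as Markdown.
func renderable(for content: MessageContent) -> Renderable {
  if content.format == "org.matrix.custom.html", let html = content.formattedBody {
    return .html(html)
  }
  return .plain(content.body)
}
```

Decoding the example content with JSONDecoder and passing it to renderable(for:) would select the HTML branch. A message without a format falls through to .plain, so a body like *test* stays literal.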

It's entirely up to clients how users can mark up their messages. For outgoing messages it would likely make sense for Nio to just assume Markdown (and show a live formatting preview in the message composer), turn that into HTML, and format the message content as above.
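For the outgoing direction, something along these lines might work. This is only a sketch: OutgoingContent and makeContent are hypothetical names, and the Markdown-to-HTML step is passed in as a closure because no particular converter is assumed here:

```swift
import Foundation

// Hypothetical outgoing counterpart of the content object shown above.
struct OutgoingContent: Encodable, Equatable {
  var msgtype = "m.text"
  let body: String            // what the user typed, kept as the fallback
  let format: String?
  let formattedBody: String?

  enum CodingKeys: String, CodingKey {
    case msgtype, body, format
    case formattedBody = "formatted_body"
  }
}

// `toHTML` stands in for whatever Markdown renderer gets chosen;
// returning nil means "nothing to format, send as plaintext".
func makeContent(markdown: String, toHTML: (String) -> String?) -> OutgoingContent {
  guard let html = toHTML(markdown) else {
    return OutgoingContent(body: markdown, format: nil, formattedBody: nil)
  }
  return OutgoingContent(body: markdown,
                         format: "org.matrix.custom.html",
                         formattedBody: html)
}
```

Keeping the raw input in body means clients that ignore formatted_body still show something readable. Since the optional fields are nil for plaintext, the synthesized Encodable conformance leaves format and formatted_body out of the JSON in that case.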

helje5 commented 3 years ago

You might want to split this issue into two, as it addresses two distinct topics: a) message parsing (change from Markdown to HTML), and b) message composition (still parse Markdown, but emit HTML on send).

helje5 commented 3 years ago

If we keep this one for a), there are multiple options. The attributed string parser Nio is currently using can build quite complex stuff, e.g. paragraph formats for quotes. We could parse the HTML and build a similar one.

Another option is to parse the HTML into our own AST, which we render directly as SwiftUI. This can be useful for block-level elements, which could then be separate Views (e.g. a source-highlighting View for code blocks). Like:

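// Placeholder for an inline run of (possibly styled) text; details are open.
struct Run {
  let text: String
}

struct Message {
  // Block-level elements; each case could map to its own SwiftUI View.
  enum Block {
    case paragraphs([Run])
    case quote([Run])
    case code(String, language: String?)
  }
  let blocks: [Block]
}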
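To illustrate how such an AST might be consumed, here is a small sketch that switches over the blocks. It produces plain strings only so that it runs without SwiftUI; in the app each case would return a View instead. Run and plainText(for:) are placeholders invented for this example:

```swift
// Placeholder inline run; a real one would carry styling as well.
struct Run {
  let text: String
}

struct Message {
  enum Block {
    case paragraphs([Run])
    case quote([Run])
    case code(String, language: String?)
  }
  let blocks: [Block]
}

// One switch per block: the compiler enforces that every block kind is handled.
func plainText(for block: Message.Block) -> String {
  switch block {
  case .paragraphs(let runs):
    return runs.map { $0.text }.joined()
  case .quote(let runs):
    return "> " + runs.map { $0.text }.joined()
  case .code(let source, _):
    return source
  }
}

func plainText(for message: Message) -> String {
  message.blocks.map { plainText(for: $0) }.joined(separator: "\n")
}
```

The appeal of this shape is that adding a new block kind (say, a table) becomes a compile error everywhere it is not yet handled.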

But both options have their pros and cons. E.g. a disadvantage of SwiftUI Text is that it isn't selectable.

For the HTML it would be interesting to know whether the "custom.html" is well-formed, i.e. whether we could use NSXMLParser, or whether we'd have to use libxml2 directly. Originally I thought we could use the HTML parser, but that only seems to be exposed as NSXMLDocument, which is not available on iOS.
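A quick way to probe the well-formedness question might be to feed a formatted_body to XMLParser (the Swift name of NSXMLParser) and see whether it parses. This is a sketch under assumptions: FormattedBodyScanner and scan(formattedBody:) are invented names, and the body is wrapped in a synthetic root element because a formatted_body can contain several top-level elements:

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

// Collects element names and character data while parsing.
final class FormattedBodyScanner: NSObject, XMLParserDelegate {
  private(set) var tags: [String] = []
  private(set) var text = ""

  func parser(_ parser: XMLParser, didStartElement elementName: String,
              namespaceURI: String?, qualifiedName qName: String?,
              attributes attributeDict: [String: String]) {
    tags.append(elementName)
  }

  func parser(_ parser: XMLParser, foundCharacters string: String) {
    text += string
  }
}

// Returns nil when the body is not well-formed XML.
func scan(formattedBody: String) -> (tags: [String], text: String)? {
  // Synthetic root so that "<p>a</p><p>b</p>" is a single document.
  let wrapped = "<body>\(formattedBody)</body>"
  let parser = XMLParser(data: Data(wrapped.utf8))
  let scanner = FormattedBodyScanner()
  parser.delegate = scanner
  guard parser.parse() else { return nil }
  // Drop the synthetic root from the reported tags.
  return (Array(scanner.tags.dropFirst()), scanner.text)
}
```

Anything that is valid HTML but not valid XML, such as an unclosed tag or an HTML-only entity, would make this return nil, which is exactly the case that would push towards an HTML-aware parser.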

kiliankoe commented 3 years ago

Splitting this up definitely sounds sensible 👍 I'll open a new issue for message composition.

> For the HTML it would be interesting to know whether the "custom.html" is well formed

I would very much hope so, but can we be sure? It might very well be for Element, but other clients could be sending malformed HTML (the format will be the same), so I don't think we'll get around handling that.

helje5 commented 3 years ago

It is documented here: https://matrix.org/docs/spec/client_server/r0.6.1#id335

So that seems to allow unclosed tags; at least it doesn't say otherwise.

The strongly suggested set of HTML tags to permit, denying the use and rendering of anything else, is: font, del, h1, h2, h3, h4, h5, h6, blockquote, p, a, ul, ol, sup, sub, li, b, i, u, strong, em, strike, code, hr, br, div, table, thead, tbody, tr, th, td, caption, pre, span, img.
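The quoted list translates fairly directly into a lookup set. A sketch, where permittedTags and isPermitted(tag:) are names invented here and the case-insensitive comparison is an assumption, not something the quoted text specifies:

```swift
// Tag names copied from the list quoted above.
let permittedTags: Set<String> = [
  "font", "del", "h1", "h2", "h3", "h4", "h5", "h6", "blockquote", "p",
  "a", "ul", "ol", "sup", "sub", "li", "b", "i", "u", "strong", "em",
  "strike", "code", "hr", "br", "div", "table", "thead", "tbody", "tr",
  "th", "td", "caption", "pre", "span", "img"
]

// Assumption: tag names are compared case-insensitively.
func isPermitted(tag: String) -> Bool {
  permittedTags.contains(tag.lowercased())
}
```

A parser could call this for every element it encounters and skip rendering anything that is not in the set.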

helje5 commented 3 years ago

BTW: This also poses special challenges when editing messages. One might want to warn the user when the client can't deal w/ special content (e.g. if it contains a table).
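Such a warning could be driven by a check like the following. It is a rough sketch: the scan for tag names is deliberately naive (not a real HTML parser), and editableTags is a made-up example of what a composer might support, not a decision for Nio:

```swift
// Naive scan: collects the lowercased names of opening tags.
// Closing tags ("</em>") are skipped because "/" is not a letter.
func tagNames(in html: String) -> Set<String> {
  var names = Set<String>()
  var rest = Substring(html)
  while let open = rest.firstIndex(of: "<") {
    rest = rest[rest.index(after: open)...]
    let name = rest.prefix(while: { $0.isLetter || $0.isNumber })
    if !name.isEmpty {
      names.insert(name.lowercased())
    }
  }
  return names
}

// Hypothetical set of tags the message composer can round-trip.
let editableTags: Set<String> = ["p", "br", "b", "i", "em", "strong", "code", "a"]

// Tags present in the message that the editor could not preserve.
func uneditableTags(in formattedBody: String) -> Set<String> {
  tagNames(in: formattedBody).subtracting(editableTags)
}
```

If uneditableTags(in:) comes back non-empty, the client could warn before opening the editor instead of silently flattening the content.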

kiliankoe commented 3 years ago

Oh god, tables are possible? If I try and edit a message with a table in Element it just breaks down and lists all cells as a list, nice 😅

helje5 commented 3 years ago

Major feature of Mattermost over Slack ;-)