srid / zulip-archive

Zulip Archive viewer (statically generated HTML)
https://funprog.srid.ca

OGP - "description" should be plain text #16

Closed srid closed 4 years ago

srid commented 4 years ago

The description field is currently set to an HTML value, so it doesn't seem to appear in link previews.

The Zulip API can return the original Markdown format. Setting apply_markdown to false in /api/get-messages returns the raw Markdown instead of the rendered HTML. But we need the HTML as well, so maybe this API should be invoked twice.

Once both formats are stored in the Message object (and consequently in messages.json), we can use the Markdown format in the OGP description.
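A minimal sketch of what storing both formats could look like. Per the Zulip API docs, apply_markdown=true returns rendered HTML and apply_markdown=false returns the raw Markdown. Every name here (Message fields, applyMarkdownParam, ogpDescription) is hypothetical and is not taken from the zulip-archive code. The HTTP calls themselves are left out.

```haskell
-- Hypothetical record; the real Message type in zulip-archive may differ.
data Message = Message
  { messageContentHtml     :: String -- rendered HTML, used for the archive page
  , messageContentMarkdown :: String -- raw Markdown, used for the OGP description
  } deriving (Show, Eq)

-- Query parameter for one of the two fetches:
-- True asks for rendered HTML, False asks for the raw Markdown.
applyMarkdownParam :: Bool -> (String, String)
applyMarkdownParam wantHtml =
  ("apply_markdown", if wantHtml then "true" else "false")

-- Build a plain-text description: collapse all whitespace (including
-- newlines) to single spaces, then cut to at most maxLen characters.
ogpDescription :: Int -> Message -> String
ogpDescription maxLen =
  take maxLen . unwords . words . messageContentMarkdown
```

The Markdown syntax itself (for example `**bold**`) is left in the description as-is; whether that is acceptable in a link preview is a separate decision.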

srid commented 4 years ago

Surprisingly this is not an issue in Notion:

image

/cc @prikhi

prikhi commented 4 years ago

The other option is to strip the HTML tags before rendering the description field. Hakyll uses a naive drop/take function: https://hackage.haskell.org/package/hakyll-4.1.2.1/docs/src/Hakyll-Web-Html.html#stripTags
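For illustration, a naive character-level stripper in that spirit might look like the sketch below. The name stripTagsNaive is made up here, and this is not a copy of Hakyll's source. It uses only the Prelude.

```haskell
-- Naive tag stripping: on '<', skip everything up to and including
-- the next '>'; copy every other character through unchanged.
stripTagsNaive :: String -> String
stripTagsNaive []         = []
stripTagsNaive ('<' : xs) = stripTagsNaive (drop 1 (dropWhile (/= '>') xs))
stripTagsNaive (x : xs)   = x : stripTagsNaive xs
```

Two limitations follow directly from the code: entities such as `&amp;` are passed through undecoded, and a `>` inside an attribute value ends the tag early, so part of the attribute leaks into the output.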

For a more resilient implementation, we could use the tagsoup package:

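{-# LANGUAGE OverloadedStrings #-}

import Data.Maybe (mapMaybe)
import qualified Data.Text as T
import Text.HTML.TagSoup (maybeTagText, parseTags)

-- Keep only the text nodes, trim each one, and join them with single spaces.
stripHtml :: T.Text -> T.Text
stripHtml = T.intercalate " " . filter (not . T.null) . mapMaybe (fmap T.strip . maybeTagText) . parseTags

prikhi commented 4 years ago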

Also, maybe we don't need the T.intercalate/T.strip calls and can just use T.concat.
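A rough side-by-side of the two variants, assuming the tagsoup and text packages are available. The names stripHtmlSpaced and stripHtmlConcat are made up for this comparison.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Maybe (mapMaybe)
import qualified Data.Text as T
import Text.HTML.TagSoup (maybeTagText, parseTags)

-- Variant from the comment above: trim each text node, drop empty ones,
-- and join the rest with single spaces.
stripHtmlSpaced :: T.Text -> T.Text
stripHtmlSpaced =
  T.intercalate " " . filter (not . T.null)
    . mapMaybe (fmap T.strip . maybeTagText) . parseTags

-- Simpler variant: concatenate the text nodes exactly as they appear.
stripHtmlConcat :: T.Text -> T.Text
stripHtmlConcat = T.concat . mapMaybe maybeTagText . parseTags
```

The two differ at tag boundaries. For `<p>Hello</p><p>world</p>` the spaced variant gives `Hello world` while the concat variant gives `Helloworld`. For `<p>Hello <b>world</b>!</p>` it is the other way around: concat gives `Hello world!`, while the spaced variant gives `Hello world !`.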

prikhi commented 4 years ago

I can PR that stripHtml if that seems OK. I would have to do more digging to figure out the Markdown solution.

srid commented 4 years ago

Sure, go ahead! Workarounds are fine.