isomerpages / isomercms-backend

A static website builder and host for the Singapore Government
5 stars 1 forks source link

[LS-81] fix(markdown-utils): change sanitization process + add unescape #718

Closed seaerchin closed 1 year ago

seaerchin commented 1 year ago

Problem

Previously, there was inconsistent behaviour caused by sanitization on the backend. This is because of dompurify's sanitization config, where it will automatically html encode certain special characters if it detects that there is a html tag.

This process affects not just content within the tag, but the string as a whole. For example,

const x = "& <b>&something</b>"

will have both ampersands encoded even though the first one is outside of the b-tag.

Closes LS-81

Solution

In order to make sure that sanitization takes place properly, this PR establishes an invariant, as follows: frontmatter content is never html encoded.

This is chosen over html encoding all content in our frontmatter due to the existence of the 3 special pages (homepage/nav/contact-us), where users are able to input html (:sadge:)

In order to preserve this property, a few rules have to be followed (done alr at present)

  1. devs never call sanitize directly for markdown files but through a given interface (<convert|retrieve>DataFromMarkdown).
  2. the respective conversion/retrieval functions separately sanitize frontmatter/body
  3. we run unescape after sanitizing frontmatter.

This allows us

Testing

Notes This change might be destructive - existing pages w html encoded properties in their frontmatter will have them unescaped. I'm not entirely sure how this impacts things like permalink etc but it's also possible to guarantee only for the title property

seaerchin commented 1 year ago

@alexanderleegs tagging you separately in case there is anything that isn't back-compat

alexanderleegs commented 1 year ago

@alexanderleegs tagging you separately in case there is anything that isn't back-compat

I think this is fine, we previously didn't do any html encoding for the special pages either as far as i know?

seaerchin commented 1 year ago
Screenshot 2023-04-20 at 11 29 45 AM

creating new pages creates this commented out thing at the moment!

does this actually impact editing experience? this is injected due to sanitize encountering an empty body and injecting it into a document. if it's concerning, we could always avoid sanitization if it's an empty string

alexanderleegs commented 1 year ago
Screenshot 2023-04-20 at 11 29 45 AM

creating new pages creates this commented out thing at the moment!

does this actually impact editing experience? this is injected due to sanitize encountering an empty body and injecting it into a document. if it's concerning, we could always avoid sanitization if it's an empty string

Could we put in the check for empty string then? As a user creating a new page, having something you didn't input immediately show up in your editing view probably isn't ideal