Lexical rich text misformatted when converted to HTML

payloadcms / payload

Payload is the open-source, fullstack Next.js framework, giving you instant backend superpowers. Get a full TypeScript backend and admin panel instantly. Use Payload as a headless CMS or for building powerful applications.

https://payloadcms.com

MIT License

23.1k stars 1.44k forks source link

Lexical rich text misformatted when converted to HTML #5146

Open wkentdag opened 6 months ago

wkentdag commented 6 months ago

Link to reproduction

https://github.com/wkentdag/payload/blob/main/test/_community/int.spec.ts#L80-L100

Describe the Bug

I followed the recommended approach for converting lexical rich text into HTML, and discovered that some of the formatting gets stripped in the process. Inline marks, headings, and lists transfer fine; but indentation, alignment, and preformatted whitespace are ignored.

I realize I may need to set up my own Feature to handle preformatted whitespace, but I expect that the align and indent features will work out-of-the-box.

To demonstrate, things look correct in the editor:

But when I convert the Lexical markup to HTML, it looks like this:

In my test repro I am using the "ad-hoc" method of converting Lexical to HTML, but you get the same result with the HTMLConverterFeature. Happy to add another test case with that example, I just skipped it to keep things simple.

To Reproduce

Clone my fork
run jest test/_community/int.spec.ts -t '_Community Tests correctly formats rich text'
Note the failing test, which specifically chokes on the center-aligned and indented paragraphs.

Payload Version

2.11.1

Adapters and Plugins

db-mongodb

GeekPro101 commented 5 months ago

+1, setting align does nothing in the HTML for me when using HTMLConverterFeature

SimYunSup commented 5 months ago

There is no Converter for align and indent. But "whitespace stripped" is not bug. Just add CSS white-space.

wkentdag commented 4 months ago

@SimYunSup fair, but I expect the generated HTML to match the markup rendered (correctly) in the admin Lexical editor out of the box. so IMO this is still a bug.

One possible solution would be to add <pre> to the default feature set. Seems like a simple enough feature, I'd be happy to make a PR for that. what do you think @AlessioGr ?

AlessioGr commented 4 months ago

I think adding align and indent to the default converters would be a good idea. PRs for that are welcome!

I do think that we shouldn't output hard-coded style attributes in the generated HTML, though. Feels too opinionated. But we can add classNames that make it easy for the user to target those with their own styles

wkentdag commented 4 months ago

@AlessioGr I wasn't suggesting modifying the generated HTML, but rather adding a preformatted whitespace feature that you can select from the dropdown. So where I'm currently expecting this snippet of rendered HTML:

<p>weirdly formatted</p>

I'd swap the paragraph for pre and get this instead:

<pre>weirdly formatted</pre>.

That's an enhancement that I'm planning to tackle. However, my primary concern is that the indent and align features are stripped during HTML conversion. Is that not a bug?

SimYunSup commented 4 months ago

The Lexical Adapter is creating a Feature by adding Nodes and Plugins. It's important to note that paragraph (as <p> tag) has a Node and a corresponding converter, while Align and Indent only add attributes to the Node(code), doesn't have a Node.

If you want to convert a <p> tag to a <pre> tag, you can change the paragraph converter from HTMLConverterFeature in your service code.

However, Align and Indent are different. They are not Node-agonistic features, so you need to add a new converter (type HTMLConverter)

wkentdag commented 4 months ago

Thanks for the explanation @SimYunSup, I'm definitely still trying to wrap my head around all the Lexical concepts.

I'm able to reproduce align and indent in HTML by tweaking the default paragraph converter like this:

export const FormattedParagraphHTMLConverter: HTMLConverter<any> = {
  // @ts-expect-error figure out what submissionData is/does
  async converter({ converters, node, parent, submissionData }) {
    const childrenText = await convertLexicalNodesToHTML({
      converters,
      lexicalNodes: node.children,
      parent: {
        ...node,
        parent,
      },
      // @ts-expect-error saa
      submissionData,
    })

    let pTag = '<p>'
    let style = ''
    if (!!node.format) {
      style += `text-align: ${node.format};`
    }

    if (node.indent > 0) {
      style += `text-indent:${node.indent}em;`
    }

    if (!!style) {
      pTag = `<p style="${style}">`
    }

    return `${pTag}${childrenText}</p>`
  },
  nodeTypes: ['paragraph'],
}

It would be nice to have something like this in the default config without having to setup a custom feature. I can see how hardcoding the style attributes is a little rigid...what kind of interface are you imagining for providing classNames @AlessioGr ? Or is it just as simple as swapping class="align-center" for style="text-align:center;" ?

ozzestrophic commented 1 month ago

I agree that aligning and indentations should be resolved by default in the serializer. Will this solution be merged anytime?

mobeigi commented 5 days ago

I also would like for this PR (https://github.com/payloadcms/payload/pull/5814) to be merged / reviewed. Feels like extremely basic (and desirable) functionality for those who want to render lexical to HTML. Otherwise you have no other way of positioning text based on the editors AlignFeature. Looks like it pops up on the Discord community-help quite frequently too.

Is there a way to implement the PR locally as a workaround? The convertLexicalNodesToHTML from payload/packages/richtext-lexical/src/field/features/converters/html/converter/index.ts isn't exported which makes this annoying.

SimYunSup commented 5 days ago

@mobeigi convertLexicalNodesToHTML is exported in @payloadcms/richtext-lexical.

https://github.com/payloadcms/payload/blob/25e9bc62dbcbabcb3619cf83e3dc0110e0a4cabf/packages/richtext-lexical/src/index.ts#L276-L279

But you should use it with the following caution.

It must be head of converter because find converter with Array.find.