HTML to Lexical Content <img> tag conversion issue

matteo-naif commented 2 months ago

Link to reproduction

-

Payload Version

3.0.0-beta.90

Node Version

v20

Next.js Version

15.0.0-canary.121

Describe the Bug

When I convert HTML to Lexical, the images are not converted. Is there a way to handle this?

Reproduction Steps

Code I'm using to convert HTML -> Lexical

import { createHeadlessEditor } from '@lexical/headless';
import { $generateNodesFromDOM } from '@lexical/html';
import { getEnabledNodes, sanitizeServerEditorConfig } from '@payloadcms/richtext-lexical';
import { JSDOM } from 'jsdom';
import { $getRoot, $getSelection } from 'lexical';
import { defaultEditorConfig } from '@payloadcms/richtext-lexical'
import configPromise from '@payload-config'

const convertHTMLToLexicalNodes = async (htmlString: string) => {

  const yourEditorConfig = defaultEditorConfig

  const headlessEditor = createHeadlessEditor({
    nodes: getEnabledNodes({
      editorConfig: await sanitizeServerEditorConfig(yourEditorConfig, await configPromise),
    }),
  })

  headlessEditor.update(() => {

    // In a headless environment you can use a package such as JSDom to parse the HTML string.
    const dom = new JSDOM(htmlString)

    // Once you have the DOM instance it's easy to generate LexicalNodes.
    const nodes = $generateNodesFromDOM(headlessEditor, dom.window.document)

    // Select the root
    $getRoot().select()

    // Insert them at a selection.
    const selection = $getSelection()
    selection?.insertNodes(nodes)

  }, { discrete: true })

  // Do this if you then want to get the editor JSON
  const editorJSON = headlessEditor.getEditorState().toJSON()

  // Clear Editor state
  headlessEditor.update(() => {
    const root = $getRoot();
    root.clear();
  }, { discrete: true });

  return editorJSON;
};

Original HTML

<p>Lorem Ipsum</p><img src="https://upload.wikimedia.org/wikipedia/commons/7/76/RMS_Republic.jpg">

Converted Lexical (JSON) without image

{
   "root":{
      "children":[
         {
            "children":[
               {
                  "detail":0,
                  "format":0,
                  "mode":"normal",
                  "style":"",
                  "text":"Lorem Ipsum",
                  "type":"text",
                  "version":1
               }
            ],
            "direction":null,
            "format":"",
            "indent":0,
            "type":"paragraph",
            "version":1,
            "textFormat":0,
            "textStyle":""
         }
      ],
      "direction":null,
      "format":"",
      "indent":0,
      "type":"root",
      "version":1
   }
}

Adapters and Plugins

"@payloadcms/db-mongodb": "beta",
"@payloadcms/email-nodemailer": "beta",
"@payloadcms/next": "beta",
"@payloadcms/plugin-cloud-storage": "beta",
"@payloadcms/plugin-form-builder": "beta",
"@payloadcms/plugin-nested-docs": "beta",
"@payloadcms/plugin-seo": "beta",
"@payloadcms/richtext-lexical": "beta",
"@payloadcms/storage-s3": "beta",
"@payloadcms/ui": "beta",

AlessioGr commented 2 months ago

This is currently expected, as the converter cannot auto-upload those images for you. Probably won't add this functionality - don't want this function to perform any modifications to your payload db.

Though there are other strategies to convert html images => lexical (work with JSON, or deploy your own script first that performs those auto-uploads). I've marked this issue as documentation and will add something to our docs

matteo-naif commented 2 months ago

> This is currently expected, as the converter cannot auto-upload those images for you. Probably won't add this functionality - don't want this function to perform any modifications to your payload db.
>
> Though there are other strategies to convert html images => lexical (work with JSON, or deploy your own script first that performs those auto-uploads). I've marked this issue as documentation and will add something to our docs

Thanks Alessio for the reply. I can intercept the image URLs and upload them with a separate script in parallel, but I don't understand how to insert the image node with the correct ID at the correct location in the Lexical JSON. Is there a way to hook into the conversion script?
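One way the placement problem might be approached is with placeholders: swap each <img> for a marker paragraph before conversion, run the HTML -> Lexical conversion as above, then replace the marker paragraphs in the resulting JSON with upload nodes. The sketch below is only an illustration under stated assumptions. The helper names (extractImages, injectUploads), the [[upload:N]] marker format and the 'media' collection slug are hypothetical, and the upload node shape (type, relationTo, value, version) is an assumption that should be checked against a node saved by your own editor. The upload step itself (producing one media ID per image source) is not shown, and the regex is a simplification that a real DOM walk would replace.

```typescript
// Minimal JSON node type for this sketch (not the official Lexical type).
type LexicalNode = {
  type: string
  children?: LexicalNode[]
  text?: string
  [key: string]: unknown
}

// Hypothetical marker format: a paragraph whose only text is [[upload:N]].
const PLACEHOLDER = /^\[\[upload:(\d+)\]\]$/

// Step 1: replace every <img src="..."> with a marker paragraph and
// collect the image sources in document order.
function extractImages(html: string): { html: string; sources: string[] } {
  const sources: string[] = []
  const replaced = html.replace(
    /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["'][^>]*>/gi,
    (_match: string, src: string) => {
      sources.push(src)
      return `<p>[[upload:${sources.length - 1}]]</p>`
    },
  )
  return { html: replaced, sources }
}

// Step 2 (after conversion): walk the Lexical JSON and swap each marker
// paragraph for an upload node. ids[i] is the media ID for sources[i].
function injectUploads(
  node: LexicalNode,
  ids: string[],
  relationTo = 'media', // assumed collection slug
): LexicalNode {
  if (!node.children) return node
  const children = node.children.map((child): LexicalNode => {
    const only =
      child.children && child.children.length === 1 ? child.children[0] : undefined
    if (child.type === 'paragraph' && only && only.type === 'text') {
      const match = PLACEHOLDER.exec(only.text ?? '')
      if (match) {
        // Assumed upload node shape; verify against your editor's output.
        return { type: 'upload', version: 1, format: '', relationTo, value: ids[Number(match[1])] }
      }
    }
    return injectUploads(child, ids, relationTo)
  })
  return { ...node, children }
}
```

With these helpers, the flow would be: call extractImages on the original HTML, upload each entry of sources with your own script, pass the rewritten HTML to convertHTMLToLexicalNodes, then call injectUploads on the returned root with the IDs in the same order as sources.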

madsbertelsen commented 2 months ago

Hi @matteo-naif, I'm having trouble getting the headless editor to parse A-nodes. On line 7 you import from @payload-config. Could you share that file with me? 🙏

matteo-naif commented 2 months ago

Hi @madsbertelsen, the import is an alias that refers to my payload.config.ts

import { layoutBlocks } from '@/app/(payload)/_config/fields/layout/layoutBlocks'
import { mongooseAdapter } from '@payloadcms/db-mongodb'
import { BlocksFeature, lexicalEditor } from '@payloadcms/richtext-lexical'
import { locales } from 'locale.config'
import path from 'path'
import { buildConfig } from 'payload'
import { en } from 'payload/i18n/en'
import { it } from 'payload/i18n/it'
import sharp from 'sharp'
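import { fileURLToPath } from 'url'

// Resolve this file's directory (ES modules have no __dirname)
const filename = fileURLToPath(import.meta.url)
const dirname = path.dirname(filename)
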
export default buildConfig({
  editor: lexicalEditor({
    features: ({ defaultFeatures }) => [
      ...defaultFeatures,
      BlocksFeature({
        blocks: layoutBlocks,
      }),
    ]
  }),

  collections: [
    // Collections
  ],

  globals: [
    // Globals  
  ],

  secret: process.env.PAYLOAD_SECRET || '',

  typescript: {
    outputFile: path.resolve(dirname, 'payload-types.ts'),
  },

  db: mongooseAdapter({
    url: process.env.MONGODB_URI || '',
  }),

  i18n: {
    supportedLanguages: { it, en },
  },

  localization: {
    locales: locales.map(l => ({ label: l.label, code: l.code })),
    defaultLocale: locales[0].code,
    fallback: false,
  },

  debug: process.env.NODE_ENV === 'development',

  sharp,

  plugins: [
    // Plugins
  ]

})

madsbertelsen commented 2 months ago

Thanks @matteo-naif! Passing the payload config as argument made it work for me 🥳

matteo-naif commented 1 month ago

@AlessioGr any updates?
