privateOmega / html-to-docx

HTML to DOCX converter
MIT License

Fails with Rendered MarkDown (undefined reading `encode`) #130

Open AlbinoGeek opened 2 years ago

AlbinoGeek commented 2 years ago

The HTML String I am trying to convert is:

<!DOCTYPE html>
<html>
<body>
  <div class="MuiBox-root css-0">
    <h1 class="MuiTypography-root MuiTypography-h1 MuiTypography-gutterBottom css-a4d0r7-MuiTypography-root">Page</h1>
  </div>
  <div class="MuiBox-root css-0">
    <p>This is technically a page builder.</p>
  </div>
  <div class="MuiBox-root css-0">
    <p class="MuiTypography-root MuiTypography-body1 MuiTypography-gutterBottom css-d3wcwz-MuiTypography-root">If you&#x27;re super lazy, you can build a page out of one markdown component.</p>
  </div>
</body>
</html>

Results

decode.js?afe7:16 Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'encode')
    at eval (decode.js?afe7:16:1)
    at String.replace (<anonymous>)
    at decode (decode.js?afe7:11:1)
    at Object.convert (htmlparser-to-vdom.js?c5c7:10:1)
    at eval (htmlparser-to-vdom.js?c5c7:25:1)
    at Array.map (<anonymous>)
    at Object.convertTag (htmlparser-to-vdom.js?c5c7:24:1)
    at Object.convert (htmlparser-to-vdom.js?c5c7:8:1)
    at eval (htmlparser-to-vdom.js?c5c7:25:1)
    at Array.map (<anonymous>)
    at Object.convertTag (htmlparser-to-vdom.js?c5c7:24:1)
    at Object.convert (htmlparser-to-vdom.js?c5c7:8:1)
    at eval (htmlparser-to-vdom.js?c5c7:25:1)
    at Array.map (<anonymous>)
    at Object.convertTag (htmlparser-to-vdom.js?c5c7:24:1)
    at Object.convert (htmlparser-to-vdom.js?c5c7:8:1)
    at eval (htmlparser-to-vdom.js?c5c7:25:1)
    at Array.map (<anonymous>)
    at Object.convertTag (htmlparser-to-vdom.js?c5c7:24:1)
    at Object.convert (htmlparser-to-vdom.js?c5c7:8:1)
    at eval (html-to-vdom.js?e058:19:1)
    at Array.map (<anonymous>)
    at convertHTML (html-to-vdom.js?e058:18:1)
    at addFilesToContainer (html-to-docx.esm.js?cc75:1:244835)
    at generateContainer (html-to-docx.esm.js?cc75:1:248089)
    at _callee$ (SaveAsDocX.tsx?e200:25:38)

Yet if I pass simpler HTML it works (proving that at least the package is working):

<!DOCTYPE html>
<html>
<body>
  <h1>Hello, World!</h1>
</body>
</html>
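
One visible difference between the two inputs is that the failing markup contains a character reference (`&#x27;`) while the working markup contains none. Whether that is the cause is not established at this point in the thread, but it is easy to check for. The sketch below lists the references found in a string so two inputs can be compared; `findCharacterReferences` is a hypothetical helper written for this illustration, and its regular expression is a simplification rather than a full HTML tokenizer.

```typescript
// Hypothetical diagnostic helper (not part of html-to-docx).
// Returns every substring that looks like an HTML character reference:
// hexadecimal (&#x27;), decimal (&#39;), or named (&amp;).
function findCharacterReferences(html: string): string[] {
  const pattern = /&(?:#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g
  return html.match(pattern) ?? []
}

const failing = '<p>If you&#x27;re super lazy, you can build a page.</p>'
const working = '<h1>Hello, World!</h1>'

console.log(findCharacterReferences(failing)) // one entry: &#x27;
console.log(findCharacterReferences(working)) // empty array
```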

marcusparsons commented 2 years ago

Can you show the portion of your NodeJS code that is failing? I tried your markup, and it exports to a DOCX file perfectly fine.

AlbinoGeek commented 2 years ago

> Can you show the portion of your NodeJS code that is failing? I tried your markup, and it exports to a DOCX file perfectly fine.

That's the thing, I'm not doing this in NodeJS, I'm doing it in the browser.

import { Button } from '@mui/material'
import { saveAs } from 'file-saver'
import HTMLtoDOCX from 'html-to-docx'
import { useCallback } from 'react'
import ReactDOMServer from 'react-dom/server'
import { BuilderElement, State } from './Element'
import Renderer from './Renderer'

type Props = {
  getElement: (id: string) => [number, BuilderElement]
  state: State
}

export default function SaveAsDocX(props: Props): JSX.Element | null {
  const saveAsDocx = useCallback(async () => {
    const htmlString = ReactDOMServer.renderToStaticMarkup(
      <Renderer
        getElement={props.getElement}
        state={props.state}
        static={true}
      />
    )

    const fileData = await HTMLtoDOCX(
      htmlString, // also tried wrapping in an HTML Document, w/ doctype, etc.
      null,
      {
        orientation: 'portrait',
        margins: {
          bottom: 1440,
          top: 1440,
          left: 1800,
          right: 1800,
          header: 720,
          footer: 720,
          gutter: 0,
        },
        title: 'Example - Created by ...',
        subject: 'Created by ...',
        creator: '...',
        keywords: ['...'],
        description: '',
        lastModifiedBy: '...',
        revision: 1,
        font: 'Calibri',
        footerHTMLString: 'Created by ...',
      })

    saveAs(fileData, 'Example.docx')
  }, [props.getElement, props.state])

  return <Button fullWidth onClick={saveAsDocx} variant="contained">
    DocX
  </Button>
}

This works for simple markup, but once the markup contains some MUI components, it implodes with the error as described above.

Here's an excerpt of Renderer for your consideration. It just builds a DOM tree out of different elements (think: Page Builder). Rendering works, and HTML exports of the DOM tree also work; only DocX exports fail.

import { Box } from '@mui/system'
import dynamic from 'next/dynamic'
import { useCallback } from 'react'
import {
  BuilderElement,
  LayoutElement,
  State
} from './Element'
import Button from './module/Button'
import Hero from './module/Hero'
import Text from './module/Text'

const Markdown = dynamic(
  () => import('./module/Markdown'),
  { ssr: false }
)

type Props = {
  getElement: (id: string) => [number, BuilderElement]
  state: State
  static?: true
}

export default function Renderer(props: Props): JSX.Element {
  const {
    getElement,
    state,
    // `static` is a reserved word in strict-mode code (ES modules are always strict),
    // so it cannot be a local binding name; alias it while destructuring.
    static: isStatic
  } = props

  const renderInner = useCallback(
    (ele: BuilderElement, layout: LayoutElement): JSX.Element | null => {
      switch (ele.name) {
      case 'Button':
        return <Button {...ele.props} />

      case 'Hero':
        return <Hero {...ele.props} />

      case 'Text':
        if (ele.props.renderAs === 'md') {
          return <Markdown label={ele.props.label} />
        } else return <Text label={ele.props.label} variant={ele.props.variant} />

      default:
        return null
      }
    }
    , [])

  if (isStatic) return <>
    {state.page.layout.elements?.map(
      (layout: LayoutElement) =>
        <Box key={layout.id}>
          {renderInner(getElement(layout.id)[1], layout)}
        </Box>
    )}
  </>

Example of a module/component, with Button shown below:

import { Button as MuiButton } from '@mui/material'

export type ButtonProps = {
  label: string
  variant: 'contained' | 'outlined' | 'text' | undefined
}

export default function Button(props: ButtonProps): JSX.Element {
  const {
    label = 'Button',
    variant = 'contained',
    ...rest
  } = props

  return <MuiButton variant={variant} {...rest}>
    {label}
  </MuiButton>
}

And hell, just for completeness, here are my types:

export type LayoutElement = {
  id: string
  elements?: LayoutElement[]
}

export type BuilderElement = {
  id: string

  name: string

  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  props: any
}

tasola commented 1 year ago

@AlbinoGeek Hi! I'm facing the same issue when trying to download html which has been generated by ReactMarkdown. Did you find any way to make this work?

AlbinoGeek commented 1 year ago

> @AlbinoGeek Hi! I'm facing the same issue when trying to download html which has been generated by ReactMarkdown. Did you find any way to make this work?

Sorry to say, the solution for us was to drop support for exporting as office documents in our platform.

tasola commented 1 year ago

@AlbinoGeek Haha noo, the one thing I didn't wanna hear. Thanks for the quick response!

tasola commented 1 year ago

In case someone stumbles upon this issue later on: the problem for me was the HTML entities (character references such as &#x27;) in the markup. I simply decoded all entities in my HTML string (this can be done with html-entities) before passing it into HTMLtoDOCX, like so:

import { decode } from 'html-entities'
import HTMLtoDOCX from 'html-to-docx'

...

const blob = await HTMLtoDOCX(decode(sourceHTML), header, {}, footer)
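
To illustrate what the decoding step does to the failing input, here is a minimal, dependency-free sketch. `decodeBasicEntities` is a hypothetical helper written for this illustration and is not the `html-entities` implementation; it assumes only numeric character references and five common named ones need handling. In real code, the library's `decode` is the better choice.

```typescript
// Hypothetical helper for illustration only; html-entities' decode() covers far more cases.
const NAMED_ENTITIES: Record<string, string> = {
  amp: '&',
  lt: '<',
  gt: '>',
  quot: '"',
  apos: "'",
}

function decodeBasicEntities(html: string): string {
  return html.replace(
    /&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([a-zA-Z]+));/g,
    (match: string, hex?: string, dec?: string, name?: string): string => {
      if (hex !== undefined || dec !== undefined) {
        const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec as string, 10)
        // Leave out-of-range references untouched instead of throwing.
        return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match
      }
      // Own-property check so names like "constructor" are not treated as entities.
      if (name !== undefined && Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)) {
        return NAMED_ENTITIES[name]
      }
      // Unknown named references are left as-is rather than guessed at.
      return match
    }
  )
}

const input = '<p>If you&#x27;re super lazy, you can build a page.</p>'
console.log(decodeBasicEntities(input)) // <p>If you're super lazy, you can build a page.</p>
```

One caveat with any decode-before-parse approach: turning `&lt;` or `&amp;` back into `<` or `&` before the HTML is parsed can change how the markup is interpreted, so it is worth checking the output when the source text contains escaped markup.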

nicolasiscoding commented 1 year ago

Can you make a PR for this and bake it in as an option?

Cool workaround

tasola commented 1 year ago

@nicolasiscoding Thanks! I created https://github.com/privateOmega/html-to-docx/pull/181. Please check it out once you get time!