plone / blocks-conversion-tool

A tool to convert HTML (as used in Plone Classic) to Blocks (as used on Volto)

Group top-level text and inline elements into paragraphs #18

Closed: davisagli closed this 1 year ago

davisagli commented 1 year ago

This fixes #16 and #17. Previously, top-level text nodes that were not enclosed in a block element were dropped entirely.
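A minimal sketch of the idea in the PR title, assuming Slate nodes are represented as Python dicts like the ones shown later in this thread: each run of consecutive top-level text leaves and inline elements is wrapped in a `p` element instead of being discarded. The function names and the `INLINE_TYPES` set are hypothetical; this is not the tool's own implementation.

```python
from typing import Any

# Hypothetical set of Slate node types treated as inline in this sketch.
INLINE_TYPES = {"strong", "em", "u", "s", "sub", "sup", "code", "link", "span"}


def is_inline(node: dict[str, Any]) -> bool:
    """A text leaf (has "text") or an element whose type is inline."""
    return "text" in node or node.get("type") in INLINE_TYPES


def group_into_paragraphs(nodes: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Wrap each run of consecutive inline nodes in a {"type": "p"} element.

    Block-level nodes pass through unchanged, so no text is dropped.
    """
    result: list[dict[str, Any]] = []
    run: list[dict[str, Any]] = []

    def flush() -> None:
        # Skip runs made only of whitespace text (e.g. newlines between blocks).
        if run and not all("text" in n and not n["text"].strip() for n in run):
            result.append({"type": "p", "children": list(run)})
        run.clear()

    for node in nodes:
        if is_inline(node):
            run.append(node)
        else:
            flush()
            result.append(node)
    flush()
    return result
```

For example, `[{"text": "Loose text "}, {"type": "strong", ...}, {"type": "h2", ...}, {"text": "\n"}]` becomes a `p` wrapping the first two nodes, followed by the unchanged `h2`. Dropping whitespace-only runs is a choice made for this sketch so that formatting newlines between blocks do not turn into empty paragraphs.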

This PR now also includes fixes for two other issues found while testing the migration of plone.org:

  1. Fixed invalid nested-list output for a table cell containing a div
  2. Fixed whitespace between tags being output as null instead of a valid text node

pbauer commented 1 year ago

That's an improvement for a lot of pages, but some are still broken. Converting the HTML of these pages still results in broken blocks:

  1. https://plone.org/news-and-events/events/regional/european-symposium-2010/agenda
  2. https://plone.org/news-and-events/events/plone-conferences/new-orleans-2003/announcement

davisagli commented 1 year ago

@pbauer Then we need to investigate and determine whether it is the same issue or a new one. It would help if you could open a new issue including the HTML for those pages and the error that currently occurs. I don't think I have access to the raw HTML.

pbauer commented 1 year ago

I'll try to do that tomorrow

pbauer commented 1 year ago

This html:

<table>
  <tr>
    <th><span>Header text</span></th>
    <td><div><strong>Cell text</strong></div></td>
  </tr>
</table>

Creates this:

{
 "e7206563-de3e-4043-a370-138fdb32ad2b": {
  "@type": "slateTable",
  "table": {
   "basic": false,
   "celled": true,
   "compact": false,
   "fixed": true,
   "inverted": false,
   "rows": [
    {
     "key": "6kilr",
     "cells": [
      {
       "key": "83s65",
       "type": "header",
       "value": [
        {
         "type": "span",
         "children": [
          {
           "text": "Header text"
          }
         ]
        }
       ]
      },
      {
       "key": "ulod",
       "type": "data",
       "value": [
        [
         {
          "type": "strong",
          "children": [
           {
            "text": "Cell text"
           }
          ]
         }
        ]
       ]
      }
     ]
    }
   ],
   "striped": false
  }
 }
}

This raises: [Slate] value is invalid! Expected a list of elements but got: [[{"children":[{"text":"Cell text"}],"type":"strong"}]]
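The error is about shape: the data cell's `value` is a list nested inside another list, while the header cell's `value` is a flat list of elements. A sketch, assuming the cell value is available as plain Python data, of a check roughly mirroring the one behind Slate's message and a flattening step that repairs this output. Both helper names are hypothetical.

```python
from typing import Any


def is_list_of_elements(value: Any) -> bool:
    """Rough analogue of Slate's check: every top-level entry must be a
    dict (element) carrying a "children" list."""
    return isinstance(value, list) and all(
        isinstance(n, dict) and isinstance(n.get("children"), list) for n in value
    )


def flatten_cell_value(value: list[Any]) -> list[dict[str, Any]]:
    """Hypothetical fix-up: flatten nested lists so a table cell value is
    a flat list of nodes, e.g. [[{...}]] -> [{...}]."""
    flat: list[dict[str, Any]] = []
    for item in value:
        if isinstance(item, list):
            flat.extend(flatten_cell_value(item))  # recurse into nested lists
        else:
            flat.append(item)
    return flat
```

Applied to the data cell above, `flatten_cell_value` turns `[[{"type": "strong", ...}]]` into `[{"type": "strong", ...}]`, which passes `is_list_of_elements`.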

pbauer commented 1 year ago

And this HTML (broken and old, but hey):

<center>
<a href="http://plone.org/events/conferences/1">
    "Conference Logo":img:http://plone.org/events/conferences/1/plonecon1small.jpg
    </a>
</center>

    **Plone: In Development and Production**

    Plone - the platform that extends the award-winning Zope application
server is having it's first conference.

This produces two blocks. The first is a broken link block whose value contains None entries, and it triggers an exception:

{'@type': 'slate', 'value': [None, {'type': 'link', 'data': {'url': 'http://plone.org/events/conferences/1', 'title': None, 'target': None}, 'children': [{'text': '     "Conference Logo":img:http://plone.org/events/conferences/1/plonecon1small.jpg     '}]}, None], 'plaintext': '\n\n    "Conference Logo":img:http://plone.org/events/conferences/1/plonecon1small.jpg\n    \n'}
{'@type': 'slate', 'value': [{'type': 'p', 'children': [{'text': "      **Plone: In Development and Production**      Plone - the platform that extends the award-winning Zope application server is having it's first conference."}]}], 'plaintext': "\n\n    **Plone: In Development and Production**\n\n    Plone - the platform that extends the award-winning Zope application\nserver is having it's first conference."}
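The first block's `value` has `None` at positions 0 and 2, around the link. Any code that walks the value and calls `.get(...)` on each node will fail on those entries, which is consistent with the `AttributeError` in the traceback below. A small diagnostic sketch (a hypothetical helper, not part of the tool or of plone.restapi) that reports the path of every `None` in a converted Slate value, so such output can be caught before it is saved:

```python
from typing import Any


def find_null_nodes(
    nodes: list[Any], path: tuple[int, ...] = ()
) -> list[tuple[int, ...]]:
    """Return the Slate-style paths (index tuples) of every None entry."""
    found: list[tuple[int, ...]] = []
    for i, node in enumerate(nodes):
        here = path + (i,)
        if node is None:
            found.append(here)
        elif isinstance(node, dict) and isinstance(node.get("children"), list):
            # Only "children" is walked; None inside "data" (title, target) is fine.
            found.extend(find_null_nodes(node["children"], here))
    return found
```

For the first block above, `find_null_nodes(block["value"])` reports `[(0,), (2,)]`.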

The exception:

Traceback (innermost last):
  Module ZPublisher.WSGIPublisher, line 167, in transaction_pubevents
  Module ZPublisher.WSGIPublisher, line 376, in publish_module
  Module ZPublisher.WSGIPublisher, line 271, in publish
  Module ZPublisher.mapply, line 85, in mapply
  Module Products.PDBDebugMode.wsgi_runcall, line 60, in pdb_runcall
  Module plone.z3cform.layout, line 63, in __call__
  Module plone.z3cform.layout, line 47, in update
  Module plone.dexterity.browser.edit, line 53, in update
  Module plone.z3cform.fieldsets.extensible, line 65, in update
  Module plone.z3cform.patch, line 30, in GroupForm_update
  Module z3c.form.group, line 145, in update
  Module plone.app.z3cform.csrf, line 21, in execute
  Module z3c.form.action, line 98, in execute
  Module z3c.form.button, line 301, in __call__
  Module z3c.form.button, line 159, in __call__
  Module plone.dexterity.browser.edit, line 28, in handleApply
  Module z3c.form.group, line 124, in applyChanges
  Module zope.event, line 32, in notify
  Module zope.component.event, line 27, in dispatch
  Module zope.component._api, line 134, in subscribers
  Module zope.interface.registry, line 448, in subscribers
  Module zope.interface.adapter, line 899, in subscribers
  Module zope.component.event, line 36, in objectEventNotify
  Module zope.component._api, line 134, in subscribers
  Module zope.interface.registry, line 448, in subscribers
  Module zope.interface.adapter, line 899, in subscribers
  Module plone.app.linkintegrity.handlers, line 116, in modifiedContent
  Module plone.restapi.blocks_linkintegrity, line 39, in retrieveLinks
  Module plone.restapi.blocks_linkintegrity, line 87, in __call__
AttributeError: 'NoneType' object has no attribute 'get'
davisagli commented 1 year ago

@pbauer Thanks, that looks like 2 new cases that need adjustments:

  1. when a table cell contains a div, the result has invalid nested lists
  2. whitespace between tags is ending up as null instead of a valid text node

I'm busy this morning; hopefully I can work on it later.

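A sketch of the second fix, assuming a converter that walks the DOM and emits one Slate child per node: text between tags becomes a text leaf, with whitespace collapsed to a single space, rather than null. It uses Python's `xml.etree.ElementTree`, where text following a closing tag is stored in that element's `.tail`. That parser only accepts well-formed XML, so this is an illustration under that assumption, not the tool's implementation; the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Any

# HTML whitespace characters; runs of them collapse to one space.
_WS = re.compile(r"[ \t\n\r\f]+")


def text_leaf(raw: str) -> dict[str, Any]:
    """Slate text leaf; whitespace-only input yields {"text": " "}, never None."""
    return {"text": _WS.sub(" ", raw)}


def to_slate(elem: ET.Element) -> dict[str, Any]:
    """Convert a small, well-formed XHTML fragment to a Slate element."""
    children: list[dict[str, Any]] = []
    if elem.text:
        children.append(text_leaf(elem.text))
    for child in elem:
        children.append(to_slate(child))
        if child.tail:  # e.g. the space between </strong> and <em>
            children.append(text_leaf(child.tail))
    if not children:
        children.append({"text": ""})  # Slate elements need at least one child
    return {"type": elem.tag, "children": children}
```

With this, `to_slate(ET.fromstring("<p><strong>a</strong> <em>b</em></p>"))` keeps the space between the two inline elements as `{"text": " "}` in the paragraph's children.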
davisagli commented 1 year ago

@pbauer I've added additional fixes for those two issues.