plone / blocks-conversion-tool

A tool to convert HTML (as used in Plone Classic) to Blocks (as used on Volto)

blocks-conversion-tool should be deprecated #27

Open tiberiuichim opened 1 year ago

tiberiuichim commented 1 year ago

Some context:

I never actually needed to do any block conversion until this week.

I've always wished we would do this conversion directly in Plone/Python. Long ago I started some code for this, under the umbrella of https://github.com/eea/eea.volto.slate. I've since moved that code to a plone.volto PR, which needs only final touch-ups. If only I could motivate myself to do it...

I'm currently migrating a Plone 4 website to Volto. We're keeping Plone 4 and adding Volto on top of it. So I've started migrating blocks. Testing the blocks conversion on a first page, I have the following HTML to convert:

<div>
<h2>A new EU adaptation strategy</h2>
</div>
<div>
<p>On 24 February 2021, the European Commission adopted the Communication ‘<a href="https://climate-adapt.eea.europa.eu/en/metadata/publications/eu-strategy-on-adaptation-to-climate-change/" rel="noopener" target="_blank" data-linktype="external" data-val="https://climate-adapt.eea.europa.eu/en/metadata/publications/eu-strategy-on-adaptation-to-climate-change/">Forging a climate-resilient Europe – the new EU Strategy on Adaptation to Climate Change</a>’. The Strategy outlines a long-term vision for the EU to become a climate-resilient society, fully adapted to the unavoidable impacts of climate change by 2050. This strategy aims to reinforce the adaptive capacity of the EU and the world and minimise vulnerability to the impacts of climate change, in line with the <a href="https://unfccc.int/process-and-meetings/the-paris-agreement/the-paris-agreement">Paris Agreement</a> and the <a href="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52020PC0080" rel="noopener" target="_blank" data-linktype="external" data-val="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52020PC0080">proposal for a European Climate Law</a>.</p>
<p>The Strategy aims to build a climate resilient society by improving knowledge of climate impacts and adaptation solutions; by stepping up adaptation planning and climate risk assessments; by accelerating adaptation action; and by helping to strengthen climate resilience globally. It pursues three objectives and proposes a range of actions in order to meet them:</p>
<ul>
<li><strong>Smarter adaptation</strong>: Improving knowledge and manage uncertainty; including:
<ul>
<li>Pushing the frontiers of adaptation knowledge;</li>
<li>More and better climate loss data; and</li>
<li>Enhancing and expanding Climate-ADAPT as the European platform for adaptation knowledge.</li>
</ul>
</li>
<li>More <strong>systemic adaptation</strong>: Supporting policy development at all levels and all relevant policy fields; including three cross-cutting priorities to integrate adaptation into:
<ul>
<li>Macro-fiscal policy;</li>
<li>Nature-based solutions; and</li>
<li>Local adaptation actions.</li>
</ul>
</li>
<li><strong>Faster adaptation</strong>: Speed up adaptation implementation across the board.</li>
</ul>
<p>Climate-ADAPT, the European platform for adaptation knowledge, will be enhanced and expanded. As a first concrete deliverable of the new Strategy, the <a href="https://climate-adapt.eea.europa.eu/observatory" rel="noopener" target="_blank" data-linktype="external" data-val="https://climate-adapt.eea.europa.eu/observatory">European Climate and Health Observatory</a> will be launched on Climate-ADAPT, to better track, analyse and prevent the impacts of climate change on human health.</p>
<p>The Strategy integrates <strong>international action</strong> for climate resilience into its framework.</p>
<p><br />The new EU Adaptation Strategy links directly to recent global agreements, such as the <a href="https://unfccc.int/process-and-meetings/the-paris-agreement/the-paris-agreement" rel="noopener" target="_blank" data-linktype="external" data-val="https://unfccc.int/process-and-meetings/the-paris-agreement/the-paris-agreement">Paris Agreement</a>, the <a href="https://www.undrr.org/publication/sendai-framework-disaster-risk-reduction-2015-2030" rel="noopener" target="_blank" data-linktype="external" data-val="https://www.undrr.org/publication/sendai-framework-disaster-risk-reduction-2015-2030">Sendai Framework for Disaster Risk Reduction</a> and the <a href="https://www.un.org/sustainabledevelopment/development-agenda/" rel="noopener" target="_blank" data-linktype="external" data-val="https://www.un.org/sustainabledevelopment/development-agenda/">Sustainable Development Agenda</a> as well as the EU implementation of these goals. It also connects directly to major EU initiatives like the <a href="https://ec.europa.eu/info/publications/climate-resilient-europe_en" rel="noopener" target="_blank" data-linktype="external" data-val="https://ec.europa.eu/info/publications/climate-resilient-europe_en">Mission for a Climate resilient Europe</a> and the Union’s <a href="https://ec.europa.eu/info/business-economy-euro/banking-and-finance/sustainable-finance_en" rel="noopener" target="_blank" data-linktype="external" data-val="https://ec.europa.eu/info/business-economy-euro/banking-and-finance/sustainable-finance_en">sustainable finance </a>agenda (foreseen for be renewed in the second quarter of 2021).</p>
<p>The <a href="https://ec.europa.eu/info/strategy/priorities-2019-2024/european-green-deal_en" rel="noopener" target="_blank" data-linktype="external" data-val="https://ec.europa.eu/info/strategy/priorities-2019-2024/european-green-deal_en">European Green Deal</a> (announced in December 2019) presents the Commission's plan for a sustainable green transition. At the heart of the Green Deal, the first European <a href="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52020PC0080" rel="noopener" target="_blank" data-linktype="external" data-val="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52020PC0080">Climate Law proposal</a> establishes the framework for achieving climate neutrality by 2050. The proposal recognises adaptation as a key component of the long-term global response to climate change and requires Member States and the Union to enhance their adaptive capacity, strengthen resilience and reduce vulnerability to climate change. It also introduces a requirement for the implementation of national strategies.</p>
<p>Also this new EU Adaptation Strategy was part of the <a href="https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1596443911913&amp;uri=CELEX:52019DC0640#document2" rel="noopener" target="_blank" data-linktype="external" data-val="https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1596443911913&amp;uri=CELEX:52019DC0640#document2">Green Deal action plan</a>. The new EU Adaptation Strategy development was based on the evaluation of the <a href="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52013DC0216" rel="noopener" target="_blank" data-linktype="external" data-val="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52013DC0216">2013 EU strategy on adaptation to climate change</a> and the gathering of a wide range of stakeholder views to i) inform and design the explored policy options, as well as ii) the appropriate level of ambition. An extensive <a href="https://ec.europa.eu/clima/sites/clima/files/consultations/docs/0037/blueprint_en.pdf" rel="noopener" target="_blank" data-linktype="external" data-val="https://ec.europa.eu/clima/sites/clima/files/consultations/docs/0037/blueprint_en.pdf">Blueprint</a> of the new EU Strategy was provided as part of the open consultation process in 2020 to stimulate public debate and an impact assessment was made.</p>
<p>The 2018 Commission’s <a href="https://climate-adapt.eea.europa.eu/metadata/publications/evaluation-of-the-eu-strategy-on-adaptation-to-climate-change/" rel="noopener" target="_blank" data-linktype="external" data-val="https://climate-adapt.eea.europa.eu/metadata/publications/evaluation-of-the-eu-strategy-on-adaptation-to-climate-change/">evaluation of the (2013) EU Adaptation Strategy</a> finds that this strategy had delivered on its objectives to promote action by Member States, ‘climate-proof’ action at the EU level and support better-informed decision-making. The evaluation included the ‘adaptation preparedness scoreboard’ for measuring Member States’ level of readiness based on qualitative, process-based indicators.</p>
</div>

With the blocks-conversion-tool, I get the following (I'm pasting an image because the text is on my remote dev machine and inconvenient to upload):

image

Here's the output of that PR with a homegrown blocks conversion running that Python code (the bs4 version available for Python 2 is not OK for this task, so I'm running my own block converter as a service):

{
   "07035ae3-c9fa-48b0-9bdd-152611b91b95":{
      "plaintext":"",
      "@type":"slate",
      "value":[
         {
            "type":"ul",
            "children":[
               {
                  "type":"li",
                  "children":[
                     {
                        "type":"strong",
                        "children":[
                           {
                              "text":"Smarter adaptation"
                           }
                        ]
                     },
                     {
                        "text":": Improving knowledge and manage uncertainty; including:"
                     },
                     {
                        "type":"ul",
                        "children":[
                           {
                              "type":"li",
                              "children":[
                                 {
                                    "text":"Pushing the frontiers of adaptation knowledge;"
                                 }
                              ]
                           },
                           {
                              "type":"li",
                              "children":[
                                 {
                                    "text":"More and better climate loss data; and"
                                 }
                              ]
                           },
                           {
                              "type":"li",
                              "children":[
                                 {
                                    "text":"Enhancing and expanding Climate-ADAPT as the European platform for adaptation knowledge."
                                 }
                              ]
                           }
                        ]
                     }
                  ]
               },
               {
                  "type":"li",
                  "children":[
                     {
                        "text":"More "
                     },
                     {
                        "type":"strong",
                        "children":[
                           {
                              "text":"systemic adaptation"
                           }
                        ]
                     },
                     {
                        "text":": Supporting policy development at all levels and all relevant policy fields; including three cross-cutting priorities to integrate adaptation into:"
                     },
                     {
                        "type":"ul",
                        "children":[
                           {
                              "type":"li",
                              "children":[
                                 {
                                    "text":"Macro-fiscal policy;"
                                 }
                              ]
                           },
                           {
                              "type":"li",
                              "children":[
                                 {
                                    "text":"Nature-based solutions; and"
                                 }
                              ]
                           },
                           {
                              "type":"li",
                              "children":[
                                 {
                                    "text":"Local adaptation actions."
                                 }
                              ]
                           }
                        ]
                     }
                  ]
               },
               {
                  "type":"li",
                  "children":[
                     {
                        "type":"strong",
                        "children":[
                           {
                              "text":"Faster adaptation"
                           }
                        ]
                     },
                     {
                        "text":": Speed up adaptation implementation across the board."
                     }
                  ]
               }
            ]
         }
      ]
   },
   "e9d0f547-47d2-481e-bdb0-fb06653639f6":{
      "plaintext":"",
      "@type":"slate",
      "value":[
         {
            "type":"p",
            "children":[
               {
                  "text":"Also this new EU Adaptation Strategy was part of the "
               },
               {
                  "data":{
                     "url":"https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1596443911913&uri=CELEX:52019DC0640#document2"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"Green Deal action plan"
                     }
                  ]
               },
               {
                  "text":". The new EU Adaptation Strategy development was based on the evaluation of the "
               },
               {
                  "data":{
                     "url":"https://climate-adapt.eea.europa.eu/metadata/publications/eu-strategy-on-adaptation-to-climate-change"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"2013 EU strategy on adaptation to climate change"
                     }
                  ]
               },
               {
                  "text":" and the gathering of a wide range of stakeholder views to i) inform and design the explored policy options, as well as ii) the appropriate level of ambition. An extensive "
               },
               {
                  "data":{
                     "url":"https://ec.europa.eu/clima/sites/clima/files/consultations/docs/0037/blueprint_en.pdf"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"Blueprint"
                     }
                  ]
               },
               {
                  "text":" of the new EU Strategy was provided as part of the open consultation process in 2020 to stimulate public debate and an impact assessment was made."
               }
            ]
         }
      ]
   },
   "af2594df-c606-4aa0-a6b8-6d916a293e6f":{
      "plaintext":"",
      "@type":"slate",
      "value":[
         {
            "type":"p",
            "children":[
               {
                  "text":"On 24 February 2021, the European Commission adopted the Communication \u2018"
               },
               {
                  "data":{
                     "url":"https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=COM:2021:82:FIN"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"Forging a climate-resilient Europe \u2013 the new EU Strategy on Adaptation to Climate Change"
                     }
                  ]
               },
               {
                  "text":"\u2019. The Strategy outlines a long-term vision for the EU to become a climate-resilient society, fully adapted to the unavoidable impacts of climate change by 2050. This strategy aims to reinforce the adaptive capacity of the EU and the world and minimise vulnerability to the impacts of climate change, in line with the "
               },
               {
                  "data":{
                     "url":"https://unfccc.int/process-and-meetings/the-paris-agreement/the-paris-agreement"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"Paris Agreement"
                     }
                  ]
               },
               {
                  "text":" and the "
               },
               {
                  "data":{
                     "url":"https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52020PC0080"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"proposal for a European Climate Law"
                     }
                  ]
               },
               {
                  "text":"."
               }
            ]
         }
      ]
   },
   "5eeb6565-26ee-46b5-8e1e-1a1452bf4b94":{
      "plaintext":"",
      "@type":"slate",
      "value":[
         {
            "type":"p",
            "children":[
               {
                  "text":"The 2018 Commission\u2019s "
               },
               {
                  "data":{
                     "url":"https://climate-adapt.eea.europa.eu/metadata/publications/evaluation-of-the-eu-strategy-on-adaptation-to-climate-change/"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"evaluation of the (2013) EU Adaptation Strategy"
                     }
                  ]
               },
               {
                  "text":" finds that this strategy had delivered on its objectives to promote action by Member States, \u2018climate-proof\u2019 action at the EU level and support better-informed decision-making. The evaluation included the \u2018adaptation preparedness scoreboard\u2019 for measuring Member States\u2019 level of readiness based on qualitative, process-based indicators."
               }
            ]
         }
      ]
   },
   "e77377b4-4c79-4662-9612-9b23635fdff7":{
      "plaintext":"",
      "@type":"slate",
      "value":[
         {
            "type":"p",
            "children":[
               {
                  "text":"The "
               },
               {
                  "data":{
                     "url":"https://ec.europa.eu/info/strategy/priorities-2019-2024/european-green-deal_en"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"European Green Deal"
                     }
                  ]
               },
               {
                  "text":" (announced in December 2019) presents the Commission's plan for a sustainable green transition. At the heart of the Green Deal, the first European "
               },
               {
                  "data":{
                     "url":"https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52020PC0080"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"Climate Law proposal"
                     }
                  ]
               },
               {
                  "text":" establishes the framework for achieving climate neutrality by 2050. The proposal recognises adaptation as a key component of the long-term global response to climate change and requires Member States and the Union to enhance their adaptive capacity, strengthen resilience and reduce vulnerability to climate change. It also introduces a requirement for the implementation of national strategies."
               }
            ]
         }
      ]
   },
   "2221a879-8c4d-4c10-810a-eaae2fe84927":{
      "plaintext":"",
      "@type":"slate",
      "value":[
         {
            "type":"h2",
            "children":[
               {
                  "text":"A new EU adaptation strategy"
               }
            ]
         }
      ]
   },
   "f165c9e7-5945-41e8-80e0-365b03a9fe15":{
      "plaintext":"",
      "@type":"slate",
      "value":[
         {
            "type":"p",
            "children":[
               {
                  "text":"The Strategy integrates "
               },
               {
                  "type":"strong",
                  "children":[
                     {
                        "text":"international action"
                     }
                  ]
               },
               {
                  "text":" for climate resilience into its framework."
               }
            ]
         }
      ]
   },
   "cf95f48b-76cc-4187-855e-c28ff2503656":{
      "plaintext":"",
      "@type":"slate",
      "value":[
         {
            "type":"p",
            "children":[
               {
                  "text":"Climate-ADAPT, the European platform for adaptation knowledge, will be enhanced and expanded. As a first concrete deliverable of the new Strategy, the "
               },
               {
                  "data":{
                     "url":"https://climate-adapt.eea.europa.eu/observatory"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"European Climate and Health Observatory"
                     }
                  ]
               },
               {
                  "text":" will be launched on Climate-ADAPT, to better track, analyse and prevent the impacts of climate change on human health."
               }
            ]
         }
      ]
   },
   "4eb5e5d7-9995-44a0-ba22-af7d5b69a086":{
      "plaintext":"",
      "@type":"slate",
      "value":[
         {
            "type":"p",
            "children":[
               {
                  "text":"\nThe new EU Adaptation Strategy links directly to recent global agreements, such as the "
               },
               {
                  "data":{
                     "url":"https://unfccc.int/process-and-meetings/the-paris-agreement/the-paris-agreement"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"Paris Agreement"
                     }
                  ]
               },
               {
                  "text":", the "
               },
               {
                  "data":{
                     "url":"https://www.undrr.org/publication/sendai-framework-disaster-risk-reduction-2015-2030"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"Sendai Framework for Disaster Risk Reduction"
                     }
                  ]
               },
               {
                  "text":" and the "
               },
               {
                  "data":{
                     "url":"https://www.un.org/sustainabledevelopment/development-agenda/"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"Sustainable Development Agenda"
                     }
                  ]
               },
               {
                  "text":" as well as the EU implementation of these goals. It also connects directly to major EU initiatives like the "
               },
               {
                  "data":{
                     "url":"https://ec.europa.eu/info/publications/climate-resilient-europe_en"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"Mission for a Climate resilient Europe"
                     }
                  ]
               },
               {
                  "text":" and the Union\u2019s "
               },
               {
                  "data":{
                     "url":"https://ec.europa.eu/info/business-economy-euro/banking-and-finance/sustainable-finance_en"
                  },
                  "type":"link",
                  "children":[
                     {
                        "text":"sustainable finance "
                     }
                  ]
               },
               {
                  "text":"agenda (foreseen for be renewed in the second quarter of 2021)."
               }
            ]
         }
      ]
   },
   "16b4bf9c-4279-48f1-8cc0-6c8fc4c2fc9d":{
      "plaintext":"",
      "@type":"slate",
      "value":[
         {
            "type":"p",
            "children":[
               {
                  "text":"The Strategy aims to build a climate resilient society by improving knowledge of climate impacts and adaptation solutions; by stepping up adaptation planning and climate risk assessments; by accelerating adaptation action; and by helping to strengthen climate resilience globally. It pursues three objectives and proposes a range of actions in order to meet them:"
               }
            ]
         }
      ]
   }
}

It's an output that perfectly matches the input HTML.

There's a big difference in how newlines are handled in the new code. The Volto code has also been refactored to follow the same algorithm, whereas the blocks-conversion-tool is based on old Volto code.

davisagli commented 1 year ago

@tiberiuichim This looks like a bug, and I'll need to check whether it is a regression because I know I have seen it working better than what you showed. Whitespace between block-level elements is supposed to be ignored.
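To illustrate the rule about whitespace between block-level elements, here is a minimal sketch. It is not the code of either implementation: the node representation (plain strings for text, dicts with a `tag` key for elements), the `BLOCK_TAGS` set and the function name are all assumptions made for this example.

```python
# Hypothetical list of tags treated as block-level for this sketch.
BLOCK_TAGS = frozenset(
    {"div", "p", "ul", "ol", "li", "h1", "h2", "h3", "h4", "h5", "h6",
     "table", "blockquote"}
)


def is_block(node):
    """True for element nodes (dicts) whose tag is block-level."""
    return isinstance(node, dict) and node.get("tag") in BLOCK_TAGS


def drop_inter_block_whitespace(children):
    """Drop whitespace-only text whose neighbours are block elements.

    A missing neighbour (start or end of the list) counts as a block
    boundary. Whitespace next to inline elements is kept, because it
    separates words there.
    """
    kept = []
    for i, child in enumerate(children):
        if isinstance(child, str) and not child.strip():
            prev_node = children[i - 1] if i > 0 else None
            next_node = children[i + 1] if i + 1 < len(children) else None
            prev_ok = prev_node is None or is_block(prev_node)
            next_ok = next_node is None or is_block(next_node)
            if prev_ok and next_ok:
                continue  # formatting whitespace only, ignore it
        kept.append(child)
    return kept
```

With this rule, the newline between `</p>` and `<ul>` in the sample HTML would not produce an empty block, while a space between two inline elements such as `<strong>` and `<em>` would survive.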

Can you confirm which version of the blocks-conversion-tool image you used? Sometimes I've accidentally run with an old version of the image if the container was created some time ago.

As for the broader point about switching from blocks-conversion-tool to the Python implementation: What I care about is being able to use a mature, well-tested implementation. I don't know if it is better to have only one of those, or to have one each in Node and in Python. I would be open to trying the Python implementation the next time I work on a migration.

davisagli commented 1 year ago

I do think it's kind of funny that the Python guys are working on the Node implementation and the Node guy is working on the Python implementation :)

tiberiuichim commented 1 year ago

> I do think it's kind of funny that the Python guys are working on the Node implementation and the Node guy is working on the Python implementation :)

I'm an accidental node guy. My history with Plone started in 2004. I like to document the little bits that I find interesting and I think would help anyone, see my blog archive. https://play.pixelblaster.ro/archive/

But now I make the Volto bug tracker my rant area and this ticket my blogging space.

tiberiuichim commented 1 year ago

> Can you confirm which version of the blocks-conversion-tool image you used?

I used yarn start and ran it natively.

pbauer commented 1 year ago

Since I'm doing two migrations to Volto at the moment using the blocks-conversion-tool, I'll give https://github.com/plone/plone.volto/pull/101 a try and compare the results. Will report back.

tiberiuichim commented 1 year ago

@pbauer unfortunately the plone.volto PR is not a complete block migrator. You'll have to create a block generator in Python, based on the output of html2slate. You'll need to generate tables and images by traversing the slate JSON and extracting those nodes. This: https://github.com/plone/blocks-conversion-tool/blob/bc6a215597654b94bff66a514b69efe7b733bc0e/src/converters/fromHtml.js#L95
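A rough sketch of that traversal, under stated assumptions: the node type names `img` and `table` and the function name `split_special_nodes` are hypothetical, and the real slate output may label these nodes differently.

```python
def split_special_nodes(nodes, special_types=("img", "table")):
    """Walk slate nodes depth-first and pull out image/table nodes.

    Returns (remaining_nodes, extracted_nodes). The input is not
    modified; parents of extracted nodes are shallow-copied.
    The type names in special_types are assumptions for this sketch.
    """
    remaining, extracted = [], []
    for node in nodes:
        if node.get("type") in special_types:
            extracted.append(node)
            continue
        children = node.get("children")
        if children is not None:
            kept, found = split_special_nodes(children, special_types)
            extracted.extend(found)
            # Slate elements need at least one child, so fall back to
            # an empty text leaf when everything was extracted.
            node = dict(node, children=kept or [{"text": ""}])
        remaining.append(node)
    return remaining, extracted
```

The extracted nodes could then be turned into dedicated image and table blocks, while the remaining nodes become regular slate blocks.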

tiberiuichim commented 1 year ago

Or you can pre-parse the HTML, extract images and tables, and use html2slate on the other "child fragments".

pbauer commented 1 year ago

The conversion from HTML to slate looks good to me, but the fact that tables and images are not converted to the right blocks is a deal-breaker for me at the moment. I'll keep using the blocks-conversion-tool for now since I lack the time to implement the missing converters.

Here is my slightly modified plone.volto.browser.migrate_richtext.get_blocks_from_richtext.

from uuid import uuid4

from plone.volto.slate.html2slate import text_to_slate


def get_blocks_from_richtext(text, service_url=None, slate=True):
    # service_url and slate are not used in this version.
    blocks = {}
    uuids = []
    # Each top-level slate node becomes its own slate block.
    slate_data = text_to_slate(text)
    for item in slate_data:
        uuid = str(uuid4())
        uuids.append(uuid)
        block = {"@type": "slate"}
        block["value"] = [item]
        block["plaintext"] = ""  # TODO
        blocks[uuid] = block
    return blocks, uuids

tiberiuichim commented 1 year ago

@pbauer thanks for giving it a try!
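The `plaintext` field left as a TODO in the function above could probably be filled by concatenating the text leaves of each slate node. A minimal sketch follows; the helper name `slate_plaintext` is hypothetical, and how Volto itself computes this field is not shown in this thread.

```python
def slate_plaintext(node):
    """Concatenate all text leaves under a slate node, in order."""
    if "text" in node:
        # Text leaf: return its content as-is.
        return node["text"]
    # Element node: recurse into the children.
    return "".join(slate_plaintext(child) for child in node.get("children", []))
```

Inside the loop this would be used as `block["plaintext"] = slate_plaintext(item)`.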