torchbox / wagtail-wordpress-import

A package for Wagtail CMS to import WordPress blog content from an XML file into Wagtail
MIT License

`replace_with` fails with `Cannot replace one element with another when the element to be replaced is not part of a tree` error #141

Open fabienheureux opened 2 years ago

fabienheureux commented 2 years ago

Sorry about the title, I am not sure how to name this issue as it is very specific and related to a specific post. I am still investigating where it comes from but thought I could post it as some of you might have a better idea of the source of the issue.

The importer fails on this specific item:

```xml
Limonadier https://limonadier.net Débit de beaux sons Fri, 24 Dec 2021 12:02:30 +0000 fr-FR 1.2 https://limonadier.net https://limonadier.net https://wordpress.org/?v=4.9.3 Lucy Dacus — No Burden (debut LP) https://limonadier.net/lucy-dacus-no-burden-debut-lp/ Fri, 01 Apr 2016 13:45:31 +0000 http://limonadier.net?p=38158 No Burden by Lucy Dacus  ]]> 38158 0 0 0
```
Here is the traceback:

```python
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 330, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 371, in execute
    output = self.handle(*args, **options)
  File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/management/commands/import_xml.py", line 70, in handle
    importer.run(
  File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/importers/wordpress.py", line 113, in run
    wp_post_id=wordpress_item.cleaned_data.get("wp_post_id")
  File "/usr/local/lib/python3.8/functools.py", line 967, in __get__
    val = self.func(instance)
  File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/importers/wordpress.py", line 518, in cleaned_data
    "body": self.body_stream_field(self.prefilter_content(self.raw_body)),
  File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/importers/wordpress.py", line 436, in body_stream_field
    builder.promote_child_tags()
  File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/block_builder.py", line 58, in promote_child_tags
    promotee.parent.replace_with(promotee)
  File "/usr/local/lib/python3.8/site-packages/bs4/element.py", line 266, in replace_with
    raise ValueError(
ValueError: Cannot replace one element with another when the element to be replaced is not part of a tree.
```
And here are some logs I added in the `promote_child_tags` method:

```python
Promotee
Parent
Parent name p
Removee tags ['p', 'div', 'span']
```

Details

Wagtail v2.15.2

I installed wagtail-wordpress-import from the main branch yesterday, so I am using the latest version of this codebase.

fabienheureux commented 2 years ago

Something odd I noticed is the fact that `<p> <br />` is the "parent" whereas these tags are not even in the original XML 🤔

nickmoreton commented 2 years ago

Thanks for the report.

I tried importing your XML snippet and it works OK, without console errors or warnings. I get a single imported page as expected, with 2 `raw_html` blocks, each containing the iframe.

> Something odd I noticed is the fact that `<p> <br />` is the "parent" whereas these tags are not even in the original xml 🤔

The `<p> <br />` tags are added in the bleach process.
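
For reference, the `ValueError` from the traceback can be reproduced with Beautiful Soup alone. The sketch below is only one plausible way to hit it, not a confirmed diagnosis of this post: it assumes two promoted tags end up sharing the same `<p>` parent, so the first `replace_with` detaches that `<p>` and the second call then runs on an element that is no longer in the tree. The HTML string and the function names `promote_naive` and `promote_guarded` are hypothetical and not part of wagtail-wordpress-import.

```python
from bs4 import BeautifulSoup

# Hypothetical input: two iframes that share a single <p> parent.
HTML = '<p><iframe src="a"></iframe><br/><iframe src="b"></iframe></p>'


def promote_naive(html):
    """Replace each iframe's parent with the iframe, without any checks."""
    soup = BeautifulSoup(html, "html.parser")
    for promotee in soup.find_all("iframe"):
        # First pass: the <p> is swapped out for the first iframe and
        # becomes detached. Second pass: the same detached <p> is the
        # parent of the second iframe, so replace_with raises ValueError.
        promotee.parent.replace_with(promotee)
    return soup


def promote_guarded(html):
    """Same loop, but skip parents that are no longer part of the tree."""
    soup = BeautifulSoup(html, "html.parser")
    for promotee in soup.find_all("iframe"):
        parent = promotee.parent
        if parent is None or parent.parent is None:
            # The parent was already detached by an earlier replacement.
            continue
        parent.replace_with(promotee)
    return soup


try:
    promote_naive(HTML)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

guarded_output = str(promote_guarded(HTML))

print(error_message)
print(guarded_output)
```

The guard only avoids the exception. In this sketch the second iframe stays inside the detached `<p>` and is dropped from the output, so a real fix would also need to decide what to do with the siblings of a promoted tag.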