torchbox / wagtail-wordpress-import

A package for Wagtail CMS to import WordPress blog content from an XML file into Wagtail
MIT License

#53 Use single style rule #48

Closed nickmoreton closed 3 years ago

nickmoreton commented 3 years ago

Ticket: https://projects.torchbox.com/projects/wordpress-to-wagtail-importer-package/tickets/53

This PR changes how we parse inline HTML style attributes, as well as some HTML tags.

It now works with individual style rules rather than the complete style string, so an inline style can be transformed into an HTML tag that represents it by matching just one rule within a longer style string. It also transforms some HTML tags for better Wagtail compatibility.
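
A minimal sketch of the per-rule idea. The mapping STYLE_RULE_TO_TAG and the helpers split_style_rules and tags_for_style are hypothetical names for illustration, not the package's actual functions. The style string is split on ";" and each normalised rule is looked up on its own, so a single matching rule is enough even when other rules are present.

```python
# Hypothetical mapping from one normalised style rule to an HTML tag;
# the package's own default rules may differ.
STYLE_RULE_TO_TAG = {
    "font-weight:bold": "b",
    "font-style:italic": "i",
    "text-decoration:underline": "u",
}


def split_style_rules(style):
    """Split an inline style string into normalised "property:value" rules."""
    rules = []
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # skip empty fragments, e.g. after a trailing ";"
        prop, value = declaration.split(":", 1)
        # Lower-case and trim so "Font-Weight : Bold" matches "font-weight:bold".
        rules.append(f"{prop.strip().lower()}:{value.strip().lower()}")
    return rules


def tags_for_style(style, mapping=STYLE_RULE_TO_TAG):
    """Return a tag for every individual rule that has a mapping, in source order."""
    return [mapping[rule] for rule in split_style_rules(style) if rule in mapping]


print(tags_for_style("color: red; font-weight: bold;"))  # ['b']
```

Because matching happens per rule, the unmatched color: red does not prevent font-weight: bold from being recognised, whereas comparing the whole style string would fail. Lower-casing values suits keywords such as bold but would need care for case-sensitive values such as URLs.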

The configuration used to search for styles can be overridden by a developer in their own Wagtail settings. The intention is that developers will also be able to provide their own methods for transforming styles and tags.
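
One way such an override could work, sketched under assumptions: the setting name WORDPRESS_IMPORT_STYLE_TRANSFORMS, the defaults and the helper get_style_transforms are all hypothetical, not the package's real configuration. Each style rule maps to a callable, so a developer can plug in their own transform method, and project overrides are merged over the package defaults. In a Django project the settings object would be django.conf.settings; a SimpleNamespace stands in here so the sketch runs on its own.

```python
from types import SimpleNamespace


def wrap_in(tag):
    """Build a transform that wraps inner HTML in the given tag."""
    def transform(inner_html):
        return f"<{tag}>{inner_html}</{tag}>"
    return transform


# Hypothetical package defaults: each style rule maps to a callable transform.
DEFAULT_STYLE_TRANSFORMS = {
    "font-weight:bold": wrap_in("b"),
    "font-style:italic": wrap_in("i"),
}

# Hypothetical setting name, for illustration only.
SETTING_NAME = "WORDPRESS_IMPORT_STYLE_TRANSFORMS"


def get_style_transforms(settings):
    """Return the defaults updated with any overrides found on the settings object."""
    overrides = getattr(settings, SETTING_NAME, {})
    return {**DEFAULT_STYLE_TRANSFORMS, **overrides}


# A developer-supplied transform, e.g. defined in the project's own code.
def to_strong(inner_html):
    return f"<strong>{inner_html}</strong>"


# SimpleNamespace stands in for django.conf.settings.
project_settings = SimpleNamespace(
    WORDPRESS_IMPORT_STYLE_TRANSFORMS={"font-weight:bold": to_strong}
)
transforms = get_style_transforms(project_settings)
print(transforms["font-weight:bold"]("text"))   # <strong>text</strong>
print(transforms["font-style:italic"]("text"))  # <i>text</i>
```

Rules the developer does not override keep their default behaviour, so a project only needs to declare what it wants to change.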

We intend to provide a basic set of the most common rules in the package.

Example:

<span style="font-weight: bold; font-style: italic;">text</span> is transformed to <b><i>text</i></b> when used to populate the Wagtail StreamField blocks.
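
The example above can be reproduced with the standard library's html.parser. This is a sketch, not the package's implementation (the PR's code may use a different parser). The mapping and the SpanStyleToTags class are hypothetical, and it assumes a styled span with at least one matching rule is replaced entirely by the matching tags, nested in the order the rules appear. Spans with no matching rule are kept unchanged; comments and doctype declarations are dropped by this sketch.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical rule-to-tag mapping for the sketch.
STYLE_RULE_TO_TAG = {
    "font-weight:bold": "b",
    "font-style:italic": "i",
}


def tags_for_style(style):
    """Return the tag for each individual style rule that has a mapping."""
    tags = []
    for declaration in style.split(";"):
        if ":" in declaration:
            prop, value = declaration.split(":", 1)
            rule = f"{prop.strip().lower()}:{value.strip().lower()}"
            if rule in STYLE_RULE_TO_TAG:
                tags.append(STYLE_RULE_TO_TAG[rule])
    return tags


class SpanStyleToTags(HTMLParser):
    """Rebuild HTML, replacing a styled <span> with nested tags for its matched rules."""

    def __init__(self):
        super().__init__()
        self.output = []
        self.span_stack = []  # tags opened for each open <span>, or None if kept as-is

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            tags = tags_for_style(dict(attrs).get("style") or "")
            if tags:
                self.span_stack.append(tags)
                self.output.append("".join(f"<{t}>" for t in tags))
                return
            self.span_stack.append(None)  # no matching rule: keep the span
        self.output.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Re-emit self-closing tags such as <br/> verbatim.
        self.output.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "span" and self.span_stack:
            tags = self.span_stack.pop()
            if tags:
                # Close the nested tags in reverse order.
                self.output.append("".join(f"</{t}>" for t in reversed(tags)))
                return
        self.output.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so escape text again.
        self.output.append(escape(data, quote=False))


def transform(html):
    parser = SpanStyleToTags()
    parser.feed(html)
    parser.close()
    return "".join(parser.output)


print(transform('<span style="font-weight: bold; font-style: italic;">text</span>'))
# <b><i>text</i></b>
```

A span that mixes matched and unmatched rules loses the unmatched ones in this sketch; a fuller implementation might keep the span with its remaining style instead.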

Note

There is likely more work to do when creating blocks from the parsed body content; at that point the style attribute will still be in place for us to use. Some style attributes, such as float and text-align, are transformed to classes. This is one piece we will use later when building out the required StreamField blocks, e.g. when setting image alignment in a rich text block.
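
A minimal sketch of deriving classes from float and text-align, assuming hypothetical class names of the form float-left and align-center (the package's actual class names are not stated here). In keeping with the note above, it only derives classes and leaves the style attribute itself untouched.

```python
# Hypothetical property -> class prefix mapping; real class names may differ.
CLASS_PREFIXES = {"float": "float", "text-align": "align"}


def style_to_classes(style):
    """Derive class names from float/text-align rules; the style string is not modified."""
    classes = []
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue
        prop, value = declaration.split(":", 1)
        prop, value = prop.strip().lower(), value.strip().lower()
        if prop in CLASS_PREFIXES:
            classes.append(f"{CLASS_PREFIXES[prop]}-{value}")
    return classes


style = "float: left; margin: 0 1em; text-align: center"
print(style_to_classes(style))  # ['float-left', 'align-center']
```

A later block-building step could then read a class such as float-left on an image to choose its alignment in a rich text block.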

Currently the style attributes are removed by bleach at the last stage. I'm undecided on where bleach should sit in the pipeline in the end, but I suspect we may need two bleach passes, each with different settings.
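
A sketch of the two-pass idea. The PR uses bleach for this step; to keep the sketch runnable without third-party packages, it swaps in a small allowlist cleaner built on html.parser as a stand-in for bleach.clean with strip=True. It is illustrative only and not a security-grade sanitizer. The two settings dictionaries are hypothetical: the early pass keeps style so later steps can still read it, and the final pass drops it.

```python
from html import escape
from html.parser import HTMLParser


class AllowlistCleaner(HTMLParser):
    """Keep allowed tags and attributes; drop other tags but keep their text."""

    def __init__(self, tags, attributes):
        super().__init__()
        self.tags = set(tags)
        self.attributes = attributes  # {tag: [allowed attribute names]}
        self.output = []

    def _render_start(self, tag, attrs, self_closing=False):
        allowed = self.attributes.get(tag, [])
        parts = [f"<{tag}"]
        for name, value in attrs:
            if name in allowed:
                safe = escape(value or "")  # escapes &, <, > and quotes
                parts.append(f' {name}="{safe}"')
        parts.append(" />" if self_closing else ">")
        self.output.append("".join(parts))

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self._render_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        if tag in self.tags:
            self._render_start(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        if tag in self.tags:
            self.output.append(f"</{tag}>")

    def handle_data(self, data):
        self.output.append(escape(data, quote=False))


def clean(html, tags, attributes):
    cleaner = AllowlistCleaner(tags, attributes)
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.output)


# Pass 1 (early): keep style so later steps can still read it.
EARLY = {"tags": ["p", "b", "i", "img"], "attributes": {"img": ["src", "alt", "style"]}}
# Pass 2 (last): same tags, but style is no longer allowed.
FINAL = {"tags": ["p", "b", "i", "img"], "attributes": {"img": ["src", "alt"]}}

raw = '<p class="intro"><img src="a.jpg" style="float: left" onerror="x()"></p>'
early = clean(raw, **EARLY)
print(early)                  # <p><img src="a.jpg" style="float: left"></p>
print(clean(early, **FINAL))  # <p><img src="a.jpg"></p>
```

Splitting the work this way lets the float rule survive long enough to inform image alignment, while the final output still ends up without style attributes.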

Docs: these are slightly out of date now, but to avoid conflicts I don't want to update them until this PR is merged. https://github.com/torchbox/wagtail-wordpress-import/pull/36

nickmoreton commented 3 years ago

Thanks Nick, appreciate you having a look.

For the tests: at the moment, normalize_style_attrs and filter_transform_inline_styles_to_tags are exercised in wagtail-wordpress-import/wagtail_wordpress_import/test/tests/test_wordpress_item.py, but only indirectly. I agree they should be pulled out into separate tests.

The next ticket, #22 (Filter the blocks | Setup initial filter converters), has a draft PR here: https://github.com/torchbox/wagtail-wordpress-import/pull/49. In it I have added tests for this work and provided better fixtures, which are used by all the tests, as well as a test app. It will make more sense to break out the test you mention on a separate ticket; I think that will avoid having to deal with conflicts later.

Would you be OK with that?

nimasmi commented 3 years ago

> It will make more sense to break out the test you mention on a separate ticket. I think it'll avoid having to deal with conflicts later.
>
> Would you be OK with that?

Yes, I agree.

nickmoreton commented 3 years ago

Hi Nick, I have worked through the comments and pushed the changes. Many thanks.

nickmoreton commented 3 years ago

Thanks Nick. Yes, the test will be reintroduced in codebase ticket #22.