salsadigitalauorg / merlin-framework

Merlin - migration framework
GNU General Public License v3.0
16 stars 3 forks source link

Blank attributes cause media processor to fail #66

Closed stooit closed 5 years ago

stooit commented 5 years ago

Describe the bug When media processor encounters an attribute with no value (e.g title in <img title src="/path/to/image.jpg"> it fails to correctly replace the image with an embed snippet.

This is because the str_replace in src/Processor/Media.php is trying to match on markup that is "sanitised", e.g "<img title="" src="/path/to/image.jpg"> so no match is found.

Sample configuration

---
domain: https://www.transportinfrastructurecouncil.gov.au
urls:
  - /council_members

entity_type: standard_page
mappings:
  -
    field: alias
    type: alias
  -
    field: title
    selector: h1
    type: text
    processors:
      nl2br: { }
  -
    field: field_body
    selector: //*[@id="read"]
    type: long_text
    processors:
      -
        processor: media
        type: image-standard_page
        selector: img
        file: src
        name: alt
        data_entity_embed_display: view_mode:media.embed
        data_embed_button: media_entity_embed
      -
        processor: remove_dom
        selector: h1
        xpath: false
      -
        processor: strip_tags
        allowed_tags: <h1><h2><h3><h4><h5><ul><ol><dl><dt><dd><li><p><a><strong><em><cite><blockquote><code><s><span><sup><sub><table><caption><tbody><thead><tfoot><th><td><tr><hr><pre><br><drupal-entity>
        remove_attr:
          - class
          - style

Expected behavior The first image on the page (mccormack.jpg) should appear, but does not and is stripped.

Screenshots imgpsh_mobile_save

Additional context None

derklempner commented 5 years ago

This problem is due to the Symfony lib versions used to ensure compatibility with Drupal in GovCMS.

Specifically, it appears to be an issue with the html5 parser that rewrites the HTML as attribute names only for attributes with empty values. One work around is to force the old behaviour to use the native DOMDocument (which was the only parser available in earlier versions anyway). I have quickly tested this and it builds the embed snippet correctly for the case above.