yireo / Yireo_NextGenImages

44 stars 26 forks source link

replaceImagesInHtml skip images #16

Closed ssstankiewicz closed 6 months ago

ssstankiewicz commented 2 years ago

Hi,

In my opinion, I found an error in \Yireo\NextGenImages\Image\HtmlReplacer::replaceImagesInHtml. If we put several tags in a row in html, the method ignores even images(tags). To be more precise, this is omitted in the regex.

In pagebuilder: page_builder_structure

In debugger: webp_phpstorm

In browser: webp_insepctor

By the way, is it possible to get such a structure (separate photos for mobile and desktop) from pagebuilder? picture_mobile_desktop

mts527 commented 2 years ago

Edit: Having the same issue where odd images only get's converted and wrapper with <picture> tag. I'm using the latest module version.

this

<img src="/media/wysiwyg/1.png" alt=""/>
<img src="/media/wysiwyg/2.png" alt=""/>
<img src="/media/wysiwyg/3.png" alt=""/>
<img src="/media/wysiwyg/4.png" alt=""/>

will be converted to

<picture>
    <source type="image/webp" srcset="/media/wysiwyg/1.webp">
    <source type="image/png" srcset="/media/wysiwyg/1.png">
    <img src="/media/wysiwyg/1.png" alt=""/>
</picture>
<img src="/media/wysiwyg/2.png" alt=""/>
<picture>
    <source type="image/webp" srcset="/media/wysiwyg/3.webp">
    <source type="image/png" srcset="/media/wysiwyg/3.png">
    <img src="/media/wysiwyg/3.png" alt=""/>
</picture>
<img src="/media/wysiwyg/4.png" alt=""/>
mts527 commented 2 years ago

Update: I've disabled HtmlReplacer and created a custom plugin for Magento\Framework\View\LayoutInterface with a DOM parser - DOMDocument to modify the image nodes. Seems to be working fine for now and more reliable than using regex.

jissereitsma commented 2 years ago

Would that code be material for a PR or perhaps sharing the code in the first place?

mts527 commented 2 years ago

Don't know if it's PR ready candidate yet as it's in a testing faze. As for sharing the code - sure :)

Vendor/Module/etc/di.xml

<?xml version="1.0"?>
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="urn:magento:framework:ObjectManager/etc/config.xsd">
    <type name="Magento\Framework\View\LayoutInterface">
        <plugin name="yireo_nextgenimages_replace_tags" disabled="true"/>
    </type>
    <type name="Magento\Framework\View\LayoutInterface">
        <plugin name="vendor_module_layout_update_images_attributes" type="Vendor\Module\Plugin\Layout\UpdateImagesAttributes" disabled="false"/>
    </type>
</config>

Vendor\Module\Plugin\Layout\UpdateImagesAttributes.php

<?php

namespace Vendor\Module\Plugin\Layout;

class UpdateImagesAttributes
{
    /**
     * @var \Vendor\Module\Helper\Data
     */
    private \Vendor\Module\Helper\Data $_helper;

    /**
     * @param \Vendor\Module\Helper\Data $helper
     */
    public function __construct(
        \Vendor\Module\Helper\Data $helper
    ) {
        $this->_helper = $helper;
    }

    /**
     * @param \Magento\Framework\View\LayoutInterface $layout
     * @param string                                  $output
     *
     * @return string
     */
    public function afterGetOutput(
        \Magento\Framework\View\LayoutInterface $layout,
        string                                  $output
    ) : string {
        return $this->_helper->getWebpHtmlOutput($output);
    }
}

Vendor\Module\Helper\Data.php

<?php

namespace Vendor\Module\Helper;

class Data
{
    /**
     * Allowed image extensions
     */
    const ALLOWED_PATH_EXTENSIONS = ['jpg', 'jpeg', 'png'];

    /**
     * @var \Vendor\Module\Model\Config
     */
    private \Vendor\Module\Model\Config $_config;

    /**
     * @var \Vendor\Module\Model\WebP\Convertor
     */
    private \Vendor\Module\Model\WebP\Convertor $_convertor;

    /**
     * @param \Vendor\Module\Model\Config         $config
     * @param \Vendor\Module\Model\WebP\Convertor $convertor
     */
    public function __construct(
        \Vendor\Module\Model\Config         $config,
        \Vendor\Module\Model\WebP\Convertor $convertor
    ) {
        $this->_config = $config;
        $this->_convertor = $convertor;
    }

    /**
     * Parses partial HTML only
     * @param string $html
     *
     * @return string
     */
    public function getWebpHtmlOutput(string $html) : string
    {   
        if ($this->_config->isWebPEnabled()) {
            $output = '';
            $document = new \DOMDocument();
            // Create a temporary wrapper element to parse partial HTML and avoid breaking the output
            $html = '<div>' . $html . '</div>';
            // Enable internal errors before loading the HTML to avoid warnings for unsupported HTML5 tags
            libxml_use_internal_errors(true);
            // Load HTML by converting the encoding to UTF-8
            $document->loadHTML(
                mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'),
                LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED
            );
            // Disable internal errors after we have loaded the HTML
            libxml_clear_errors();
            libxml_use_internal_errors(false);
            // Set the default encoding
            $document->encoding = 'utf-8';

            // Update image attributes within real DOM object
            $images = $document->getElementsByTagName('img');
            foreach ($images as $image) {
                $extension = pathinfo($image->getAttribute('src'), PATHINFO_EXTENSION);

                if (in_array($extension, self::ALLOWED_PATH_EXTENSIONS)) {
                    $this->_createPictureTag($document, $image, $extension, $image->getAttribute('src'));
                }
            }

            // Create temporary root and save HTML from it to the final output
            $root = $document->documentElement;
            foreach ($root->childNodes as $childNode) {
                // Save child node to document from temporary root
                $output .= $document->saveHTML($childNode);
            }
        } else {
            $output = $html;
        }

        return $output;
    }

    /**
     * @param $document
     * @param $image
     * @param $extension
     * @param $src
     */
    private function _createPictureTag(
        &$document,
        &$image,
        $extension,
        $src
    ) {
        $picture = $document->createElement('picture');
        $sourceWebp = $document->createElement('source');
        $sourceOriginal = $document->createElement('source');

        $sourceWebp->setAttribute('type', 'image/webp');
        $sourceWebp->setAttribute('loading', 'lazy');
        $sourceWebp->setAttribute('srcset', $this->_convertor->getConvertedImageUrl($src));

        $sourceOriginal->setAttribute('type', 'image/' . $extension);
        $sourceOriginal->setAttribute('loading', 'lazy');
        $sourceOriginal->setAttribute('srcset', $src);

        $picture->appendChild($sourceWebp->cloneNode());
        $picture->appendChild($sourceOriginal->cloneNode());
        $picture->appendChild($image->cloneNode());

        $image->parentNode->replaceChild($picture, $image);
    }
}
ssstankiewicz commented 2 years ago

@jissereitsma Are you working on improving this plugin? Do you have an idea how to do it? Will you try to fix this regex or will you rebuild this part and drop regexp?

jissereitsma commented 2 years ago

Currently I'm not working on this issue. Too much other work is lying around. I didn't look into it yet either.

jissereitsma commented 2 years ago

I dived into this myself to see if this issue (3 images on a row being misinterpreted) was something I could produce. I could quite easily and have added a unit test for this to describe how it should work. This also means that I'm keen to see this bug fixed.

However, I also can see that some people have referred to the usage of DOMDocument as a way to fix this. It's not. First of all, the bug is a mismatch in the regular expression and therefore fixing the regular expression is easier than implementing DOMDocument. Second of all, implementing DOMDocument anyway quickly leads into other issues. For instance, this approach requires the entire HTML document to be kind of valid (HTML4, HTML5, XHTML), while I've seen cases of half-broken shops that work in the browser anyway and therefore need to be supported by extension vendors. And in my case, the usage of DOMDocument quickly broke things because HTML5 tags like main not being recognized. Also, replacing the original tags becomes problemetic when you ONLY want to replace the original tags and not restructure the entire HTML document. The solution proposed by @mts527 maybe works for him, but has many possible things that are changed outside of the scope of images. Plus the new picture tag is built using DOMDocument without an override of PHTML anymore. This, while this module is really only allowed to replace images, hence the name.

I'm still experimenting with this, but I wanted to let you know my findings already.

jissereitsma commented 2 years ago

I've done a refactoring based on DOMDocument anyway, as to remove the complex regex. However, I've kept the main structure for checking many other things. Plus, as mentioned, DOMDocument is definitely not used to generate an entirely new HTML page, which breaks numerous other things. Instead, there is a simple regex to add identifiers and a search-and-replace for those identifiers later. DOMDocument is used for searching, not replacing.

Version 0.3.10 now ships with this new approach and it should fix the issue of the img-tags on a row not working properly. Please let me know if it does.

mts527 commented 2 years ago

Agree @jissereitsma, DOMDocument can potentially create other issues as I've noticed few days ago. It breaks inline scripts that concatenate a string with HTML and then append it to DOM. I would also suggest using native module's way with regex.

jissereitsma commented 6 months ago

I'm closing this ticket due to inactivity.