magento / magento2

Prior to making any Submission(s), you must sign an Adobe Contributor License Agreement, available here at: https://opensource.adobe.com/cla.html. All Submissions you make to Adobe Inc. and its affiliates, assigns and subsidiaries (collectively “Adobe”) are subject to the terms of the Adobe Contributor License Agreement.
http://www.magento.com
Open Software License 3.0
11.48k stars 9.29k forks source link

PageCache: async rendering of blocks can corrupt layout cache #8554

Closed bka closed 6 years ago

bka commented 7 years ago

Hi folks,

I disovered a potential bug regarding the layout cache which leads to a blank page. If this happens, customers may see a white page without any hmtl inside the body tags. It might be related to #6942, although there is no php-fpm or libxml in the game here.

Preconditions

  1. Tested on Magento 2.1.0 CE and 2.1.4 CE

Steps to reproduce

  1. bin/magento cache:clean
  2. curl -gI "http://magento2.local/page_cache/block/render/?ajax=1&blocks=[%22header.login%22]&handles=[%22default%22%2C%22cms_index_index%22%2C%22cms_page_view%22%2C%22cms_index_index_id_home%22]&originalRequest=a&version="
  3. curl -s http://magento2.local | grep "<body" -A 3 -B 3

Expected result

  1. Body tag should still contain some data

Actual result

  1. Generated DOM has an empty body like this
<body data-container="body" data-mage-init='{"loaderAjax": {}, "loader": { "icon": "http://magento2.local/pub/static/version1487104262/frontend/Magento/luma/de_DE/images/loader-2.gif"}}' class="cms-index-index page-layout-1column">
</body>

auswahl_092

Discussion

This ajax call seems to destroy the layout cache. If it is not present, structure which gets loaded and unserialized in lib/internal/Magento/Framework/View/Layout.php:generateElements from cache does not contain any root element anymore:

$ awk -F "root" '{print NF-1}' var/cache/mage--0/mage---792_STRUCTURE_LAYOUT_FRONTEND_STORE1_2C7461AF04BB11FE209659D7147D2C579
0
0

Cleaning the cache again and loading the front page normally generates a valid layout cache, where root is present:

$ awk -F "root" '{print NF-1}' var/cache/mage--0/mage---792_STRUCTURE_LAYOUT_FRONTEND_STORE1_2C7461AF04BB11FE209659D7147D2C579
0
98

I think it has something do to with the used handles [default, cms_index_index, cms_page_view, cms_index_index_id_home] which are used for generating the cache key in getCacheId. If there is no cache present, the page cache block just writes its content into the layout cache although there is no root layout used.

I need to dive deeper into this issue. It normally should not happen, because the DOM should get loaded first. And subsequent ajax calls should always have a layout cache present. But if cache gets cleared manually and there is already an ajax call on its way it can lead to a broken layout cache.

bka commented 7 years ago

Ok, I did some debugging. It seems to only affect CMS pages and is related to the way how Magento_PageCache module works. This module renders a script into the DOM to fetch some blocks asynchronously.

<script type="text/x-magento-init">
    {
        "body": {
            "pageCache": {"url":"http:\/\/magento2.local\/index.php\/page_cache\/block\/render\/","handles":["default","cms_index_index","cms_page_view","cms_index_index_id_home"],"originalRequest":{"route":"cms","controller":"index","action":"index","uri":"\/index.php\/"},"versionCookieName":"private_content_version"}        }
    }
</script>

This is done in PageCache/view/frontend/web/js/page-cache.js. This call hits PageCache/Controller/Block/Render.php and PageCache/Controller/Block.php:_getBlocks triggers the loading of the layout:

$this->_view->loadLayout($handles, true, true, false);

However, because of the way how lib/internal/Magento/Framework/View/Model/Layout/Merge.php:_fetchRecursiveUpdates works, the pageLayout will be set to admin-1column defined in app/code/Magento/Theme/view/adminhtml/page_layout/admin-1column.xml. Unfortunately, in this case lib/internal/Magento/Framework/View/Page/Layout/Reader.php:read just gets an empty layout <layout/> for $xml.

public function read(Layout\Reader\Context $readerContext, $pageLayout)
{
    $this->getPageLayoutMerge()->load($pageLayout);
    $xml = $this->getPageLayoutMerge()->asSimplexml();
    $this->reader->interpret($readerContext, $xml);
}

I think the layout is not loaded because our area in this case is frontend not adminhtml. So this layout won't be loaded in this case. And therefore there are some critical elements missing inside scheduledStructure afterwards, like e.g. the root container.

Now, the bad thing happens. This structure gets stored into the cache, if there is none. This is done in lib/internal/Magento/Framework/View/Layout.php:generateElements.

    $cacheId = 'structure_' . $this->getUpdate()->getCacheId();
    $result = $this->cache->load($cacheId);
    if ($result) {
        $this->readerContext = unserialize($result);
    } else {
        \Magento\Framework\Profiler::start('build_structure');
        $this->readerPool->interpret($this->getReaderContext(), $this->getNode());
        \Magento\Framework\Profiler::stop('build_structure');
        $this->cache->save(serialize($this->getReaderContext()), $cacheId, $this->getUpdate()->getHandles());
    }

The $cacheId will be the same as for a normal request because it is based on used handles. But remember, we have missing elements in the structure like root. Next time, somebody visits the page, he will get a blank page without body, because this broken layout structure has been loaded from the cache.

broken

For normal requests on CMS pages, this is not happening, because the controller sets the pageLayout explicitly in app/code/Magento/Cms/Helper/Page.php:setLayoutType before the layout is loaded. This is usually 1column defined in app/code/Magento/Theme/view/frontend/page_layout/1column.xml and can then easily be loaded by lib/internal/Magento/Framework/View/Page/Layout/Reader.php.

valid

I would really appreciate any advice on how to fix this.

erikhansen commented 7 years ago

@bka Thanks for sharing your findings. I'm curious whether you spent time to find a solution to this issue?

I'm experiencing a very similar issue with an empty homepage (where the body tag is empty). Until I read your issue description, I wasn't able to consistently reproduce the issue, but I'm now able to reproduce it if I follow these steps:

  1. Flush Magento cache:
    bin/magento cache:flush
  2. Run a curl request like this to request the ESI (Edge Side Include) of the catalog nav that Varnish requests from Magento (note: This request will work even on environments that don't have Varnish configured, so it's possible to reproduce this in environments using the built-in full page cache):
    curl -s http://example.dev/page_cache/block/esi/blocks/%5B%22catalog.topnav%22%5D/handles/%5B%22default%22,%22cms_index_index%22,%22cms_page_view%22,%22cms_index_index_id_home%22%5D/

    The content returned by this curl request will be empty and will corrupt the layout cache, per your findings above.

  3. Run a curl request to request the homepage content:
    curl -s http://example.dev

    The content returned by this curl request will include an empty body, similar to your screenshot in your original post.

If I reverse steps 2 and 3, the homepage content will contain content like expected, and the curl request for the menu/navigation content will also contain content.

On a side note, I initially suspected Varnish was at fault with this issue as I know that an improper Varnish configuration can cause the catalog navigation to go missing. I've been experiencing issues with both a blank homepage and then sometimes the navigation is missing (but the rest of the homepage content is loaded). I was able to figure out the http://example.dev/page_cache/block/esi/blocks… url used in step 2 by using the Gist that was shared in this comment. After finding this issue, I tried requesting that ESI url in the manner you described and that's how I was able to consistently reproduce the issue.

I was curious to see what combination of handles were successful vs not, so I ran varnishlog while loading a category page to get an example query that did work. Here is what I found:

Broken requests:

curl -s http://example.dev/page_cache/block/esi/blocks/%5B%22catalog.topnav%22%5D/handles/%5B%22default%22,%22cms_index_index%22,%22cms_page_view%22,%22cms_index_index_id_home%22%5D/
curl -s http://example.dev/page_cache/block/esi/blocks/%5B%22catalog.topnav%22%5D/handles/%5B%22default%22%5D/
curl -s http://example.dev/page_cache/block/esi/blocks/%5B%22catalog.topnav%22%5D/handles/%5B%22cms_index_index%22,%22cms_page_view%22,%22cms_index_index_id_home%22%5D/

Working requests:

curl -s http://example.dev/page_cache/block/esi/blocks/%5B%22catalog.topnav%22%5D/handles/%5B%22default%22,%22catalog_product_view%22,%22catalog_product_view_id_38640%22,%22catalog_product_view_sku_XX-XXX-XXX%22,%22catalog_product_view_type_simple%22%5D/
curl -s http://example.dev/page_cache/block/esi/blocks/%5B%22catalog.topnav%22%5D/handles/%5B%22default%22,%22catalog_product_view%22%5D/
bka commented 7 years ago

Hi @erikhansen,

thanks for your feedback. I have some additional insights. This issue relates to usage of $_isScopePrivate. Devdocs say it wont work properly. This also explains why this issue did not appear on standard luma. Indeed, we had some blocks with isScopePrivate.

Having a block defined with isScopePrivate will trigger async loading of blocks also known as Varnish Hole Punching. Sad thing is, if this Block is present on the homepage, indicated by handles default, cms_index_index, cms_page_view, cms_index_index_id_home it corrupts the layout cache if it is empty.

Related source files are:

I fixed this by patching Module_PageCache/Controller/Block.php to include an additional pseudo handle which changes the generated cache id like this:

/**
 * Get blocks from layout by handles
 *
 * @return array [\Element\BlockInterface]
 */
protected function _getBlocks()
{
    $blocks = $this->getRequest()->getParam('blocks', '');
    $handles = $this->getRequest()->getParam('handles', '');

    if (!$handles || !$blocks) {
        return [];
    }
    $blocks = json_decode($blocks);
    $handles = json_decode($handles);

    // fix for layout corruption bug #8554
    if (is_array($handles)) {
        $handles[] = 'fpc_block';
    }

    $this->_view->loadLayout($handles, true, true, false);
    $data = [];

    $layout = $this->_view->getLayout();
    foreach ($blocks as $blockName) {
        $blockInstance = $layout->getBlock($blockName);
        if (is_object($blockInstance)) {
            $data[$blockName] = $blockInstance;
        }
    }

    return $data;
}
erikhansen commented 7 years ago

@bka - Thanks for your feedback. While your workaround might fix your specific manifestation of this issue, it doesn't fix mine. And I don't think my manifestation of the issue has anything to do with $_isScopePrivate. I'll post here if I find the solution to my problem.

@Magento - I was able to easily reproduce my issue on a vanilla Magento 2 install using the steps I outlined in my comment above.

erikhansen commented 7 years ago

@bka - I have a solution to the issue. You can reference the fix in this commit: https://github.com/magento/magento2/commit/99bb170a805f3e9d0ab5c9b9bb801f739ac161c1

Try applying this patch to your environment and see if the issue goes away:

diff --git a/vendor/magento/module-ui/view/base/layout/default.xml b/vendor/magento/module-ui/view/base/layout/default.xml
index 7f2efbd..64d5f14 100644
--- a/vendor/magento/module-ui/view/base/layout/default.xml
+++ b/vendor/magento/module-ui/view/base/layout/default.xml
@@ -5,7 +5,7 @@
  * See COPYING.txt for license details.
  */
 -->
-<page xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" layout="admin-1column" xsi:noNamespaceSchemaLocation="urn:magento:framework:View/Layout/etc/page_configuration.xsd">
+<page xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="urn:magento:framework:View/Layout/etc/page_configuration.xsd">
     <body>
         <referenceContainer name="after.body.start">
             <block class="Magento\Ui\Block\Logger" name="logger" template="Magento_Ui::logger.phtml"/>

If the issue does go away, I'd recommend closing this issue.

erikhansen commented 7 years ago

@bka Have you had a chance to test my suggested patch yet?

erikhansen commented 7 years ago

@bka Any update on this?

andreymoskvenkov commented 7 years ago

The root cause of this issue is that the "pageLayout" parameter for the layout builder is missing. However before starting blocks and XML generation Magento adds additional blocks to the list based on selected pageLayout: 1column, 2columns or empty. By default for the most of the pages the layout is: 1column. As the result: almost all default pages are working as expected. But once you customize your theme a little bit, using "move" actions, the full page cache becomes broken. The solution is pretty complex: I have overridden Magento\Framework\View\Page\Builder, Magento\Framework\View\Layout, Magento\PageCache\Observer\ProcessLayoutRenderElement and Magento\PageCache\Controller\Block\Esi, by adding a new parameter to the URL and to the controller, named: "page_layout" and then I set it into the view in the Esi controller. Works for me.

cherreman commented 7 years ago

@andreymoskvenkov Can you provide a patch so we can verify your solution?

andreymoskvenkov commented 7 years ago

Sorry, but I can provide only a tip, which I have already done. Tips are free :)

cherreman commented 7 years ago

@andreymoskvenkov C'mon man, this is opensource. Take a little, give a little ;)

All kidding aside, I appreciate your tip, but I'm sure your patch would be appreciated even more and it could be validated. If you can't share it for whatever reason, I understand.

erikhansen commented 7 years ago

@cherreman Did you happen to try the patch I included in my comment on February 28th?

andreymoskvenkov commented 7 years ago

Yeah, there are many reasons of not sharing the code, one of them is - NDA of the company, where I am working for. We have fixed a lot of core bugs of M2 internally and this is one of the most difficult to find. And why have I posted a tip here is - because this issue number #8554 has also given me a tip about layout type such as "1column", "empty", etc. I was debugging it almost whole day and was starting to become a crazy, because I could not understand: why in the Layout object's "_blocks" property the list of generated blocks are only a few of them in page_cache controller, however in CMS_index controller the list of blocks in "_blocks" property is big: about 79 blocks there. And at the same time the final XML in both actions is almost the same! The cause of that was: because Magento adds some blocks to the "_blocks" property manually in a separate method depending on what "pageLayout" is set. As the result: in page_cache action no page layout was passed and no page layout was set. I don't need a validation of my fix :) It is on Production already and it works for us 100% :)

miguelbalparda commented 7 years ago

https://github.com/magento/magento2/pull/9560 seems to fix this issue, can you try that fix and report back? I'd like to merge that PR but I will need some help from you :smile:

andreymoskvenkov commented 7 years ago

:-D Sorry, but no, this PR will not fix the issue. In the fix need to Pass "page_layout" code into an API request together with the list of applied handlers on a page. Your fix just adds an additional handler, which is used only in the FullPageCache's controller.

miguelbalparda commented 7 years ago

@adrian-martinez-interactiv4 can you chime in here?

adrian-martinez-interactiv4 commented 7 years ago

@andreymoskvenkov If I have understood well, you are proposing to include the additional handle in the ajax request, instead of putting it within the block page cache controller. I don't know if I'm missing something, could you clarify a bit your previous explanation pls? Thanks in advance.

andreymoskvenkov commented 7 years ago

No, I propose to add an additional parameter to the PageCache's URL: 'page_layout', where to put what Layout type was used on the current page and then to add a processing of this parameter in the ESI controller, which should just take the value from that parameter from the request and to set it into Layout object. Then everything will work correctly. And this is not easy btw, because of crazy Magento 2 architecture (from what I see the core developers instead of thinking more deeply, they just split responsibilities between classes randomly and apply "protected", "public" and "private" keywords also randomly :) ). It is not a "handler", it is an additional characteristics (or parameter, or property) of Layout entity.

adrian-martinez-interactiv4 commented 7 years ago

Well, I'm been doing some research about this issue. I'll be explaining each point later, but long story short:

  1. This issue is still happening at CE 2.1.6, but it seems to be already solved in develop branch.
  2. Actually, issue described at PR #9560 is not the same as the one described here; it references an issue related to layout updates related with the cms page not being applied. This issue #8554 and #9050 have been referenced there because, indirectly, also solved this problem.
  3. Also, to solve issue described in #9560, as it's related to the layout handles used in getCacheId, either an additional unique layout handle should be used to avoid wrong layout cache being generated, or, as you propose, an additional property should be added to the layout, to be used (if defined) in getCacheId method, so it can be used anywhere else.

About point 1: As said before, I'm able to reproduce it in CE 2.1.6, this one is ok:

captura de pantalla 2017-05-16 a las 22 20 30

And this one shows the empty body:

captura de pantalla 2017-05-16 a las 22 21 23

Same steps reproduced at 2.2.0-dev, first shows non-empty body, as expected:

captura de pantalla 2017-05-16 a las 22 47 31

But the same steps that resulted in an empty body in CE 2.1.6, return non-empty body in 2.2.0-dev:

captura de pantalla 2017-05-16 a las 22 48 15

Without going too deep in what commits are involved in solving this issue, what seems clear is that this issue, as long as #9050 , that I consider a duplicate of this one, are already solved.

About point 2: Unfortunately, #9560 is not solved yet. As explained in comment https://github.com/magento/magento2/pull/9560#issuecomment-300349007, page structure is cached based on layout handles, and it's not correctly generated due to cms page not being loaded to include its layout updates within the cached page structure. This leads us to point 3.

About point 3: At this point, I see two ways to solve the issue, as described above. I chose the additional / unique layout handle solution, as the easiest and least intrusive option. In the other hand, I think it would be a good idea to have a custom property in the layout that may be used to generate the cache id if defined, along with layout handles. This would make it possible to generate unique ids for layout cache if specified from anywhere. I'm giving a try to this approach, but I see no reason to include a new param in the request, as long as the controller should be responsible to set the property in the layout to ensure a different, unique layout cache id is generated for that request. Anyway, I'm giving a try on that too.

As this issue seems to be solved, I'll continue giving feedback about #9560 in the thread of that PR.

andreymoskvenkov commented 7 years ago

:-D Ok, continue adding new handlers for FPC, up to you! I also don't see no reason in adding a new param to the request, until it was a core Magento approach how they build the final layout XML: they phisically set this layout type into Layout object and then even Create some blocks from PHP code, if a layout type was set. That's why any other fixes, such as making CacheID unique, etc, do Not fix he issue Completely for ALL cases. I have successfully fixed it for our client in March and we continue shareing this fix internally from project to project. But you may continue having fun in fixing other bugs many times instead of fixing the root cause once. :-D

miguelbalparda commented 7 years ago

@andreymoskvenkov can you clarify what "we have an internal patch" means? If you have a patch for this or other issues, would you kind sharing them for others to benefit? Thanks!

andreymoskvenkov commented 7 years ago

It meas, that I have fixed this issue and wrapped it up as a separate module. However as I said earlier, I won't share it. Magento Community has helped me to find the root cause of the issue and that's why I was able to fix it. That's why I want to thank the community for this and share the root cause of it. However I will not share the fix, because if after my notes you cannot fix the issue like I fixed it, then you are useless as Magento specialists and will not be able to provide high quality assistance to your clients on Magento 2 platform. So the goal of my posts here is: to introduce an exercise, which is based on one of real problems, which faces a Magento developer almost every day. I want to make the community stronger. Just providing a final fix will not improve the level of community developers.

miguelbalparda commented 7 years ago

That is not how any of this works. If you really think you are setting the bar higher let me tell you that you are doing the complete opposite, leaving a bug open because you think people should be "smart" like you to fix it. This is how you make a community weaker because you say that if anyone is not as good as you it should not participate at all and that is completely false. But hey, this is OSS and we deal with this kind of cases every day so chances are someone will probably come and fix it. In the meantime @adrian-martinez-interactiv4 keep up the good work!

renttek commented 7 years ago

if one could see and read the fix, one can learn from it, trying to understand what the problem was and how the solution was build. For me thats a giant part of OSS. If you have a patch and don't want to share it, okay, go your way, but please don't brag (it's nothing more to me) about it.

(And yes, it is about you, since you were implying, that you are good and if one can't do one thing you can, he or she is useless)

fabianPas commented 7 years ago

A proprietary patch? When will this land in Magento EE? :trollface:

maksek commented 7 years ago

The thread was cleaned from not-related to issue comments. Please refrain from negative messaging and be respectful to each other.

bka commented 7 years ago

It seems this issue is fixed in develop. Cannot reproduce it anymore, at least the manifestation described above. It seems the change of layout in module-ui/view/base/layout/default.xml pointed out by @erikhansen did the trick. I dont know about other manifestiations but this one seems to be resolved.

hostep commented 7 years ago

Just to add some more information, I quickly looked in the git history and like @bka said, @erikhansen's proposed solution was already executed on the develop branch in scope of MAGETWO-55341, see: https://github.com/magento/magento2/commit/f42472a3753c8b87969a8e2dde6645525bf797d8 (Not sure if this is the correct and definitive solution though, based on the above comments...)

magento-team commented 7 years ago

Internal ticket to track issue progress: MAGETWO-69018

magento-engcom-team commented 6 years ago

@bka, thank you for your report. The issue is already fixed in develop branch, 2.2.0

magento-team commented 6 years ago

Internal ticket to track issue progress: MAGETWO-80647