magento / magento2

Prior to making any Submission(s), you must sign an Adobe Contributor License Agreement, available here at: https://opensource.adobe.com/cla.html. All Submissions you make to Adobe Inc. and its affiliates, assigns and subsidiaries (collectively “Adobe”) are subject to the terms of the Adobe Contributor License Agreement.
http://www.magento.com
Open Software License 3.0
11.57k stars 9.32k forks source link

Unnecessary Redis Session Locking On All HTTP GET Requests - Affecting PWA Studio Concurrent GraphQL Requests #34758

Open mttjohnson opened 2 years ago

mttjohnson commented 2 years ago

Preconditions (*)

  1. Magento 2.3.4-p1
  2. PWA Studio 12.1.0 (potentially optional)
  3. Setup Magento with Redis Sessions

Helpful Insights

Steps to reproduce (*)

Configure Magento

  1. Ensure that default configuration of Redis session locking is enabled by setting disable_locking to 0:
    cat app/etc/env.php | grep disable_locking
    bin/magento -n setup:config:set --session-save-redis-disable-locking=0

Load the PWA storefront home page

  1. Load home page (example: https://syseng-seldon.cldev.io/) which will make multiple graphql calls to the MAGENTO_BACKED_URL
  2. In Chrome Developer Tools on the Network tab you can use a filter to only display requests for the specific domain that are graphql calls domain:syseng-seldon.cldev.io -media -static graphql
  3. Multiple (15) /graphql requests are all sent concurrently to the configured MAGENTO_BACKEND_URL of the PWA app around the same time

Simulate the PWA storefront requests on ANY Magento site

  1. Create a static html file (example: graphqlrepeat.html) in /pub/ of the magento root containing fetch
    
    <!DOCTYPE html>
    <html lang="en">
    <head><title>JavaScript HTTP Requests</title></head>
    <body><script>

var domain = "syseng-seldon.cldev.io";

fetch("https://" + domain + "/graphql?query=query+GetStoreConfigForCarouselEE%7BstoreConfig%7Bid+product_url_suffix+magento_wishlist_general_is_enabled+enable_multiple_wishlists+__typename%7D%7D&operationName=GetStoreConfigForCarouselEE&variables=%7B%7D", { "headers": { "accept": "/", "accept-language": "en-US,en;q=0.9", "authorization": "", "cache-control": "no-cache", "content-type": "application/json", "pragma": "no-cache", "sec-ch-ua": "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"96\", \"Google Chrome\";v=\"96\"", "sec-ch-ua-mobile": "?0", "sec-ch-ua-platform": "\"macOS\"", "sec-fetch-dest": "empty", "sec-fetch-mode": "cors", "sec-fetch-site": "same-origin", "store": "default", "x-magento-cache-id": "null" }, "referrer": "https://" + domain + "/", "referrerPolicy": "strict-origin-when-cross-origin", "body": null, "method": "GET", "mode": "cors", "credentials": "include" });

2. Replace the domain variable with the domain of the magento site you're testing
3. Load the HTML page containing js that will fetch multiple graphql requests concurrently
4. You can open Chrome Developer Tools on the Network tab and where you see one of the GraphQL calls, you can right click on the request line and "Copy as fetch" to get the javascript fetch statement for making that same request in an html page js inline script.
![image](https://user-images.githubusercontent.com/50849/144677577-f911e868-d190-43a2-ab07-6f22885a89d1.png)

5. You can do this to re-create all the requests for a specific page by adding each fetch() call to the html file, or duplicate the exact same fetch() graphql request (40x) to reproduce the session locking behavior being seen here.

Simulate **SOME** of the expected behavior by globally disabling redis session locking on **ALL** requests
1. Simulate SOME of the expected behavior by globally disabling redis session locking on ALL requests

cat app/etc/env.php | grep disable_locking bin/magento -n setup:config:set --session-save-redis-disable-locking=1


### Expected result (*)
When concurrent GraphQL GET requests are made from a visitor, requests should be able to complete in parallel to keep page load time minimal, **while still allowing requests that require session locking (where important session data is being written) in order to prevent some other request from overwriting data in the session**. Important data getting overwritten in a session can negatively affect critical application behavior.

![image](https://user-images.githubusercontent.com/50849/144677897-742bffe4-5836-436c-b8a8-2a0e14116cb9.png)
![image](https://user-images.githubusercontent.com/50849/144677907-b9a9f722-587d-4547-b335-9151af839f32.png)
![image](https://user-images.githubusercontent.com/50849/144677916-db252082-0fac-46f5-9740-7801db6b95be.png)

In simulating some of the correct behavior with Redis session locking disabled, I was able to load the home page and all 15 graphql requests within a window of 600 ms. There are other pages that may contain many more GraphQL requests where this can be even more important to have concurrent requests complete in parallel.

Home Page GraphQL Measurements - with Session Locking Disabled First started at: 393ms Last started at: 586ms Last duration was: 401ms Last ended at: 586ms + 401ms = 987ms Total window of execution: 586ms + 401ms - 393ms = 594ms GraphQL Requests Total: 15 Page Finished 1020ms

![image](https://user-images.githubusercontent.com/50849/144677949-5d012ef8-0459-484c-b459-b83803e173f3.png)
![image](https://user-images.githubusercontent.com/50849/144677966-cb6c501e-2221-4adc-8fde-579b75a94731.png)

Looking at a waterfall of how the concurrent GraphQL requests complete, can reveal that multiple requests are completing at or within close to the same time.

![image](https://user-images.githubusercontent.com/50849/144677992-7afdb257-c485-4155-bc9d-6e6e4113d25f.png)

### Actual result (*)
With Redis session locking enabled which is the default and recommended safe behavior for redis session configs, the concurrent requests queue up, each waiting in sequence for a redis session lock to clear before the next request is able to complete.

![image](https://user-images.githubusercontent.com/50849/144678048-a3fe999a-2e46-4531-9f84-1758c289cc42.png)
![image](https://user-images.githubusercontent.com/50849/144678068-626b005e-0386-461b-8035-6528be2838cb.png)
![image](https://user-images.githubusercontent.com/50849/144678079-61a02424-1b16-494c-b62e-b6d944d287c5.png)

Home Page GraphQL Measurements - with Session Locking Disabled First started at: 313ms Last started at: 323ms Last duration was: 2800ms Last ended at: 323ms + 2800ms = 3123ms Total window of execution: 323ms + 2800ms - 313ms = 2810ms GraphQL Requests Total: 15 Page Finished 3120ms



![image](https://user-images.githubusercontent.com/50849/144678100-e386180d-8e99-40fc-a6fb-0bba764abf86.png)
![image](https://user-images.githubusercontent.com/50849/144678121-86ccca87-b5fd-4e1d-9ef7-1fb98f2f1b16.png)
![image](https://user-images.githubusercontent.com/50849/144678130-548a9ddd-1995-467e-9099-f1102f1537bf.png)

This makes it look like several of the graphql requests are taking an excessive amount of time to complete, while others completed in less time, but while these requests started close to the same time, they spent a lot of time waiting for session locking to clear, resulting in requests being completed in sequence rather than in parallel.

![image](https://user-images.githubusercontent.com/50849/144678160-fd8aa39b-7199-4173-8711-2111944f4cd2.png)
![image](https://user-images.githubusercontent.com/50849/144678175-80ea40c4-2683-41d2-89b7-c1bc78963d03.png)

### Cause of Behavior
With the PWA Studio (Client Side React App) running as the frontend "storefront" of Magento and sending GraphQL calls to the backend of Magento there are **different behavioral patterns** in how requests are sent to and processed by the web server from how we have been seeing interactions when using the Magento "theme" as the frontend.

The PWA sends multiple concurrent AJAX calls to `/graphql` end points on the web server, and these requests are all processed by the Magento backend PHP application. It turns out that **Magento architecture has it creating a session and locking that session regardless of the type of request** or response being issued.

Some `/graphql` requests are able to be cached by Varnish and because **cached requests in Varnish do not execute code and therefore do not interact with redis sessions**. This can mask the fact that each request that hits the backend will lock the session for the request and cause requests to be completed in sequence. Many of the graphql requests are not able to be cached in Varnish at this time.

We have seen this behavior in the Magento "theme" frontend also, but there are typically only a few requests that typically happen concurrently, and we see these results show up in New Relic on transaction traces a lot for AJAX calls that are likely to be running concurrently with other requests with the same session.

![image](https://user-images.githubusercontent.com/50849/144678278-31683534-439d-405b-9e8a-490c88390242.png)

The problem is not just isolated to AJAX calls like `customer/section/load` on the Magento "theme" frontend, it also tends to occasionally interfere with other requests on product pages, or AJAX calls in the checkout. In the example below the unlucky visitor ended up **waiting 8.8s to get an initial response for loading the page instead of what should have taken 300ms** because they had some other request that was locking their session. It doesn't happen often, but it's not pleasant when it does.

The big behavioral difference is that **most of the requests in the Magento "theme" frontend do NOT happen concurrently**, so session locking on a couple concurrent requests doesn't affect things overall that much, and it tends to be more of an exception and most requests (even customer/section/load AJAX requests) are not delayed waiting for session locks to release on average, so **this known deficiency is only  causing mild problems with the Magento "theme" frontend overall**, and limiting concurrent AJAX calls is a way of working around it.

![image](https://user-images.githubusercontent.com/50849/144678436-2facbb99-d74b-4b6d-a413-5475ced304aa.png)

This **session locking causes lots of problems with a PWA by delaying concurrent AJAX calls** causing them to wait in line and finish sequentially as the locks they are waiting on clear. This ends up making requests randomly appear like they are taking a really long time to complete while under the hood they are primarily just waiting in line for the session to be available so it can lock the session and complete it's request. This makes it very hard to identify from the client side where the problem is as it relates to concurrent requests for the same session and running the requests independently results in a very fast response.

**While it's possible to disable session locking, this can cause serious issues with requests that make changes to session data and overwrite each other**, and that can result in serious problems on checkout where **payments are captured but orders fail to be saved** in Magento resulting sometimes in a customer re-submitting the order and **getting charged multiple times** if they are able to get the order to complete successfully.

The [library used by Magento (v2.3.4)](https://github.com/magento/magento2/blob/2.4.3-p1/composer.lock#L321) is the [latest available (colinmollenhour/php-redis-session-abstract v1.4.4)](https://github.com/colinmollenhour/php-redis-session-abstract/releases) and it **[does support the ability to process read only requests without locking a session](https://github.com/colinmollenhour/php-redis-session-abstract/blob/v1.4.4/src/Cm/RedisSession/Handler.php#L434)** when the global config is set to utilize session locking... it just so happens that [Magento does not utilize this functionality](https://github.com/magento/magento2/blob/2.4.3-p1/lib/internal/Magento/Framework/Session/SaveHandler/Redis.php#L63) and [opens all session connections without specifying](https://github.com/colinmollenhour/php-redis-session-abstract/blob/v1.4.4/src/Cm/RedisSession/Handler.php#L263) if it needs to write to the session or not, thus defaulting to write mode and session locking.

1. https://github.com/magento/magento2/blob/2.4.3-p1/composer.lock#L321
2. https://github.com/colinmollenhour/php-redis-session-abstract/releases
3. https://github.com/colinmollenhour/php-redis-session-abstract/blob/v1.4.4/src/Cm/RedisSession/Handler.php#L434
4. https://github.com/magento/magento2/blob/2.4.3-p1/lib/internal/Magento/Framework/Session/SaveHandler/Redis.php#L63
5. https://github.com/colinmollenhour/php-redis-session-abstract/blob/v1.4.4/src/Cm/RedisSession/Handler.php#L263

### Possible Solution
Requests that come into Magento as **`GET`** requests are typically expected to return generic publicly identical response that can be cached by Varnish, while **`POST`** requests are explicitly not allowed to be cached by Varnish as they are expected to contain visitor/session/customer private data in the responses. Important write operations should typically happen in **`POST`** requests, and those types of requests would be expected to need and utilize session locking, while **`GET`** requests would generally be for returning generic data that is not visitor/session/customer specific and the same response would be returned to all requests and not involve any kind of write to the session. **If any session write activity were to occur on `GET` requests, it would likely be to update the timestamp of the most recent request to indicate the session is still active and has not expired yet** (this is an assumption that should be verified).

---

- [ ] Severity: **S2** _- Affects non-critical data or functionality and forces users to employ a workaround._
m2-assistant[bot] commented 2 years ago

Hi @mttjohnson. Thank you for your report. To speed up processing of this issue, make sure that you provided the following information:

Make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, Add a comment to the issue:

@magento give me 2.4-develop instance - upcoming 2.4.x release

For more details, review the Magento Contributor Assistant documentation.

Add a comment to assign the issue: @magento I am working on this

To learn more about issue processing workflow, refer to the Code Contributions.


:clock10: You can find the schedule on the Magento Community Calendar page.

:telephone_receiver: The triage of issues happens in the queue order. If you want to speed up the delivery of your contribution, join the Community Contributions Triage session to discuss the appropriate ticket.

:movie_camera: You can find the recording of the previous Community Contributions Triage on the Magento Youtube Channel

:pencil2: Feel free to post questions/proposals/feedback related to the Community Contributions Triage process to the corresponding Slack Channel

ihor-sviziev commented 2 years ago

@mttjohnson, I can say that precisely the same issue we have on the luma-based frontend. Actually, because of too frequent requests to customer/section/load, we had even issues with losing customer sessions. The solution for us was the following:

  1. Reduce not needed session starts https://github.com/magento/magento2/compare/2.3...ihor-sviziev:fix-not-needed-session-start
  2. Installed the following extension: https://github.com/integer-net/magento2-session-unblocker/
cpartica commented 2 years ago

I confirmed the issue. I think is valid to address this at least in GraphQL where we don't really need session

github-jira-sync-bot commented 2 years ago

:x: Cannot export the issue. This GitHub issue is already linked to Jira issue(s): https://jira.corp.magento.com/browse/AC-2362

rsidhaarth commented 2 years ago

Guys any solution to this? We are using PWA as well and we get the same issue. Cart returning wrong values due to the session being locked. And when we turned off the session lock, site loaded super fast and all issues are gone.

But wondering what other issues it may open since we disabled the session locking.

colinmollenhour commented 2 years ago

Sessions don't start unless something initializes them so it should be possible to add a breakpoint or debug statement and look at the stack trace to see why they are starting in the first place and fix that. There are likely multiple places.

ihor-sviziev commented 2 years ago

@colinmollenhour, unfortunately, in Magento 2 it's not true. They're getting initialized automatically because of dependencies to SessionManager since they're started in the SessionManager constructor. https://github.com/magento/magento2/blob/38e2ae57fab1fb964d5970c83a30b71c19d20116/lib/internal/Magento/Framework/Session/SessionManager.php#L99-L124

Some time ago I tried to fix that in https://github.com/magento/magento2/pull/31350, but a lot of tests are getting broken :(

colinmollenhour commented 2 years ago

The session supports disabling locking in the config so if you can figure out how to update the config in this context that would work: https://github.com/colinmollenhour/php-redis-session-abstract/blob/b70508a9b2183d4fc13871cf9138a52fbef776f3/src/Cm/RedisSession/Handler/ConfigInterface.php#L131

Nuranto commented 2 years ago

@colinmollenhour Why the lock sleep time is at least 0.5s ? Can't we sleep less ?

    /**
     * Sleep 0.5 seconds between lock attempts (1,000,000 == 1 second)
     */
    const SLEEP_TIME         = 500000;

0.5s seems a huge amount of time

colinmollenhour commented 2 years ago

@Nuranto Good question, it wasn't a highly researched conclusion.. The general idea is that such locks should be rare, basically only occurring if the user has some sort of long-running process that is locking the session or the server is being completely overwhelmed by traffic in which case checking the lock too frequently can just make the issue worse.

If you have a situation that every page load causes locking delays such as this issue with concurrent GraphQL requests then it was not intended for this purpose and probably locking should be disabled entirely for these requests (since they don't modify the session so there is no point in locking) or ideally the session should not even be started at all since using sessions defeats the purpose of a stateless API design. Even if the lock sleep time is just 10ms, you're still causing the concurrent requests to be serialized so the delays to total page load time will be a lot more than 10ms.

rsidhaarth commented 2 years ago

I am facing this issue in ScandiPWA as well. If the locking is disabled, then if you open multiple new tabs. And then, if you add the product to the cart, it gets destroyed when there are pending page loads.

Nuranto commented 2 years ago

@colinmollenhour Maybe make this delay configurable via options. And also, a coworker suggest to randomize a bit delays so we don't have many processes sleeping at the same time, and then retry to lock at the same time too, leading to more locks. But I agree with you, the real issue is on Magento's side. Session management was always a nightmare, even on Magento 1. (It is not specific to GraphQL, we don't use GaphQL, and still have the issue.).

And @rsidhaarth is right, disabling the locking can cause bugs... However, I guess we don't have the choice to do that as long as this is not fixed in magento.

mttjohnson commented 2 years ago

I was thinking it would be worth exploring if it was possible to determine the HTTP request type before setting up the session handler so that it would be possible to pass in a third parameter to \Cm\RedisSession\Handler() call to indicate that it should start a read only session and not require a lock. I'm expecting most of the important calls where you'd want the session to be locked are going to be POST requests, and it could have been as simple as just changing the third parameter to Handler() to set the session to read only on all GET requests.

    /**
     * Get connection
     *
     * @return \Cm\RedisSession\Handler
     * @throws SessionException
     */
    private function getConnection()
    {
        $pid = getmypid();
        if (!isset($this->connection[$pid])) {

            # Determine HTTP request type
            $requestType = mysteriousGetHttpRequestTypeMethod(); # returns GET/POST/OPTIONS etc...
            # Parameter value to send to session handler indicating 
            # if locking the session is needed for writing to the session
            $readOnlySession = $requestType === 'GET' ? true : false;

            try {
                $this->connection[$pid] = new \Cm\RedisSession\Handler($this->config, $this->logger, $readOnlySession);
            } catch (ConnectionFailedException $e) {
                throw new SessionException(new Phrase($e->getMessage()));
            }
        }
        return $this->connection[$pid];
    }

Setting the session to read only would still allow the session to be read, but only require locking it when important POST requests come through that needs to write to the session, like during checkout where writing the order number to the session is a critical part of the checkout process during payment capture... which is when the session locking is really needed.

If a GET request needs to write to the session for something as simple as updating a timestamp in the session, it looks like this wouldn't work as the _readOnly property of the session handler is checked on a write and would just return and ignore the request to write to the session.

  1. https://github.com/magento/magento2/blob/2.4.3-release/lib/internal/Magento/Framework/Session/SaveHandler/Redis.php#L63
  2. https://github.com/colinmollenhour/php-redis-session-abstract/blob/v1.4.4/src/Cm/RedisSession/Handler.php#L263
  3. https://github.com/colinmollenhour/php-redis-session-abstract/blob/v1.4.4/src/Cm/RedisSession/Handler.php#L630

I feel like if we dynamically disabled locking on different requests, that could still run into problems as there would be the potential of some random request coming through that could overwrite the value written from a request where session locking was used. Scenario being say something stupid like a GET request that takes for whatever reason 8 seconds to complete (but that never happens right...) and it writes to the session at the end of it's request based on some value it modified from reading the session at the beginning of it's request. If some other request (like and important POST) that enabled locking comes through and happens to send a write after the GET request read it's initial value from the session, the POST request sets up locking on the redis session, and then writes back to the session before the GET request writes, it's value could get overwritten by the long running GET request.

I think we still need session locking on all writes to the session to ensure safety, but there should be some way to reduce the frequency of the writes to the session, as this seems unnecessary on most of the requests hitting the backend and at the very least unnecessary for being written as frequently as they are.

colinmollenhour commented 2 years ago

@mttjohnson I'm open to a PR for supporting a no-locking flag in the handler.

As a dirty hack you might be able to call session_write_close() somewhere after the session is instantiated (or force it to happen) but early in the request so the lock is released sooner.

I would still encourage the M2 community to focus on avoiding instantiating a session at all for requests that don't actually use it, though. It seems it should be up to the controller to determine if a session is instantiated and it should never be instantiated in an API request, period.

Nuranto commented 2 years ago

@mttjohnson couldn't that be done on Magento's side, by disabling the disable_locking parameter when POST ?

mttjohnson commented 2 years ago

@Nuranto In thinking through this, I don't think we'd want to disable locking on any session writes because that puts the risk of whatever has a write with locking disabled of being able to overwrite session data for a request where locking is enabled and locking enabled requests expecting other write requests to be utilizing locking. The session locking is going to be important on all writes in order to guarantee nothing important gets overwritten... so the best approach would be to eliminate the need for a write (like in the case of an API call as @colinmollenhour has pointed out), and reduce the frequency that a write is performed while still allowing for read only operations.

I haven't looked into things to see where in the request lifecycle the session is initially read, and where it's possibly written to, but if it was possible to separate those two operations, I'm wondering if the read operation could be done with a connection established for read-only requests so that the reads are not blocked by any other concurrent write operations and then immediately close the connection... and then if the request later needs to write to the session open a separate connection for writing with locking, re-read the session to ensure nothing has changed, and now that it has a lock it will stay consistent until it has a chance to write and change the data, then write the changes. I don't know how much overhead that would introduce by opening and closing the read-only connection quickly, and then opening a new connection for writing that would incur a read/write operations... as that introduces an additional read operation and connection setup/teardown into the mix, but that would be a possible way of handling it safely and minimally impact session writes so they aren't held open very long.

If the session writes are managed separately and something simple is being updated in the session like just a last active timestamp in order to determine when the session should be expired (I haven't actually looked to see specifically what is getting written on every request), I wonder if after reading the session data initially, it could be determined if it should update the session or wait longer before updating the session. It would be possible to set a threshold of say 1-5 minutes so that a session write isn't occurring but once every 1-5 minutes in order to update the last active timestamp or something. If sessions are not expiring for 60 minutes, several hours, or days, then checking it to see if it needs to write, and then not writing but once every 5 minutes would significantly reduce the frequency of session writes for any call where sessions are currently getting written to on every request.

damienwebdev commented 2 years ago

I don't know if it's been mentioned yet, but there's work in v2.4.5 to disable sessions on GraphQl as they're not strictly necessary any more.

@cpartica or @sidolov may know best, I haven't messed around with the setting yet.

luishenriqueamaral commented 2 years ago

We are also facing the same problem using our headless storefront and Magento checkout pages. Although we don't use graphql, our requests towards default and custom rest endpoints are stuck in Redis operations. We contacted Adobe support and they recommended us to disable the session locking and they also mentioned that this is default in the cloud version.

I noticed that @colinmollenhour mentioned a few times (1, 2, 3) that the risk of this is low and that it could only cause minor side effects, which is encouraging. But we are still hesitant to do so as this comment from @mttjohnson poses a greater risk:

While it's possible to disable session locking, this can cause serious issues with requests that make changes to session data and overwrite each other, and that can result in serious problems on checkout where payments are captured but orders fail to be saved in Magento resulting sometimes in a customer re-submitting the order and getting charged multiple times if they are able to get the order to complete successfully.

Matt, is this something that actually happened to you or is it just a hypothetical situation?

mttjohnson commented 2 years ago

@damienwebdev I was reading the release notes on 2.4.5 last week and noticed that it mentioned there was some ability to disable sessions on GraphQL requests, and I think that would solve the big issues for operating Magento with a PWA that makes a lot of concurrent GraphQL requests. While writing to and locking sessions on all other requests hitting the backend is still excessive, it would be nice to confirm if this provides a workaround to this issues I identified above, so we could resolve the critical aspects of this issue.

@luishenriqueamaral For the sites I managed we always enforced having session locking enabled, but I have heard reports from other people claiming the issues during checkout when session locking was disabled. I have however witnessed and helped troubleshoot issues where session locking was enabled but long running concurrent requests during checkout placement would cause the session lock to expire and session data similarly getting overwritten, causing payments to be captured without associated orders in the database. Sometimes we'd see this as payments without orders, and other times it would show up as multiple payments from the same initial quote. It's also worth pointing out that I've seen payments without order, and duplicate payments that didn't relate to any session problems, so that symptom can have multiple causes, it is just that session can be related to causing problems. I suspect that most of the session data getting overwritten isn't of much concern, but during checkout if critical data is written to the session, the act of it getting overwritten can result in some seriously difficult to troubleshoot issues in addition to customer service problems.

Yonn-Trimoreau commented 2 years ago

I've come up with an acceptable temporary solution for this: https://github.com/colinmollenhour/php-redis-session-abstract/issues/50

To temporarily fix this issue, here are the patches to apply:

On colinmollenhour/php-redis-session-abstract module => implement-write-lock-and-diff-patch.txt

And on magento/framework module => set-php-serialize-handler-patch.txt

In fact the colinmollenhour/php-redis-session-abstract module implements read locking and NOT write locking. Basically this means that if multiple requests needing to read the session come to happen concurrently, the second needs to wait for the first to end, the third needs to wait for the second, and so on... This situation is natively happening frequently when using GraphQL, or in the native checkout, and another example is when enabling the banner functionality in Magento Commerce (the /customer/section/load request happens almost always concurrently with the /banner/ajax/load request).

A handy way of clearly and consistently reproducing this performance issue is by following these steps:

The disable_locking feature is clearly not a definitive solution and may result in weird and random problems depending on the performance of the browser and server. What may happen is the following situation:

           A-------------A
           |             |
         reads        writes I
           |             |
           v             v
Session-------------------------X------Time
                ^               ^
                |               |
              reads          writes J
                |               |
                B---------------B

In this situation, the request B will overwrite the modifications made by the request A, and the resulting session will not contain I modification, but only J. Depending on the modifications, this can provoke this behavior https://github.com/magento/magento2/issues/10111 or this one https://magento.stackexchange.com/questions/350520/checkout-successful-redirects-to-empty-cart-but-not-using-redis The reverse situation may also happen (some request removes an entry and the following request unremoves it), and it can cause notifications to show twice for example.

But the scenario I've just described can also happen when using file storage instead of Redis. If using file storage on your project doesn't produce such problems, then you can safely use disable_locking feature. The root cause of the overwriting problem is that sessions are read and written as a whole and not "column by column", and there is no diff mechanism or database-oriented language that can tell "write this column specifically but not this other one".

I understand the point of @colinmollenhour which says that API requests should not rely on instantiating the session since the common and widely adopted architectural style for building APIs is REST, and being RESTful implies being stateless. But currently, most of Magento's checkout APIs and customer/section/load or even banner/ajax/load APIs are not RESTful (and a whole bunch of Magento marketplace's modules include APIs that are not either). Probably for legacy reasons or wrong conception, but in any case, rewriting all this is quite a lot of work, and in the meantime, the problem will continue to happen. Having stateful AJAX calls in non-headless websites is also quite common and I can't find why we should put an emergency on forcing them to comply to the REST principles. By the way, I really don't understand why Magento GraphQL requests are stateful, this sounds like a serious architectural mistake to me (but I don't use it regularly so I won't firmly pronounce on this).

And the solution proposed by @mttjohnson brings the burden of having to list every requests that need read-only access to the session in some kind of whitelist (another Magento xml file?). Ultimately, this kind of whitelist would not be needed if @colinmollenhour's module was implementing write lock instead of read lock (which the module is actually advertised to).

Maybe the definitive solution is for Magento to implement a "last second session state check" with a diff mechanism that would only propagate actual changes made to the session to the write operation, and apply it on all types of session storage (file, memcache or Redis). And for @colinmollenhour to implement a simple real write lock solution (probably by passing an option to avoid causing regressions on every non-Magento users of this module).

Both solutions are actually implemented in the patches I supplied at the beginning of this comment. It's now just a matter of moving some part in Magento and another in @colinmollenhour's module, if both parties accept this way of resolving this extremely long running issue once and for all.

colinmollenhour commented 2 years ago

@Yonn-Trimoreau I suppose you're right if you are referencing "Only one process may get a write lock on a session." - it should just read "Only one process may get a lock on a session." but that is just semantics..

As of now when you start a session is is basically assumed that there will be a write, like a "SELECT ... FOR UPDATE". Unlike a database, you can't know if there will be a write until the very end, basically during the "COMMIT". So I don't see how a concept of separate read/write locks translates, but I digress..

For example, let's say you add a product to the cart and so a "Added to cart" success message is added to the session storage. Then two requests are made at once and they both start the session reading the data without waiting. When they write the session they will both delete the message so that's fine, but still you have a different outcome than if you had a read lock (two requests get the success message instead of two). Another example may be a "firstPageAfterLogin" flag or a security nonce that is meant to be read only once. The problem is not serializing the writes in these cases, it is the reads.

I'm not familiar with the Magento 2 frontend so I don't know what requests are occurring and why they need to use the session but my general suggestion would be to find which requests modify the session and which ones don't. The ones that don't should just be read-only and the ones that do should just use the normal locking. If you're in a hurry this could even be done outside of Magento 2 code:

In index.php or anywhere before the session is instantiated:

if (preg_match('#^/(banner|customer)/#', $_SERVER['REQUEST_URI']??'')) {
    define('SESSION_READ_ONLY', TRUE);
}

Then replace https://github.com/magento/magento2/blob/cff8851bad72b5902c99c7d210dca6355940d5e8/lib/internal/Magento/Framework/Session/SaveHandler/Redis.php#L63 with:

$this->connection[$pid] = new \Cm\RedisSession\Handler($this->config, $this->logger, defined('SESSION_READ_ONLY'));

The read-only flag is already supported in https://github.com/colinmollenhour/php-redis-session-abstract/blob/master/src/Cm/RedisSession/Handler.php#L263

EDIT: After reading up farther I see/remember this was basically already suggested by @mttjohnson a while back. Using read-only sessions is safer than disabling locking since the read-only sessions can't clobber good data.

Yonn-Trimoreau commented 2 years ago

@colinmollenhour Yes I am referencing this sentence "Only one process may get a write lock on a session". And semantics is actually the key to the problem here.

A read lock is what LibreOffice or Microsoft Word implements for example. When you open it, a lock file is written next to the file you're editing, just to remind you that you've already opened it in another text editor instance, in case you try to double open it. So that you won't overwrite your data from one instance to another.

A write lock is what the filesystem implements, or any NoSQL database, they run write operations one at a time to preserve atomicity of the data which is currently being written. And Redis does actually implement this by default.

What you're talking about when mentioning the "COMMIT" operation in SQL language is a transaction. Basically, the SQL database will put some part of the tables you're reading/writing in memory, to let you work with a consistent version of the database, and then when you call COMMIT, it will take the write lock (to avoid anyone else writing during the commit operation), and then write all the INSERT or UPDATE operations you put in the transaction one at a time, until it releases the lock and frees the temporary memory. But in the meantime, callers can continue reading the tables you're currently modifying in the transaction (and that's the important part in our case). Transactions are useful for two purposes:

  1. To ensure consistency of the whole database. Storing sessions doesn't critically need consistency since we don't have foreign keys, auto increment, etc... And Redis doesn't provide consistency at all.

  2. To ensure the atomicity of multiple write operations, so that a group of operations are either persisted entirely or not at all. Internally this works by, in this order: taking the write lock ; taking a snapshot of the current state of the data ; applying the multiple operations to the snapshot ; overwriting the actual data in the database with the prepared snapshot ; releasing the write lock And this is exactly what we need in our session use case.

What's a real shame is that Redis actually provide such transactions. But they operate at the entry level, and since the whole session is stored serialized in a single entry, we cannot profit from this.

As you can see, the following workflow is needed:

  1. Read session data when request starts
  2. Keep a snapshot of the first read data in memory
  3. Process the request
  4. Calculate a diff between the first read data and what is programmed to be written
  5. Take the write lock
  6. Read the session data again
  7. Apply the calculated diff on this fresh session data
  8. Write the session
  9. Release the lock

So if Magento takes care of the diff part of the process, your module should expose three methods: read, lock and write, so that the caller can choose when to read, when to lock, and when to write.

And yes, this will change the outcome, 100% agree on that. But since most Magento projects are presently forced to disable_locking for performance reasons, and that works quite ok 95% of the time, we have already proven that Magento can safely work without a read lock. What I would seriously like to remove is the "quite" part of this sentence, by ensuring that there is no random or weird and un-debuggable behavior in all this.

Using read-only is half of the answer, since if you have multiple requests that need to write in parallel (and this actually happens in the Magento checkout), you will still have unnecessary waiting time.

convenient commented 1 year ago

@colinmollenhour We did as you suggested in https://github.com/magento/magento2/issues/34758#issuecomment-1312635955 for requests that we're 100% confident we do not care about the session for.

  1. We put some REQUEST_URI checking and handling in the top of pub/index.php to identify if this is a request we don't care about the session for.
  2. We patched in the change so that the handler is passed defined('SESSION_READ_ONLY')

It's not a perfect metric, but you can see where it was deployed and any session variation has dropped off

Screenshot 2022-12-08 at 10 04 43

And you can see average response time has dropped in this table, as well as the "max" time dropping way down because we're no longer fighting with session locking etc i think.

Before Screenshot 2022-12-08 at 10 08 04

After Screenshot 2022-12-08 at 10 08 38

Cheers for the simple recommendation!

Yonn-Trimoreau commented 1 year ago

Yes definitely a simpler and faster solution, if you don't care about locking at all

convenient commented 1 year ago

@Yonn-Trimoreau definitely a stop gap until we can get magento up to date to latest, this helps with some locking performance before the holiday period at least :)

Swahjak commented 1 year ago

@Yonn-Trimoreau did you make any progress on this after the setReadOnly method was introduced (by you) on colinmollenhour/php-redis-session-abstract?

Yonn-Trimoreau commented 1 year ago

@Swahjak sorry no, the patch is working on my customers' ends, and to be frank with you, I know making this MR will take some time, and has probably a 30% chance of being approved and integrated in the core.

Lot of efforts for a small chance of success, and since most of my customers are Enterprise Edition customers (that's why I massively encounter the /banner/ajax/load issue...), I would have hoped that a "Severity S0" issue would be taken more seriously by Adobe's team and that they would take up on the production of a definitive fix, based on my last comprehensive explanation + working patch.

Swahjak commented 1 year ago

@Yonn-Trimoreau I feel you. Thank you for taking the time to respond, it's appreciated.

convenient commented 1 year ago

Seems 2.4.5 and above have this flag which may help provided all the graphql endpoints are coded correctly.

https://github.com/magento/magento2/blob/3854187fd19fd3189781d4889d1f88f9ee99e5e5/app/code/Magento/GraphQl/Model/Config/DisableSession.php#L17

https://github.com/magento/magento2/blob/8af9d3305665033f4162bad7085c18c6b1e79c84/app/code/Magento/GraphQl/etc/graphql/di.xml#L42

indykoning commented 6 months ago

Monitoring we have suggests this same issue occurs within the admin wysiwyg editor in Magento 2.4.7.

When the editor is loaded it will attempt to load the thumbnails with https://example.com/admin/cms/wysiwyg_images/thumbnail/file/XXXXXXXX/key/XXXXXXXX/ Considering all these thumbnails are requested at the same time and they all implement read locking.

For now i will implement @colinmollenhour's suggestion https://github.com/magento/magento2/issues/34758#issuecomment-1312635955

denniskopitz commented 5 months ago

Hello,

We have implemented @colinmollenhour's suggestion https://github.com/magento/magento2/issues/34758#issuecomment-1312635955 and are testing in pre-production environments. We are primarily concerned about improving performance (ie eliminating unnecessary redis locking) around the checkout flow.

Wondering if other's could share the request URIs they are targeting. So far we have identified these:

'/(amasty_promo|estimate-shipping-methods|amazon_pay|group-prices|customer\/section\/load)/'

Logging the requests in this flow looks like this (SESSION READ indicates where we have modified the redis connection to readOnly):

GUEST CHECKOUT

[2024-06-11T17:48:31.373743+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /mens/watches.html [] []
[2024-06-11T17:48:38.352651+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /customer/section/load/?sections=messages&force_new_section_timestamp=false&_=1718128116420 [] []
[2024-06-11T17:49:19.959767+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /checkout/cart/add/uenc/aHR0cHM6Ly9sb2NhbC5zaGlub2xhLmNvbS9tZW5zL3dhdGNoZXMuaHRtbA%2C%2C/product/1297/ [] []
[2024-06-11T17:49:23.301546+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /customer/section/load/?sections=cart%2Cdirectory-data%2Cammessages%2Csignifyd-fingerprint%2Cmessages%2Clocation&force_new_section_timestamp=true&_=1718128116421 [] []
[2024-06-11T17:49:32.360905+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /mens/watches/therunwell41-leather-watch-s0126.html [] []
[2024-06-11T17:49:42.726151+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /checkout/cart/add/uenc/aHR0cHM6Ly9sb2NhbC5zaGlub2xhLmNvbS9tZW5zL3dhdGNoZXMvdGhlcnVud2VsbDQxLWxlYXRoZXItd2F0Y2gtczAxMjYuaHRtbA%2C%2C/product/1293/ [] []
[2024-06-11T17:49:45.351061+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /customer/section/load/?sections=cart%2Cdirectory-data%2Cammessages%2Csignifyd-fingerprint%2Cmessages%2Clocation&force_new_section_timestamp=true&_=1718128176496 [] []
[2024-06-11T17:49:58.398474+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /checkout/cart/ [] []
[2024-06-11T17:50:05.137050+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /amazon_pay/checkout/config?_=1718128203130 [] []
[2024-06-11T17:50:05.137051+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /reclaim/checkout/reload [] []
[2024-06-11T17:50:05.503370+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/estimate-shipping-methods [] []
[2024-06-11T17:50:07.096253+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /customer/section/load/?sections=cart%2Cmessages%2Clocation&force_new_section_timestamp=true&_=1718128203131 [] []
[2024-06-11T17:50:07.513691+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information [] []
[2024-06-11T17:50:07.517021+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information [] []
[2024-06-11T17:50:07.568005+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information [] []
[2024-06-11T17:50:11.820151+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /amasty_promo/popup/reload/?uenc=aHR0cHM6Ly9sb2NhbC5zaGlub2xhLmNvbS9jaGVja291dC9jYXJ0Lw%2C%2C&_=1718128203132 [] []
[2024-06-11T17:50:12.706038+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /checkout/ [] []
[2024-06-11T17:50:17.680662+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /customer/section/load/?sections=idme-verify&force_new_section_timestamp=false&_=1718128215558 [] []
[2024-06-11T17:50:17.922531+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/internal/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/retrieve-adyen-payment-methods [] []
[2024-06-11T17:50:17.979711+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/estimate-shipping-methods [] []
[2024-06-11T17:50:19.993062+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /customer/section/load/?sections=messages%2Clocation&force_new_section_timestamp=true&_=1718128215559 [] []
[2024-06-11T17:50:20.151170+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information [] []
[2024-06-11T17:50:20.151197+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information [] []
[2024-06-11T17:50:22.767240+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /amasty_promo/popup/reload/?uenc=aHR0cHM6Ly9sb2NhbC5zaGlub2xhLmNvbS9jaGVja291dC8%2C&_=1718128215560 [] []
[2024-06-11T17:50:24.932934+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /reclaim/checkout/email [] []
[2024-06-11T17:50:26.557505+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /customer/section/load/?sections=messages%2Clocation&force_new_section_timestamp=true&_=1718128215561 [] []
[2024-06-11T17:50:26.723169+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/customers/isEmailAvailable [] []
[2024-06-11T17:50:35.854990+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/estimate-shipping-methods [] []
[2024-06-11T17:50:35.854846+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information [] []
[2024-06-11T17:50:38.882791+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /amasty_promo/popup/reload/?uenc=aHR0cHM6Ly9sb2NhbC5zaGlub2xhLmNvbS9jaGVja291dC8%2C&_=1718128215562 [] []
[2024-06-11T17:50:39.081188+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information [] []
[2024-06-11T17:50:42.093900+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information [] []
[2024-06-11T17:50:42.093969+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information [] []
[2024-06-11T17:50:44.664860+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information [] []
[2024-06-11T17:50:44.664870+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/internal/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/retrieve-adyen-payment-methods [] []
[2024-06-11T17:50:44.664867+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information [] []
[2024-06-11T17:50:44.665751+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/shipping-information [] []
[2024-06-11T17:50:48.366334+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /amasty_promo/popup/reload/?uenc=aHR0cHM6Ly9sb2NhbC5zaGlub2xhLmNvbS9jaGVja291dC8%2C&_=1718128215563 [] []
[2024-06-11T17:50:50.308453+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /customer/section/load/?sections=messages%2Clocation&force_new_section_timestamp=true&_=1718128215564 [] []
[2024-06-11T17:50:53.869133+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /amazon_pay/checkout/config?_=1718128215566 [] []
[2024-06-11T17:50:53.869097+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /customer/section/load/?sections=messages%2Clocation&force_new_section_timestamp=true&_=1718128215565 [] []
[2024-06-11T17:50:54.177965+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/affirm/checkout/verify [] []
[2024-06-11T17:50:55.737571+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /customer/section/load/?sections=messages%2Clocation&force_new_section_timestamp=true&_=1718128215567 [] []
[2024-06-11T17:51:06.361389+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/set-payment-information [] []
[2024-06-11T17:51:06.374592+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information [] []
[2024-06-11T17:51:08.573159+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /captcha/refresh/ [] []
[2024-06-11T17:51:08.573166+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /customer/section/load/?sections=messages%2Clocation&force_new_section_timestamp=true&_=1718128215568 [] []
[2024-06-11T17:51:08.886128+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals?_=1718128215569 [] []
[2024-06-11T17:51:08.915024+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /amasty_promo/popup/reload/?uenc=aHR0cHM6Ly9sb2NhbC5zaGlub2xhLmNvbS9jaGVja291dC8%2C&_=1718128215570 [] []
[2024-06-11T17:51:13.269034+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/payment-information [] []
[2024-06-11T17:51:18.561105+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /captcha/refresh/ [] []
[2024-06-11T17:51:18.561144+00:00] report.INFO: group:NONE: SESSION READ  Request URI: /customer/section/load/?sections=cart%2Ccaptcha%2Cammessages%2Cmessages%2Clocation&force_new_section_timestamp=true&_=1718128215571 [] []
[2024-06-11T17:51:18.561269+00:00] report.INFO: group:NONE: SESSION WRITE Request URI: /checkout/onepage/success/ [] []
colinmollenhour commented 3 months ago

@denniskopitz Looks like a great step forward, thanks for sharing. Why does /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information need to lock the session? Surely REST API endpoints don't write messages to the session and just return them in the response body instead? Asking because I genuinely don't know..

indykoning commented 2 months ago

Wondering if other's could share the request URIs they are targeting. So far we have identified these:

'/(amasty_promo|estimate-shipping-methods|amazon_pay|group-prices|customer\/section\/load)/'

I've mainly been focussing on the backend since that caused the most problems for our clients for that i had: #^/[^/]+/(catalog/category/categoriesJson|cms/wysiwyg_images/thumbnail|admin/widget_instance|\.thumbs[a-zA-Z]+)/#

Surely REST API endpoints don't write messages to the session and just return them in the response body instead? Asking because I genuinely don't know..

I don't think it was Magento framework code but a plugin, but i can definitely remember finding some messages being set on the session in some rest endpoints (after wondering why the hell our headless frontend wasn't showing these messages)

denniskopitz commented 2 months ago

@denniskopitz Looks like a great step forward, thanks for sharing. Why does /rest/shinola_us/V1/guest-carts/rF2YIRPXMEVwGI0gdhKVRqFzSvILoWA3/totals-information need to lock the session? Surely REST API endpoints don't write messages to the session and just return them in the response body instead? Asking because I genuinely don't know..

Apologies for the delayed response @colinmollenhour. When testing, we found that setting the totals-information to the readOnly list caused the checkout success page to fail. Rather than spend the time to figure out root cause of that, we just left it out.

denniskopitz commented 2 months ago

@indykoning Did your effort improve performance on the admin wysiwyg editor? We have also experienced that issue.

vadim4err commented 2 months ago

We have the same issue with Cm\RedisSession\Handler::read 30sec API request to /V1/carts/mine/totals-information I've found that the customer's quote has product qty 871 with 251 items. But, unfortunately, I can't reproduce the issue.

Here is a Redis session config, I've also set debug log level to catch the issue, maybe this can shed light

        'timeout' => '2.5',
        'persistent_identifier' => '',
        'compression_threshold' => '2048',
        'compression_library' => 'gzip',
        'log_level' => '7',
        'max_concurrency' => '25',
        'break_after_frontend' => '5',
        'break_after_adminhtml' => '30',
        'first_lifetime' => '600',
        'bot_first_lifetime' => '60',
        'bot_lifetime' => '7200',
        'disable_locking' => '0',
        'min_lifetime' => '60',
        'max_lifetime' => '2592000'
colinmollenhour commented 2 months ago

I'm not familiar with Magento 2 checkout - does it send one request per quote item or something like that? Also if any other requests had a fatal error and the session lock was not released that could explain it. E.g. perhaps there is an out of memory error on the request just before the totals-information one.

AndresInSpace commented 2 months ago

I'm not familiar with Magento 2 checkout - does it send one request per quote item or something like that? Also if any other requests had a fatal error and the session lock was not released that could explain it. E.g. perhaps there is an out of memory error on the request just before the totals-information one.

The vanilla m2 one step checkout sends everything via ajax to update quote as you enter information, then shipping options, then payment accepted redirects to success page.

We utilize Bolt payment which has to communicate to Bolt servers then updates quote back in Magento via API which is why I'm reading this issue.

@colinmollenhour can you clarify, if there is a fatal error anywhere in Magento 2 process then it does not release the lock for session?

Yonn-Trimoreau commented 2 months ago

@AndresInSpace there is a "catch all" exception management in place. If any exception occurs, the session is closed cleanly. So the lock is released, don't worry about it.

@colinmollenhour some situations can lead to having at least 3 Ajax requests sent in parallel in the normal checkout (not one per item, but each request can be time-consuming, especially if the cart contains a lot of items)

And we've already discussed this, the 3 definitive solutions are:

@AndresInSpace you should look into using the patches I have provided above in this discussion. They are presently in place on 5 sites and they fix exactly the kind of issue you are describing. They are however not adapted to the latest version of colinmollenhour's library so I can send you an updated version if you want (which a colleague of mine has made compatible with the latest version).

@colinmollenhour just to clarify things up: I understand why it would be problematic to do this in your library, as this would be a major breaking change for all your non-Magento users, and I don't think you should change your mind because of Magento's current issue. But still, this would be the cleanest and most elegant solution.

AndresInSpace commented 2 months ago

Edit: Please ignore my misunderstanding here :)

What I was trying to say (and figure out) is that it seems this is not specific to just using redis for sessions caching(eg magento session), but also when utilized for default caches or full page cache, as this is an issue with redis and the process that opened the request (eg the 'redis session') right? I realize the module is specifically the php redis session, but I am wondering if the other redis module/configurations exacerbate this issue since despite having multiple instances for the default/fpc cache configuration those caches are always being hammered. Would that have a byproduct effect on the redis session cache(due to the single process nature of redis and reaching memory limits in environment and having 50 concurrent users sending requests.., etc) and possibly cause forced breaks?..

What I would like to figure out is if @mttjohnson or anyone in the thread (who all have done amazing debugging work, apologies for all my edits here and the confusion) can confirm they ran their tests on a storefront with redis configured for all (frontend default caches, fpc, and mage sessions) or was the storefront only having mage session configured to redis? Would be interested to see if redis for all three affects the queue timing differently, vs say redis session and varnish for fpc.

Yonn-Trimoreau commented 2 months ago

@AndresInSpace I think you are going out of road with this reasoning. We are talking about Redis session locks. Session locks are specific to sessions and are not used in fpc or other cache types. Furthermore, you are supposed to use a different Redis database for each cache type. No locking conflict can happen. And Redis is a database that is built to withstand a huge number of requests in parallel, if you were to have a response time in the order of seconds, your Redis instance would be already completely down I think? Maybe I'm quite wrong on this, but the performance of Redis is not to be discussed and it does not respond slowly because of being hammered down in a Magento context.

AndresInSpace commented 2 months ago

My apologies everyone I had a misunderstanding, did not mean to branch the subject. Thank you for the clarification @Yonn-Trimoreau. I will look into the suggested patches/fixes for our project with this issue.

@vadim4err IIRC the totals-information triggers a calculate method to collect totals and is expensive since that is where tax loops through all items to calculate accordingly based on row eg line item (I believe this was context @colinmollenhour was asking earlier) https://github.com/magento/magento2/blob/2.4-develop/app/code/Magento/Checkout/Model/TotalsInformationManagement.php

indykoning commented 2 months ago

Surely REST API endpoints don't write messages to the session and just return them in the response body instead? Asking because I genuinely don't know..

I don't think it was Magento framework code but a plugin, but i can definitely remember finding some messages being set on the session in some rest endpoints (after wondering why the hell our headless frontend wasn't showing these messages)

Coming back to this. I've ran into the issue and it it actually Magento code!

The case was, being spammed by the same notifications. Reason why? i had made /customer/section/load use Read Only sessions

And the following endpoint reads notifications from your session, Removes them from your session, and returns the messages. /customer/section/load/?sections=messages

If it cannot remove them from your session, it will simply repeat the cycle.

badger-adam commented 2 months ago

And the following endpoint reads notifications from your session, Removes them from your session, and returns the messages. /customer/section/load/?sections=messages

If it cannot remove them from your session, it will simply repeat the cycle.

This is correct. I can confirm the same issue.

webtekindo commented 2 weeks ago

@Yonn-Trimoreau thank you for the patch, we are going to try it in production.

We are also currently using: https://github.com/integer-net/magento2-session-unblocker/

I'm curious on your opinion, do you think there's any extra benefit on top of your patch to use that module?

Yonn-Trimoreau commented 2 weeks ago

@webtekindo this module seems interesting. But I'm pretty sure it will have no effect in the checkout (since it does not list the checkout session in the di.xml), and I'm wondering if it couldn't cause the same race condition issues as the disable_locking option, depending on the logic involved (since it closes the session once it is read, and not once it is written). My patch and this module won't be technically incompatible, but I think this module is not safe to use in some scenarios (which could be identified by adding some logs and testing thoroughly)

Yonn-Trimoreau commented 2 weeks ago

Here are the updated versions of the patches if you need them:

implement-write-lock-and-diff.txt set-php-serialize-handler.txt

Don't forget to flush the redis session storage after applying it in production.

For reference, the initial comment: https://github.com/magento/magento2/issues/34758#issuecomment-1312524791

webtekindo commented 2 weeks ago

@Yonn-Trimoreau ok perfect we will try without the module and thanks for the updated patches.

webtekindo commented 1 week ago

@Yonn-Trimoreau we are already running the patch in production and I have to say it works perfectly, nice job!

We use the old patch that you shared here https://github.com/magento/magento2/issues/34758#issuecomment-1312524791 and we had issue on the checkout because of the function _arrayRemoveRecursiv after replacing the function with the one in the new patch the problem was solved.

We cannot directly apply the latest version of your patch directly, because probably we still use an older Magento version (2.4.2), is there any other part of the new patch (except _arrayRemoveRecursiv) that is important to adjust?

And do you think it should be possible to make a Magernto module out of it instead of patching Cm/RedisSession, I give it a try but still have some issues (with session locking) here the Magento preference that we try:

Redis.txt

I will be curious if you think it may works that way (notice the read and write functions) ?

public function read($sessionId, bool $readOnly = true)
{
    try {
        $connection = $this->getConnection();
        $connection->setReadOnly($readOnly);
        $sessionData = $connection->read($sessionId);
        $this->_lastReadSessionDataById[$sessionId] = $sessionData;

        return $sessionData;

    } catch (ConcurrentConnectionsExceededException $e) {
        require $this->filesystem->getDirectoryRead(DirectoryList::PUB)->getAbsolutePath('errors/503.php');
    }
}

public function write($sessionId, $sessionData)
{
    $freshSession = $this->_getRefreshedSessionDataToWrite($sessionId);
    return $this->getConnection()->write($sessionId, $freshSession);
}

protected function _getRefreshedSessionDataToWrite(string $sessionId): string
{
    $currentSessionData = $this->read($sessionId, false);

    //$this->getConnection()->setReadOnly(false);
    //$currentSessionData = $this->getConnection()->read($sessionId);

    $lastReadSessionData = $this->_lastReadSessionDataById['sess_' . $sessionId];

    $freshSession = unserialize($currentSessionData) ?: [];
    $lastReadSession = unserialize($lastReadSessionData) ?: [];

    $diffToUnset = $this->_arrayDiffRecursive($lastReadSession, $_SESSION);
    if (!empty($diffToUnset)) {
        $freshSession = $this->_arrayRemoveRecursive($freshSession, $diffToUnset);
    }

    $diffToSet = $this->_arrayDiffRecursive($_SESSION, $lastReadSession);
    if (!empty($diffToSet)) {
        $freshSession = array_replace_recursive($freshSession, $diffToSet);
    }

    return serialize($freshSession);
}

Merci.

colinmollenhour commented 1 week ago

I started on a proof of concept that is similar to @Yonn-Trimoreau 's approach. It uses MongoDb with zero locking but uses $set and $unset operations to only update the changed keys on write() so that different processes for the same session making changes to different keys (or no changes) will not block each other but rather perform a sort of sloppy merge. There is still some potential for race conditions, but unless there are some really hot keys, it should be greatly reduced and overall consistency should be better in theory. Mainly, two processes can read a key that one wants to delete, and both will see it, but it will still be deleted for the third process even though they both read the key and only one deleted it. The bad is that "exactly once" delivery is not guaranteed. It's not in a working state currently and would require overriding some core functionality in Magento, but I wanted to explore the idea and see how it could work. I'm not expecting to finish this any time soon or possibly ever, so if someone wants to take it over, please feel free. https://github.com/colinmollenhour/php-mongodb-lockless-session

It may also be possible to do something similar using Lua with Redis, although that would just put more strain on Redis which is single-threaded so may not scale well. MongoDb should be perfect for this, although you would need to call the gc() method for garbage cleanup.