nexcess / magento-turpentine

A Varnish extension for Magento.
GNU General Public License v2.0

New users have crawlers SID #552

Closed skolodyazhnyy closed 9 years ago

skolodyazhnyy commented 10 years ago

Hey,

I've got a problem with user session IDs. When a new user comes to the website, their SID gets the "crawler-session" value. As a result, some users end up with a shared shopping cart. After a quick look into: https://github.com/nexcess/magento-turpentine/blob/master/app/code/community/Nexcessnet/Turpentine/misc/version-3.vcl

I found this:

        # no frontend cookie was sent to us
        if (req.http.Cookie !~ "frontend=") {
            if (client.ip ~ crawler_acl ||
                    req.http.User-Agent ~ "^(?:{{crawler_user_agent_regex}})$") {
                # it's a crawler, give it a fake cookie
                set req.http.Cookie = "frontend=crawler-session";
            } else {
                # it's a real user, make up a new session for them
                call generate_session;
            }
        }

It's a new user and the browser cookie `frontend` is not set; somehow client.ip is 127.0.0.1, which is in crawler_acl, so the user gets the crawler's SID.

I can't really understand why client.ip is 127.0.0.1 (does Varnish make requests to itself?), and why there is || between the conditions. To me, && makes much more sense. Also, why does this logic live here? Wouldn't it be easier to put cookie management in the crawler itself?
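To illustrate, the && variant I have in mind would look roughly like this (just a sketch against version-3.vcl, not a tested patch; it keeps the same template placeholders):

```vcl
        # no frontend cookie was sent to us
        if (req.http.Cookie !~ "frontend=") {
            # require BOTH a whitelisted IP and a crawler User-Agent
            if (client.ip ~ crawler_acl &&
                    req.http.User-Agent ~ "^(?:{{crawler_user_agent_regex}})$") {
                # it's a crawler, give it a fake cookie
                set req.http.Cookie = "frontend=crawler-session";
            } else {
                # it's a real user, make up a new session for them
                call generate_session;
            }
        }
```

With &&, a request coming from a whitelisted IP but with a normal browser User-Agent would get a real session instead of the shared crawler one.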

Thanks in advance

EvanRijn commented 10 years ago

We have had the same issue; we resolved it by changing version-3.vcl from:

            # it's a crawler, give it a fake cookie
            set req.http.Cookie = "frontend=crawler-session";

To:

            # it's a crawler, give it a fake cookie
            # set req.http.Cookie = "frontend=crawler-session";
            # give all crawlers a real session instead
            call generate_session;

melvyn-sopacua commented 10 years ago

@skolodyazhnyy Did you verify that 127.0.0.1 was the actual cause of those clients falling through? I agree that any variation of localhost or "this machine's IP" shouldn't be in crawler_acl or at least configurable (in case you find it necessary to run your cache warmer on the same host as varnish).

But I highly doubt that this is why those clients get the crawler-session. The OR logic is easy to understand:

1) some crawlers have known IP ranges
2) some crawlers do not have known IP ranges

The 2nd group can be identified by looking at their user agent string. They SHOULD (in the RFC sense) include some indication that they are automated software rather than user-controlled browsers. I suspect {{crawler_user_agent_regex}} is too relaxed and generates false positives. The default seems clean enough, though. With those default settings (/app/code/community/Nexcessnet/Turpentine/etc/config.xml:crawler_user_agents) I don't see how a real user would fall through these conditions, so there must be custom settings at store-config level at play.
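The anchoring matters here: the VCL test wraps the configured value in `^(?:…)$`, so a pattern only classifies a request as a crawler if it matches the whole User-Agent string. A quick way to sanity-check which strings your configured regex actually catches, sketched in Python (its `re` syntax is close enough to Varnish's PCRE for this purpose; the regex below is a made-up stand-in, not the shipped default):

```python
import re

# Stand-in for {{crawler_user_agent_regex}}; substitute the value
# configured under System > Configuration > Turpentine.
crawler_regex = r"ApacheBench.*|.*Googlebot.*|Wget.*"

def is_crawler(user_agent: str) -> bool:
    # Mirrors the VCL test: req.http.User-Agent ~ "^(?:{{...}})$"
    return re.match(r"^(?:%s)$" % crawler_regex, user_agent) is not None

print(is_crawler("Mozilla/5.0 (compatible; Googlebot/2.1)"))   # crawler UA
print(is_crawler("Mozilla/5.0 (Windows NT 6.1) Firefox/31.0")) # ordinary browser
```

Running suspect User-Agent strings from your access logs through a check like this would show quickly whether the regex is producing false positives.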

skolodyazhnyy commented 10 years ago

Thanks for your answer!

Did you verify that 127.0.0.1 was the actual cause of those clients falling through?

I didn't investigate it, I thought it could be Varnish ESI requests.

The OR logic is easy to understand: 1) some crawlers have known IP ranges 2) some crawlers do not have known IP ranges

It makes sense. Although, in my opinion, the first group should have user-agent verification as well. What do you think about simplifying this condition to just user agents?

if (crawler_user_agent_regex && req.http.User-Agent ~ "^(?:{{crawler_user_agent_regex}}|turpentine-crawler)$") {
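Folded into the existing block, the simplified check might look roughly like this (a sketch, not a tested patch; it assumes the bundled cache warmer can be made to send a distinctive agent string such as `turpentine-crawler`):

```vcl
        # no frontend cookie was sent to us
        if (req.http.Cookie !~ "frontend=") {
            # classify by User-Agent only; the IP-based crawler_acl
            # check is dropped entirely
            if (req.http.User-Agent ~ "^(?:{{crawler_user_agent_regex}}|turpentine-crawler)$") {
                # it's a crawler, give it a fake cookie
                set req.http.Cookie = "frontend=crawler-session";
            } else {
                # it's a real user, make up a new session for them
                call generate_session;
            }
        }
```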