nexcess / magento-turpentine

A Varnish extension for Magento.
GNU General Public License v2.0

___SID=U in URL #909

Open ADDISON74 opened 9 years ago

ADDISON74 commented 9 years ago

I am working with the latest dev version of Turpentine. All URLs now end with "___SID=U". I did not see this string in the previous version; has something changed? Can I avoid this "ugly" URL?

Thank you.

csdougliss commented 9 years ago

I also noticed this - it breaks some of our footer links :)

BarryCarlyon commented 9 years ago

I've not seen this.

But I did also regenerate my VCL (leading to #904) and dump all of my Varnish cache when I upgraded.

csdougliss commented 9 years ago

Seems to be fixed locally by generating a VCL too

miguelbalparda commented 9 years ago

@ADDISON74 can you confirm?

BarryCarlyon commented 9 years ago

I thought a regen might fix it as there were enough changes in the VCL base configuration that on upgrade it's usually worth a regen (as always check the changelog)

I think on upgrade one should always regen the VCL, make sure all parts of the system are up to date (akin to clearing Magento cache(s) and reindexing where appropriate)

ADDISON74 commented 9 years ago

I confirm what craigcarnell said. The links in footer look awful.

http://IP_ADDRESS/?___SID=Uabout-magento-demo-store/ (About Us)
http://IP_ADDRESS/?___SID=Ucontacts/ (Contact Us)
http://IP_ADDRESS/?___SID=Ucustomer-service/ (Customer Service)
http://IP_ADDRESS/?___SID=Uprivacy-policy-cookie-restriction-mode/ (Privacy Policy)

Clicking any of these links loads the same page.

miguelbalparda commented 9 years ago

Have you flushed the block cache and the Turp caches?

ADDISON74 commented 9 years ago

Of course. In Cache Management I pressed every button. I saved the Varnish file to disk and applied the configuration. I also cleared all caches manually on disk and restarted the Apache and Varnish services. Is there anything else I should do?

It was a mistake to overwrite my previous dev version (which worked just fine) with the latest one. But it is in a VMware box, so no worries. We should find out what is generating this issue. I can reinstall the whole thing to see if the issue is still present.

ADDISON74 commented 9 years ago

Three hours after I set up a new test environment, this issue appeared again. In my case I will go back to the master branch and see what happens. For the moment the dev branch is causing trouble. Sorry to report that. All links in the footer are broken as reported previously. The same goes for category links.

miguelbalparda commented 9 years ago

There is no confirmation of that @ADDISON74 but only your case. I wouldn’t say devel branch has issues, but also keep in mind this is an ongoing development project where you can submit your own fixes.

ADDISON74 commented 9 years ago

I appreciate your work, but something is going wrong. Please take into consideration that this is a new issue. Why did I not encounter it before on the same dev branch? My test environment was set up in May 2015, and every time you updated the branch I downloaded the files. Two days ago I updated the files and boom, all links are busted.

Please look into this issue and leave it open. It is a new issue. See above how two other gentlemen ran into it; luckily for them, recreating the VCL file solved it. If you want to check for yourself, my website is online and I can give you the connection details in private.

miguelbalparda commented 9 years ago

@BarryCarlyon @craigcarnell can I have your VCLs please?

csdougliss commented 9 years ago

@ADDISON74 @miguelbalparda

http://www.dirt-devil.de/

If you hover over links (e.g. in the footer) you can see SID everywhere - http://www.dirt-devil.de/terms-and-conditions/?___SID=U

This is using Varnish 3.0. There is a store switcher with 2 languages. The SID link is not there straight away but appears shortly after.

csdougliss commented 9 years ago

@ADDISON74 @miguelbalparda Can confirm this bug is around

ADDISON74 commented 9 years ago

As you can see, some categories and the footer links are affected. The footer links behave very strangely: in the Company column every link has the ___SID=U string inserted inside the URL. If the string is in the middle, you get the home page. If you move it from the middle to the end, the link works.

For example:

http://IP_ADDRESS/about-magento-demo-store/?___SID=U (this works)

http://IP_ADDRESS//?___SID=Uabout-magento-demo-store/ (this doesn't work - the session ID string is in the wrong place)
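The difference between the two URLs above can be checked with PHP's own URL parser. This is only a sketch: the address 192.0.2.10 is a documentation placeholder standing in for IP_ADDRESS, and the variable names are illustrative. It shows that once "?" appears before the page name, the page name is no longer part of the path at all.

```php
<?php
// Hypothetical URL modelled on the broken footer link above;
// 192.0.2.10 is a placeholder address, not taken from the report.
$broken = 'http://192.0.2.10//?___SID=Uabout-magento-demo-store/';

$parts = parse_url($broken);
parse_str($parts['query'], $query);

// The path holds only slashes, so the request targets the site root.
echo 'path:   ', $parts['path'], PHP_EOL;
// The intended page name has been swallowed into the ___SID value.
echo '___SID: ', $query['___SID'], PHP_EOL;
```

Because the path contains no page name, the server has nothing to route on except the root, which would explain why every such link loads the home page.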

csdougliss commented 9 years ago

@miguelbalparda Any progress with this issue? My concern is that, as a side effect, any URLs that have SID in them will return (pass);

kingdiablo commented 9 years ago

I also have this issue - sometimes it's present, other times the issue goes away by itself. I'm using varnish-3.0.7 revision f544cd8 and the latest standard Turpentine.

ceckoslab commented 9 years ago

I also got "___SID=U" in links while testing with different cookie domains, once with a leading "." and once without.

With one of the cookie domains the links were fine, but with the other they were not. I am still digging, but I noticed this issue only when I used the Turpentine fake session and the cookie domain generated by Turpentine differed from the cookie domain declared in Magento. The difference was that in Magento the cookie domain did not contain the leading ".".

ceckoslab commented 8 years ago

I believe this issue could be caused by a mismatch in cookie domain and cookie path between the fake Turpentine session cookie and the Magento configuration.

I looked at 2 core methods in Magento 1.9.2.1:

class Mage_Core_Model_Url::_prepareSessionUrlWithParams()

    /**
     * Check and add session id to URL, session is obtained with parameters
     *
     * @param string $url
     * @param array $params
     *
     * @return Mage_Core_Model_Url
     */
    protected function _prepareSessionUrlWithParams($url, array $params)
    {
        if (!$this->getUseSession()) {
            return $this;
        }

        /** @var $session Mage_Core_Model_Session */
        $session = Mage::getSingleton('core/session', $params);

        $sessionId = $session->getSessionIdForHost($url);
        if (Mage::app()->getUseSessionVar() && !$sessionId) {
            $this->setQueryParam('___SID', $this->getSecure() ? 'S' : 'U'); // Secure/Unsecure
        } else if ($sessionId) {
            $this->setQueryParam($session->getSessionIdQueryParam(), $sessionId);
        }
        return $this;
    }

class Mage_Core_Model_Session_Abstract::getSessionIdForHost()

    /**
     * If session cookie is not applicable due to host or path mismatch - add session id to query
     *
     * @param string $urlHost can be host or url
     * @return string {session_id_key}={session_id_encrypted}
     */
    public function getSessionIdForHost($urlHost)
    {
        if ($this->getSkipSessionIdFlag() === true) {
            return '';
        }

        $httpHost = Mage::app()->getFrontController()->getRequest()->getHttpHost();
        if (!$httpHost) {
            return '';
        }

        $urlHostArr = explode('/', $urlHost, 4);
        if (!empty($urlHostArr[2])) {
            $urlHost = $urlHostArr[2];
        }
        $urlPath = empty($urlHostArr[3]) ? '' : $urlHostArr[3];

        if (!isset(self::$_urlHostCache[$urlHost])) {
            $urlHostArr = explode(':', $urlHost);
            $urlHost = $urlHostArr[0];
            $sessionId = $httpHost !== $urlHost && !$this->isValidForHost($urlHost)
                ? $this->getEncryptedSessionId() : '';
            self::$_urlHostCache[$urlHost] = $sessionId;
        }

        return Mage::app()->getStore()->isAdmin() || $this->isValidForPath($urlPath) ? self::$_urlHostCache[$urlHost]
            : $this->getEncryptedSessionId();
    }

It seems that getSessionIdForHost() returns '' and this leads to the branch where we get $this->setQueryParam('___SID', $this->getSecure() ? 'S' : 'U'); // Secure/Unsecure
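To make the branch easier to follow, here is a standalone model of the decision quoted above. It is a sketch, not Magento code: sessionQueryParams() and its plain arguments are hypothetical stand-ins for the core calls named in the comments, the early return on getUseSession() is left out, and the default parameter name 'SID' is an assumption.

```php
<?php
/**
 * Standalone model of the branch in _prepareSessionUrlWithParams().
 * Hypothetical helper for illustration; not part of Magento.
 *
 * @param bool   $useSessionVar stands in for Mage::app()->getUseSessionVar()
 * @param string $sessionId     stands in for getSessionIdForHost($url)
 * @param bool   $secure        stands in for $this->getSecure()
 * @param string $sidParamName  stands in for getSessionIdQueryParam() (assumed 'SID')
 * @return array query parameters that would be added to the URL
 */
function sessionQueryParams($useSessionVar, $sessionId, $secure, $sidParamName = 'SID')
{
    if ($useSessionVar && !$sessionId) {
        // No session id for this host: only the S/U marker is added.
        return array('___SID' => $secure ? 'S' : 'U');
    }
    if ($sessionId) {
        // A session id is available: it is added under the real parameter name.
        return array($sidParamName => $sessionId);
    }
    return array();
}

echo http_build_query(sessionQueryParams(true, '', false)), PHP_EOL;       // ___SID=U
echo http_build_query(sessionQueryParams(true, '', true)), PHP_EOL;        // ___SID=S
echo http_build_query(sessionQueryParams(true, 'abc123', false)), PHP_EOL; // SID=abc123
```

In this model the "___SID=U" case needs two things at once: the session variable flag is on and no session id comes back for the host.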

ADDISON74 commented 8 years ago

I appreciate your support in solving this issue. I still run into it all day and just restart Varnish whenever it happens. Only links in the footer and categories get ___SID in their URLs. Did you edit the cookie to see if it works?

ceckoslab commented 8 years ago

@ADDISON74 I noticed it while changing cookie domains but this was part of some other tests and I reverted my changes. I will try to reproduce it again and see what exactly Magento does inside the core.

Meanwhile could you tell us what is the cookie domain in the cookie in your browser and what is the cookie domain configured in Magento?

Also can you confirm that you get same result after you clean Varnish cache and force Varnish to "pipe" (aka. hit the backend) in order to get frontend cookie from Magento?

ADDISON74 commented 8 years ago

I am using a VM test installation. Magento is set up on an IP address, not an FQDN. The cookie shows the IP. I hope the Turpentine geeks will give us some clues in response to your posts today.

ADDISON74 commented 8 years ago

I made all the changes needed to have an FQDN server, then checked the cookie domain. IT HAS A LEADING DOT (e.g. .www.mytestdomain.tld)

Both the frontend and backend sections have the same cookie domain with a leading ".".

When I use only my IP_ADDRESS, there is no dot in front of the IP.

aricwatson commented 8 years ago

> I believe that this issue could be caused because of mismatch in cookie domain and cookie path between (fake Turpentine session cookie) and (Magento configuration).

Thanks @ceckoslab for helping track this down!

If it is caused by a mismatch, I wonder why it doesn't happen right away? Also, Turpentine now passes new sessions through Varnish to get a good session from Magento, so I'm not sure how a faked session could be causing this. I do have a theory: a crawler hits the site and gets the crawler-session cookie, which causes the error, and the result is then cached for public blocks such as the footer. I'll see if I can test this theory a bit tomorrow when I have more time.

ceckoslab commented 8 years ago

@aricwatson I reproduced the issue in my local setup:

Links in main navigation contained "___SID=U".

Basically the main navigation in Magento is cached and if the shop is visited first by crawler then we would have "___SID=U".

It seems that when the frontend cookie is "crawler-session" we get "___SID=U" in cached URLs. Crawlers probably also see all generated URLs containing "___SID=U".
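The caching effect described here can be pictured with a toy model. Nothing below is Magento or Turpentine code: $cache, renderNavigation() and the example URL are hypothetical, and the rule "crawler session means ___SID=U is appended" is an assumption taken from this comment, not a verified fact about the core.

```php
<?php
// Toy model of a shared block cache; names and URL are hypothetical.
$cache = array();

function renderNavigation(array &$cache, $isCrawlerSession)
{
    // The cache key ignores who is asking, as a public block's key would.
    if (!isset($cache['navigation'])) {
        $url = 'http://www.example.com/women.html';
        if ($isCrawlerSession) {
            // Assumption from the comment above: with the crawler-session
            // cookie, the generated URL carries the ___SID=U marker.
            $url .= '?___SID=U';
        }
        $cache['navigation'] = '<a href="' . $url . '">Women</a>';
    }
    return $cache['navigation'];
}

$first  = renderNavigation($cache, true);  // a crawler warms the cache
$second = renderNavigation($cache, false); // a regular visitor arrives later

// Both calls return the same cached markup, marker included.
echo $second, PHP_EOL;
```

If the real blocks behave like this model, the order of the first request after a cache flush would decide whether every later visitor sees the marker, which would fit the intermittent reports earlier in the thread.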

aricwatson commented 8 years ago

I've been unable to replicate this issue, even when emulating Googlebot, etc. I've tested with the current devel branch in a couple of different environments, and with the master branch for good measure. I've also played around with various permutations of cookie path/domain settings in Magento.

I'm wondering if some of you who are reliably encountering this can add some debugging around the code mentioned by @ceckoslab to see if that adds any illumination?

ADDISON74 commented 8 years ago

@aricwatson please check whether the modules I am using in Apache and PHP are the same as yours. This is a Debian Wheezy 7.9 x64 distribution.

Apache 2.2.22 Loaded Modules: core_module (static) log_config_module (static) logio_module (static) version_module (static) mpm_prefork_module (static) http_module (static) so_module (static) actions_module (shared) alias_module (shared) auth_basic_module (shared) auth_digest_module (shared) authn_file_module (shared) authz_default_module (shared) authz_groupfile_module (shared) authz_host_module (shared) authz_svn_module (shared) authz_user_module (shared) autoindex_module (shared) cgi_module (shared) dav_module (shared) dav_fs_module (shared) dav_svn_module (shared) dir_module (shared) env_module (shared) fcgid_module (shared) mime_module (shared) negotiation_module (shared) reqtimeout_module (shared) rewrite_module (shared) setenvif_module (shared) status_module (shared) suexec_module (shared)

PHP 5.5.30-1~dotdeb+7.1 [PHP Modules] bcmath bz2 calendar Core ctype curl date dba dom ereg exif fileinfo filter ftp gd gettext hash iconv intl json libxml mbstring mcrypt mhash mysql mysqli openssl pcntl pcre PDO pdo_mysql Phar posix readline Reflection session shmop SimpleXML soap sockets SPL standard sysvmsg sysvsem sysvshm tokenizer wddx xml xmlreader xmlwriter xsl Zend OPcache zip zlib

[Zend Modules] Zend OPcache

My second thought relates to a backend option in System -> Configuration -> Web -> Session Validation Settings: Use SID on Frontend is set to Yes by default. The session ID should be enabled only if there are different stores. If you have only one store, this option should be set to No. I don't know whether the Magento engineers evaluated leaving this option set to Yes with a single store enabled. If they didn't, it could be a problem; by default the Magento sample database comes with 4 Store Views. This is my personal opinion.

And why does this issue affect only Navigation Links and Footer Links?

aricwatson commented 8 years ago

For what it's worth, here's my primary testing environment.

Ubuntu 14.04 VM

Apache 2.4.7

Loaded modules: core_module (static) so_module (static) watchdog_module (static) http_module (static) log_config_module (static) logio_module (static) version_module (static) unixd_module (static) access_compat_module (shared) alias_module (shared) auth_basic_module (shared) authn_core_module (shared) authn_file_module (shared) authz_core_module (shared) authz_host_module (shared) authz_user_module (shared) autoindex_module (shared) deflate_module (shared) dir_module (shared) env_module (shared) filter_module (shared) mime_module (shared) mpm_prefork_module (shared) negotiation_module (shared) php5_module (shared) rewrite_module (shared) setenvif_module (shared) status_module (shared)

PHP 5.5.9

Core date ereg libxml openssl pcre zlib bcmath bz2 calendar ctype dba dom hash fileinfo filter ftp gettext SPL iconv mbstring session posix Reflection standard shmop SimpleXML soap sockets Phar exif sysvmsg sysvsem sysvshm tokenizer wddx xml xmlreader xmlwriter zip apache2handler mysqlnd PDO curl gd intl json mcrypt ming mysql mysqli pdo_mysql pdo_sqlite pspell readline recode snmp sqlite3 tidy xmlrpc xsl mhash Zend OPcache xdebug

ADDISON74 commented 8 years ago

Testing turned up no issues related to the Apache or PHP configuration. The immediate workarounds I found for this issue are:

1) Change the [Use SID on Frontend] option in the backend to No. I did not evaluate the effects of this option on multi-store setups.

2) If #1 does not work, edit the file /app/code/core/Mage/Core/Model/App.php and change protected $_useSessionInUrl from true to false.

3) Use a RewriteRule directive to rewrite all URLs containing ___SID=U. Only navigation links (categories) are affected. A category like this one:

http://YourDomain.com/women/tops-blouses.html?___SID=U will become http://YourDomain.com/women/tops-blouses.html

I am not good enough with rewrites to set up a rule for this task.

4) Enable "Block HTML output" in Cache Management for a while, then disable it. This is a temporary way of getting rid of ___SID=U. This proves there are some issues with the blocks, probably cookies. Magento creates sessions especially for those blocks (Navigation Category and Footer).
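The URL transformation asked for in workaround 3 can be written out in PHP to pin down the expected result. This is a minimal sketch under stated assumptions: stripSidParam() is a hypothetical helper, not part of Magento or Turpentine, it ignores user:password parts of a URL, and http_build_query() may re-encode the remaining parameters. It only describes the input/output mapping and does nothing about why the parameter appears.

```php
<?php
/**
 * Remove the ___SID parameter from a URL and keep everything else.
 * Hypothetical helper for illustration only.
 */
function stripSidParam($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['query'])) {
        return $url; // nothing to strip, or the URL could not be parsed
    }

    parse_str($parts['query'], $query);
    unset($query['___SID']);

    // Rebuild the URL from the pieces that were present.
    $result = '';
    if (isset($parts['scheme']))   { $result .= $parts['scheme'] . '://'; }
    if (isset($parts['host']))     { $result .= $parts['host']; }
    if (isset($parts['port']))     { $result .= ':' . $parts['port']; }
    if (isset($parts['path']))     { $result .= $parts['path']; }
    if ($query)                    { $result .= '?' . http_build_query($query); }
    if (isset($parts['fragment'])) { $result .= '#' . $parts['fragment']; }

    return $result;
}

echo stripSidParam('http://YourDomain.com/women/tops-blouses.html?___SID=U'), PHP_EOL;
echo stripSidParam('http://YourDomain.com/women/tops-blouses.html?p=2&___SID=U'), PHP_EOL;
```

The second call shows that other parameters such as a page number survive, which any rewrite rule for this task would also need to guarantee.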

ADDISON74 commented 8 years ago

In my opinion this is a session ID issue. Magento handles the session ID using either cookies or URL parameters. When cookies cause trouble, it falls back to the URL. In my tests I always got these two cookies (EditThisCookie plugin in Chrome).

Cookie domain and Cookie Path are set up correctly. So there is no need to set up other options in the backend or RewriteRules in .htaccess.

But sometimes two more cookies appear:

I am not advanced enough to understand why the cookies above show up in some scenarios, but I guess the first one is related to the Magento cache and the other one is the session stored inside a cookie.

Some people handle external_no_cache for cookies in their Varnish configuration files. I did not find it in Turpentine. Here are a few examples:

http://serverfault.com/questions/367543/avoiding-varnish-hitting-magento-cookies-vcl https://www.varnish-cache.org/trac/wiki/VCLExampleRemovingSomeCookies https://github.com/PHOENIX-MEDIA/Magento-PageCache-powered-by-Varnish/blob/master/app/code/community/Phoenix/VarnishCache/Helper/Cache.php

Could you please investigate how cookies are protected in Turpentine? The Navigation block and the Footer block could be affected by a wrong cookie configuration. I saw that you set up Fake Sessions, but are they also used for external_no_cache, or is that handled differently? My personal opinion is that in some cases Turpentine has a different cookie than Magento, and this creates trouble.

ADDISON74 commented 8 years ago

When PHPSESSID appears as a cookie, nothing bad happens, even with the Block HTML output cache enabled. My only conclusion is that this is a cookie/session problem arising from how Turpentine deals with these two.

ADDISON74 commented 8 years ago

I will continue testing by clearing the var/session folder before starting Varnish. Doing this seems to help. I guess residues of old sessions are left behind. My web server is a VM and it is not always powered on long enough for sessions to be cleaned up. I stand by my conclusion that this issue is related to sessions in cookies and URLs. ___SID=U means the connection is Unsecure (S = Secure). It comes from an IF statement; see ceckoslab's code above.

PS - that dot before your Cookie domain is normal for Chrome. Check the same Cookie domain in Firefox. It is without dot. This is not the problem causing this issue.
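For reference, RFC 6265 tells browsers to ignore a leading dot in a cookie's Domain attribute before matching, which is consistent with the dot being harmless. The sketch below is a simplified model of that matching rule: cookieDomainMatches() is a hypothetical helper, it skips the RFC's special handling of IP addresses, and it says nothing about how Magento or Turpentine compare domains internally.

```php
<?php
/**
 * Simplified cookie domain matching in the spirit of RFC 6265.
 * Hypothetical helper for illustration only.
 */
function cookieDomainMatches($requestHost, $cookieDomain)
{
    $host   = strtolower($requestHost);
    $domain = strtolower($cookieDomain);

    // A single leading dot in the Domain attribute is ignored.
    if (strpos($domain, '.') === 0) {
        $domain = substr($domain, 1);
    }

    if ($host === $domain) {
        return true;
    }

    // A host also matches when it is a sub-domain of the cookie domain.
    $suffix = '.' . $domain;
    return substr($host, -strlen($suffix)) === $suffix;
}

var_dump(cookieDomainMatches('www.mytestdomain.tld', '.www.mytestdomain.tld'));
var_dump(cookieDomainMatches('www.mytestdomain.tld', 'www.mytestdomain.tld'));
```

Both calls give the same answer in this model, so the dotted and undotted forms are equivalent as far as the browser's matching is concerned.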

ceckoslab commented 8 years ago

@ADDISON74 What I notice is that Varnish sends the "crawler-session" cookie to the backend without a cookie domain for all recognised crawlers. I assume this is what is causing the problem, because so far I don't see any other logic in the Magento core that could lead to this issue.

PS - Sorry that I don't step in but I am busy with my daily life tasks :)

ADDISON74 commented 8 years ago

@ceckoslab: I hope someone will investigate and find a solution. I am curious why this issue does not occur when Enable Debug Info is set to Enable in Turpentine.

ADDISON74 commented 8 years ago

This issue appears from time to time. When it does, all URLs containing ___SID=U load as normal pages, skipping the cache. The waiting time goes from 3 - 5 ms up to the normal uncached 300 ms. This is very bad, because while the issue is active no visitor gets the benefit of the Varnish cache.

The investigation should start with this question: why do URLs get ___SID=U (U = unsecure) as a parameter when the Varnish cache is in use?

ADDISON74 commented 8 years ago

This issue is very funny, but it is related to cookies. Please see this image. When all of those cookies are present, there is no ___SID in the URLs. But notice that two of them have www and the other two have .www (a dot before the FQDN).

external_no_cache => .www.mydomain.com
frontend => .www.mydomain.com

__atuvc => www.mydomain.com
__atuvs => www.mydomain.com

[Image: untitled-2]

ceckoslab commented 8 years ago

@ADDISON74 The "." in the frontend cookie comes from the Magento frontend JS. What you can always rely on is exactly what you get in the HTTP response.

The "." is added by this piece of JS, which you usually find inside the HTML produced by Magento.

Example:

<script type="text/javascript">
//<![CDATA[
Mage.Cookies.path     = '/';
Mage.Cookies.domain   = '.www.example.com';
//]]>
</script>

ADDISON74 commented 8 years ago

An important observation: I set the Crawler IP to my server IP. If I get the "crawler-session" value for the frontend cookie, the ___SID=U issue does not appear at all. Very interesting. Please note that for the Home page, Categories and other links I do not get any cookie even if I visit them many times (HTTP protocol). Once I visit a product I get 3 cookies, one of which is frontend. Sometimes I do not get the frontend cookie at all.

Now comes the nice part. When I switch to HTTPS by opening the My Account link, I immediately get 2 cookies, frontend and frontend_cid, both with an alphanumeric string value. If I switch back to HTTP by opening the Home Page link or a Category link, the frontend_cid cookie is deleted, but frontend keeps its alphanumeric value. This is not good at all. We started with a crawler-session value in the frontend cookie and now it changes to a normal session value, just by accessing a secured link? Really, Magento is doing the right thing by inserting that ___SID=U into the link.

ADDISON74 commented 8 years ago

Second important observation.

I am not using full secured frontend, just only for My Account, Checkout areas:

Backend: System -> Configuration -> Web -> Unsecure -> Base URL = http://www.mydomain.com/
Backend: System -> Configuration -> Web -> Secure -> Base URL = https://www.mydomain.com/
Backend: System -> Configuration -> Web -> Use Secure URLs in Frontend = Yes (see difference between unsecure/secure values).

I am using Pound for SSL offloading. Pound has a RewriteLocation option which, in my opinion, could cause trouble. For example, you can set RewriteLocation to 0, 1 or 2 in different scenarios for the HTTP/HTTPS listeners. If you don't specify this directive, it defaults to 1. With both protocols set to 0, I get this result:

1 - Visit your frontend Home Page without any cookies set, then visit 2 Category Pages (http)
2 - Visit a secure section like My Account (https)
3 - Switch back to a previously visited category (https => http). Look at the URL: it has ___SID=U. To get rid of it, click the category link again.

Magento is acting as expected, and this is not an issue coming from the Magento core. When you switch from secured to unsecured, the frontend_cid cookie is deleted and Magento needs to pass the session in the URL instead. This does not happen when SSL is handled on the web server without Varnish in front of it. (In the above scenario Pound was in front of Varnish, listening on both ports 80 and 443. Putting Varnish on port 80 and keeping Pound on port 443 does not improve anything.)

If your page has a mix of http/https URLs and you switch from one protocol to the other, this issue becomes active. If RewriteLocation is set correctly in Pound, you can get rid of the ___SID=U in the URL after visiting another section.

ADDISON74 commented 8 years ago

I have some ideas related to Pound in front of Apache. Two Pound directives will give you trouble if they are not set up correctly: RewriteLocation and RewriteDestination. Their values must be the same in ListenHTTP and ListenHTTPS and changed from the defaults, which are 1 for RewriteLocation and 0 for RewriteDestination.

After 2 days of testing the website with (RewriteLocation 2) and (RewriteDestination 1), this issue seems to be gone for good - ONLY FOR THE FRONTEND. Even though many people recommend (RewriteLocation 0) for Magento, it is not a good idea. This issue is not present if you set the whole frontend to secure. But if you keep your frontend unsecured and secure only some pages such as My Account and Checkout, you have to configure Pound correctly.

This configuration does not work for the Magento backend in one situation, but at least it solved the frontend issue. That "one situation" is this: if your backend is secured and you access it directly (www.mydomain.com/admin), you won't be redirected to (https://www.mydomain.com/index.php/admin) as usual, and you will get a redirect loop. Just use (https://www.mydomain.com/admin) or (https://www.mydomain.com/index.php/admin) and you can reach the backend login form. I guess a redirect in the Pound configuration from (http://www.mydomain.com/admin) to (https://www.mydomain.com/index.php/admin) or (https://www.mydomain.com/admin) is mandatory to avoid this (the secured protocol MUST be in the URL in order to access the backend).

As you can all see, setting up SSL the way I did in this scenario was very challenging. I will come back with my impressions in the next few days and I hope everything will work as expected.

rabbit-aaron commented 8 years ago

Hi, I am having the same issue. I'm on a multi-store setup, and ?___SID=U seems to appear on all my storefronts. I'm on a full HTTPS setup: Nginx -> Varnish -> Nginx -> PHP5-FPM