sprankhub opened 6 years ago
Maybe it's time to update the user agents we are checking? From here, it seems `Googlebot` alone might not be enough to cover every Google-related bot.
Thanks for your answer @miguelbalparda!
I just updated our crawler user agents list to:
ApacheBench/.*,.*Googlebot.*,.*APIs-Google.*,.*Mediapartners-Google.*,.*AdsBot-Google.*,JoeDog/.*,.*Siege/.*,magespeedtest\.com,Nexcessnet_Turpentine/.*,.*PTST.*,.*Symfony BrowserKit.*
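Turpentine treats each comma-separated entry as a regular expression matched against the `User-Agent` header. A minimal Python sketch of that matching (the function name is mine, and Varnish actually evaluates these patterns in VCL, so this is only an approximation):

```python
import re

# The crawler user agent patterns from the configuration above.
CRAWLER_PATTERNS = [
    r"ApacheBench/.*", r".*Googlebot.*", r".*APIs-Google.*",
    r".*Mediapartners-Google.*", r".*AdsBot-Google.*", r"JoeDog/.*",
    r".*Siege/.*", r"magespeedtest\.com", r"Nexcessnet_Turpentine/.*",
    r".*PTST.*", r".*Symfony BrowserKit.*",
]

def is_crawler(user_agent):
    """Return True if the user agent matches any configured crawler pattern."""
    return any(re.match(p, user_agent) for p in CRAWLER_PATTERNS)
```

With this list, `is_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; ...)")` is true, while an ordinary browser user agent matches nothing.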
This should definitely catch all Google bots. However, based on the number of sessions created, I think the current implementation simply does not work properly, especially since I already excluded `.*Googlebot.*` but still find session files under `var/session` containing "Googlebot".
Any other idea @miguelbalparda?
Unfortunately, the updated crawler user agents list did not help.
In the meantime, I tried to debug the issue on a clean test environment (only Magento 1.9.3.9 with sample data and Turpentine 0.7.4). I could reproduce that multiple sessions are generated if the site is opened with a crawler user agent. Two ideas (just guesses until now):
1. `crawler-session` is correctly set for the initial request, but ESI requests lead to real user sessions (the fake crawler session does not work for ESI requests).
2. The `frontend` cookie is correctly handled/faked by Turpentine, but the newer `frontend_cid` cookie is ignored, which leads to the additional sessions.

If you, @miguelbalparda, or anyone else have any input, I am more than thankful.
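Idea 1 can be illustrated with a toy model (everything here is invented for illustration, not Turpentine or Magento code): the fake cookie satisfies the page request, but every ESI subrequest still runs the session bootstrap and leaves a new session file behind:

```python
sessions = set()  # stands in for the files under var/session

def handle_request(cookie, is_esi):
    """Return a session ID; the crawler cookie only short-circuits page requests."""
    if cookie == "crawler-session" and not is_esi:
        return "crawler-session"           # fake session, nothing persisted
    sid = "real-session-%d" % len(sessions)
    sessions.add(sid)                      # a new session file appears
    return sid

handle_request("crawler-session", is_esi=False)  # the page itself: fine
for _ in range(3):                               # three ESI blocks on the page
    handle_request("crawler-session", is_esi=True)

print(len(sessions))  # 3 sessions created by a single crawler page view
```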
I can confirm that idea 1 is the issue. ESI requests each lead to a real user session - the fake crawler session does not work for ESI requests. Even though the fake frontend cookie IS added to the Magento request:
ReqHeader Cookie: frontend=crawler-session
It is ignored by Magento:
RespHeader X-Varnish-Set-Cookie: frontend=67004cf0ahigetv3t9dglqm5u4; expires=Tue, 25-Sep-2018 08:52:49 GMT; Max-Age=86400; path=/; domain=www.shop.com; HttpOnly
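Parsing that `Set-Cookie` value confirms Magento handed out a fresh `frontend` session ID despite receiving `frontend=crawler-session` (illustrative Python, using the header value from the log line above):

```python
from http.cookies import SimpleCookie

# The X-Varnish-Set-Cookie value from the response above.
header = ("frontend=67004cf0ahigetv3t9dglqm5u4; "
          "expires=Tue, 25-Sep-2018 08:52:49 GMT; Max-Age=86400; "
          "path=/; domain=www.shop.com; HttpOnly")

cookie = SimpleCookie()
cookie.load(header)
morsel = cookie["frontend"]

# Magento generated a brand-new session instead of keeping the fake one.
print(morsel.value)       # 67004cf0ahigetv3t9dglqm5u4
print(morsel["max-age"])  # 86400
```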
Any idea?
@sprankhub I know this issue is a bit older, but did you perhaps find a solution in the meantime? I am experiencing the exact same issue in one of my projects.
No, unfortunately not, @christophmassmann :-(
Alright, thanks for your feedback anyhow, @sprankhub! I have debugged this a little bit, and it seems there is currently no logic to prevent bots from generating a new session with each ESI request. So I will probably just implement this directly in Magento.
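Such an application-side guard could look roughly like this (a sketch only; the function name, pattern list, and session handling are all hypothetical, not real Magento API):

```python
import re
import uuid

# Hypothetical pattern list mirroring the Turpentine crawler configuration.
CRAWLER_PATTERNS = [r".*Googlebot.*", r".*AdsBot-Google.*", r".*Siege/.*"]

def resolve_session_id(user_agent, cookie_session_id=None):
    """Give every crawler the same throwaway session so that ESI
    subrequests cannot pile up new session files (hypothetical logic)."""
    if any(re.match(p, user_agent or "") for p in CRAWLER_PATTERNS):
        return "crawler-session"                  # never persisted to disk
    return cookie_session_id or uuid.uuid4().hex  # real users as before
```

The point is simply that the check must run for every request that bootstraps a session, including ESI subrequests, which is where a VCL-only approach apparently falls short.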
We encountered a huge amount of sessions at a customer's shop and analysed where they come from (`grep -Rl 'Googlebot' | wc -l` under `var/session`). We have a pretty much standard Turpentine setup without any major customisations. We use Apache as our backend server and nginx for SSL offloading. We use Varnish 4.1 and Turpentine 0.7.3. Here is our VCL:
If I understand correctly, the following part should prevent the session generation for all known crawlers, since all crawlers should get a dummy `crawler-session`:

However, sessions are still created for crawlers, which leads to various issues. Did anyone encounter this behaviour and know how to fix it?
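For anyone who wants to reproduce the session count, here is a small Python equivalent of `grep -Rl 'Googlebot' | wc -l` to run against `var/session` (the function name is mine):

```python
import os

def count_files_containing(root, needle=b"Googlebot"):
    """Count files under `root` whose contents mention the needle,
    mirroring `grep -Rl 'Googlebot' | wc -l`."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    if needle in fh.read():
                        count += 1
            except OSError:
                continue  # skip unreadable files, as grep would
    return count
```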