veliovgroup / jazeee-meteor-spiderable

Fork of Meteor Spiderable with longer timeout, caching, better server handling
https://atmospherejs.com/jazeee/spiderable-longer-timeout

Fixed userAgentRegExps #46

Closed: RickvdP closed this 7 years ago

RickvdP commented 7 years ago

The previous regexes fail to match the User-Agent headers that many bots actually send.

For example, the Google Structured Data Testing Tool sends the User-Agent header "Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +https://search.google.com/structured-data/testing-tool)", which does not match the /^google-structured-data-testing-tool/i regex. The ^ anchors the pattern to the start of the string, so it only matches a header that begins with google-structured-data-testing-tool, and this header begins with Mozilla/5.0 instead.
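The anchoring problem can be reproduced with the exact header quoted in this report. This is a minimal standalone check using only the built-in RegExp.prototype.test; the two patterns differ only in the leading ^.

```javascript
// User-Agent header sent by the Google Structured Data Testing Tool (quoted in the report).
const ua =
  "Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +https://search.google.com/structured-data/testing-tool)";

// Old style: ^ requires the match to start at position 0 of the header.
const anchored = /^google-structured-data-testing-tool/i;
// Proposed style: no anchor, so the substring may appear anywhere in the header.
const unanchored = /google-structured-data-testing-tool/i;

console.log(anchored.test(ua));   // false: the header starts with "Mozilla/5.0"
console.log(unanchored.test(ua)); // true: the tool name appears after "compatible; "
```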

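Removing the ^ anchor from every pattern lets each regex match anywhere in the header, so a request is recognized as a bot whenever its User-Agent header contains one of the strings in the userAgentRegExps array. The match is looser, but it is the correct behavior for headers like the one above.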
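To make the matching rule concrete, here is a minimal sketch, assuming Spiderable decides whether to serve the pre-rendered page by testing the User-Agent header against each entry of userAgentRegExps in turn. The helper isBotUserAgent and the short pattern lists are hypothetical illustrations, not the package's actual code; the sample headers are typical Googlebot, Bingbot and facebookexternalhit user agents plus an ordinary Chrome one.

```javascript
// Hypothetical helper; the package's real matching logic may differ.
// Returns true when any pattern matches somewhere in the User-Agent header.
function isBotUserAgent(userAgent, patterns) {
  return patterns.some((re) => re.test(userAgent));
}

// Abbreviated, illustrative lists: anchored (old style) vs unanchored (proposed).
const anchoredPatterns = [/^googleBot/i, /^bingBot/i, /^facebookExternalHit/i];
const unanchoredPatterns = [/googleBot/i, /bingBot/i, /facebookExternalHit/i];

// Sample User-Agent headers.
const samples = {
  googlebot: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  bingbot: "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
  facebook: "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
  browser:
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
};

for (const [name, ua] of Object.entries(samples)) {
  // Prints: name, matched by anchored list, matched by unanchored list.
  console.log(name, isBotUserAgent(ua, anchoredPatterns), isBotUserAgent(ua, unanchoredPatterns));
}
```

Only the facebookexternalhit header begins with its bot name, so it is the only sample the anchored list catches. The unanchored list catches all three bots and still ignores the regular Chrome header.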

I've fixed it locally by overriding Spiderable.userAgentRegExps as shown below, but I'd like to make sure nobody else gets frustrated by Spiderable not serving the pre-rendered version to some user agents :)

// Unanchored patterns: each one matches anywhere in the User-Agent header (case-insensitive).
Spiderable.userAgentRegExps = [
    /facebookExternalHit/i,
    /linkedinBot/i,
    /twitterBot/i,
    /googleBot/i,
    /bingBot/i,
    /yandex/i,
    /google-structured-data-testing-tool/i,
    /yahoo/i,
    /MJ12Bot/i,
    /tweetmemeBot/i,
    /baiduSpider/i,
    /Mail\.RU_Bot/i,
    /ahrefsBot/i,
    /SiteLockSpider/i
];