@mpressman Is this something you might be able to help with?
It turns out that, although the Firefox Hardware Report itself does not have a robots.txt, there is a top-level robots.txt for metrics.mozilla.org that disallows all:
https://metrics.mozilla.com/robots.txt
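For reference, a blanket disallow-all robots.txt is just two lines:

```
User-agent: *
Disallow: /
```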
If that were removed, the Hardware Report would show up in search engines without issue. But we may not want to do that, because other sub-sites may still want search engines to be disallowed.
Thanks for checking on that, John. We use metrics.mozilla.com for both private and public dashboards. If we change the top-level robots.txt, would that expose us in any way with respect to the private dashboards on metrics.services? Search engines could index those private dashboards (even though they aren't accessible without LDAP), and I wouldn't want them to start showing up in search results somewhere.
Removing the top-level robots.txt would make those other sites indexable. Anything behind password protection would still be protected from search engines, but their URLs and any public content could show up.
I would recommend continuing to protect those sites while making the Hardware Report available to search engines. We could do that in two ways: either by updating the top-level robots.txt to manually list the patterns of URLs we want to protect, or by having one catch-all robots.txt in each of the protected directories.
I don't have access to do either, so this will have to be something for @mpressman or someone else in ops.
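To sketch the first option (the directory names here are hypothetical placeholders, not the real paths), the top-level file would enumerate only the private sub-sites:

```
User-agent: *
Disallow: /some-private-dashboard/
Disallow: /another-private-dashboard/
```

The second option is the inverse: drop a two-line disallow-all file (as above) into each protected directory and remove the top-level one.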
As for the description itself, this is what we have now. There's no guarantee that search engines will use this description, but they will treat it as a hint.
The Firefox Hardware Report is a public weekly report of the hardware used by a representative sample of the population from Firefox's release channel on desktop. This information can be used by developers to improve the Firefox experience for users.
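For what it's worth, a description like that is usually surfaced to crawlers via a meta tag in the page's `<head>`, along these lines:

```html
<meta name="description"
      content="The Firefox Hardware Report is a public weekly report of the hardware used by a representative sample of the population from Firefox's release channel on desktop.">
```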
My 2 cents here: wouldn't it be possible to simply add an "Allow" directive in the top-level robots.txt file to whitelist the hardware-report after the "Disallow"?
That would probably be fine. If I understand correctly, though, not all crawlers treat Allow the same way.
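Concretely, that variant might look like this (the path is a placeholder, not the report's real URL). Googlebot and Bingbot apply most-specific-path precedence, so the longer Allow rule wins over the blanket Disallow, but older or simpler crawlers may ignore Allow entirely:

```
User-agent: *
Disallow: /
Allow: /hardware-report/
```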
I have submitted bug https://bugzilla.mozilla.org/show_bug.cgi?id=1360744 to the Webops team to address the issue.
This is all set; the site became indexable when it moved to a subdomain.
It might take some time for the site to appear for natural search queries (such as "firefox hardware stats"), but you can see that results do appear when you search for the URL directly.
I've also opened bug 1373461 to see if we can speed up that indexing process by making the redirect from the old URL to the new URL a 301 rather than a 302.
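A quick way to verify what the redirect currently returns (the old URL here is a placeholder):

```console
$ curl -sI https://metrics.mozilla.com/old-hardware-report-path | head -n 1
HTTP/1.1 302 Found
```

Once the change lands, that first line should read 301 Moved Permanently instead.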
We should probably add a description or provide a less restrictive robots.txt so that the major search engines show something better than what's in the screenshot below.