mozilla / firefox-hardware-report

The Firefox Hardware Report was the precursor to the Firefox Public Data Report, to which it now redirects
https://data.firefox.com/dashboard/hardware

Provide a description to show up in search engine results #12

Closed Dexterp37 closed 7 years ago

Dexterp37 commented 7 years ago

We should probably add a description or provide a less restrictive robots.txt so that the major search engines show something better than what's in the image below:

[Image: hw_report_searchengine]

almossawi commented 7 years ago

@mpressman Is this something you might be able to help with?

openjck commented 7 years ago

It turns out that, although the Firefox Hardware Report itself does not have a robots.txt, there is a top-level robots.txt for metrics.mozilla.com that disallows all crawling:

https://metrics.mozilla.com/robots.txt

If that were removed, the Hardware Report would show up in search engines without issue. But we may not want to do that, because other sub-sites may still want search engines to be disallowed.

hcrince commented 7 years ago

Thanks for checking on that, John. We use metrics.mozilla.com for both private and public dashboards. If we change the top-level robots.txt, would that expose us in any way with respect to the private dashboards on metrics.services? Could search engines then index those private dashboards? They aren't accessible without LDAP, but I wouldn't want them to start showing up in search results somewhere.

openjck commented 7 years ago

Removing the top-level robots.txt would make those other sites indexable. Anything behind password protection would still be protected from search engines, but their URLs and any public content could show up.

I would recommend continuing to protect those sites while making the Hardware Report available to search engines. We could do that in two ways: either by updating the top-level robots.txt to manually list the patterns of URLs we want to protect, or by having one catch-all robots.txt in each of the protected directories.
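As a rough illustration of the first option, the sketch below builds a top-level robots.txt that lists only the protected prefixes and then checks it with Python's standard `urllib.robotparser`. The path prefixes and the `/hardware-report/` path are hypothetical; the real dashboard paths are not given in this thread. Only the first option is sketched because crawlers that follow the Robots Exclusion Protocol request `/robots.txt` at the root of a host and do not look for copies inside subdirectories.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical prefixes: the actual private dashboard paths are not listed here.
PROTECTED_PREFIXES = ["/private-dashboard-a/", "/private-dashboard-b/"]


def build_robots_txt(protected):
    """Return a robots.txt body that disallows only the given path prefixes."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}" for prefix in protected]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, path, agent="*"):
    """Check whether `agent` may fetch `path` under the given robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


robots_txt = build_robots_txt(PROTECTED_PREFIXES)
print(robots_txt)
print(is_allowed(robots_txt, "/hardware-report/"))                # not listed, so crawlable
print(is_allowed(robots_txt, "/private-dashboard-a/index.html"))  # listed, so blocked
```

The trade-off with this approach is that any new private dashboard has to be added to the list by hand, otherwise it becomes crawlable by default.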

I don't have access to do either, so this will have to be something for @mpressman or someone else in ops.

openjck commented 7 years ago

As for the description itself, this is what we have now. There's no guarantee that search engines will use this description, but they will treat it as a hint.

The Firefox Hardware Report is a public weekly report of the hardware used by a representative sample of the population from Firefox's release channel on desktop. This information can be used by developers to improve the Firefox experience for users.

Dexterp37 commented 7 years ago

My 2 cents here: wouldn't it be possible to simply add an "Allow" directive in the top level robots.txt file to whitelist the hardware-report after the "Disallow"?

openjck commented 7 years ago

> My 2 cents here: wouldn't it be possible to simply add an "Allow" directive in the top level robots.txt file to whitelist the hardware-report after the "Disallow"?

That would probably be fine. If I understand correctly, though, not all crawlers treat Allow the same way.

http://stackoverflow.com/a/18280238
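To make the caveat about `Allow` concrete, here is a small sketch using Python's standard `urllib.robotparser`, which applies the first matching rule in file order. Crawlers such as Googlebot instead document that the most specific (longest) matching rule wins, so the same file can be read differently by different clients. The `/hardware-report/` and `/some-private-dashboard/` paths are assumptions made for illustration only.

```python
from urllib.robotparser import RobotFileParser

# "Allow" placed after the blanket "Disallow", as proposed above.
DISALLOW_FIRST = """\
User-agent: *
Disallow: /
Allow: /hardware-report/
"""

# Same rules, with the more specific "Allow" listed first.
ALLOW_FIRST = """\
User-agent: *
Allow: /hardware-report/
Disallow: /
"""


def allowed_by(robots_txt, path, agent="*"):
    """Evaluate `path` against a robots.txt body with Python's first-match parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical path for the report.
REPORT = "/hardware-report/"

print(allowed_by(DISALLOW_FIRST, REPORT))  # False: "Disallow: /" matches first
print(allowed_by(ALLOW_FIRST, REPORT))     # True: the "Allow" rule matches first
print(allowed_by(ALLOW_FIRST, "/some-private-dashboard/"))  # False
```

Listing the `Allow` line before the `Disallow` line gives the same result under both the first-match and the longest-match interpretation, which makes it the safer ordering if this route is taken.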

hcrince commented 7 years ago

I have submitted a bug (https://bugzilla.mozilla.org/show_bug.cgi?id=1360744) to the Webops team to address the issue.

openjck commented 7 years ago

This is all set; the site became indexable when it moved to a subdomain.

It might take some time for the site to appear for natural search queries (such as "firefox hardware stats"), but you can see that the results do appear when you search for the URL manually.

openjck commented 7 years ago

I've also opened bug 1373461 to see if we can speed up that indexing process by making the redirect from the old URL to the new URL a 301 rather than a 302.
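For context, a 301 tells clients the move is permanent, while a 302 marks it as temporary, so a 301 is the clearer signal that the new URL should replace the old one. The sketch below is only an illustration of how one might check which status a redirect returns: it starts a throwaway local server that answers with a 301 and reads the response without following it. `NEW_URL`, the handler, and the helper are hypothetical stand-ins, not the actual server configuration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical target; the real new URL is not assumed here.
NEW_URL = "https://example.org/hardware/"


class PermanentRedirect(BaseHTTPRequestHandler):
    """Answer every GET with a permanent (301) redirect to NEW_URL."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", NEW_URL)
        self.end_headers()

    def log_message(self, *args):
        # Keep the example quiet.
        pass


def redirect_status(host, port, path="/"):
    """Return (status, Location header) without following the redirect."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), PermanentRedirect)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(redirect_status("127.0.0.1", server.server_address[1]))
    server.shutdown()
    server.server_close()
```

Pointing `redirect_status` at the old host would show whether it currently answers with a 301 or a 302.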