Open tsmethurst opened 2 years ago
Partially resolved by https://github.com/superseriousbusiness/gotosocial/pull/842 but we need a way for instance admins to set discoverable on the instance as a whole: using the Discoverable field of the instance account perhaps?
Right now, all GtS instances serve a simple hardcoded `robots.txt` that disallows all crawling. The code for this is here: https://github.com/superseriousbusiness/gotosocial/blob/main/internal/api/security/robots.go
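For reference, a blanket disallow-all `robots.txt` is the standard two-liner below (see the linked source for the exact contents GtS serves):

```text
User-agent: *
Disallow: /
```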
There are a couple problems with this though.
Firstly, this isn't actually enough to prevent sites from appearing in Google results; it just means that Google shows no information for the site. Google can still index a site it's forbidden from crawling, it just hasn't fetched the page to gather any details, leading to a 'stub' search result entry which is not particularly useful.
Secondly, some users and instances might actually want their profile or instance to be indexed by search engines, and by hardcoding this blanket-rejection `robots.txt`, we're not giving them that option.
Instead of serving this hardcoded robots.txt, we should allow instance admins and users to choose whether their stuff is indexable (and retain 'no indexing' as the default).
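A configurable handler could look something like this minimal sketch in Go (the `discoverable` setting and the handler wiring are assumptions for illustration, not GtS's actual code). Note that for `noindex` meta tags to be seen at all, crawlers have to be allowed to fetch the pages, so a discoverable instance must not keep the blanket disallow:

```go
package main

import (
	"fmt"
	"net/http"
)

// robotsTxt builds the robots.txt body from a hypothetical
// instance-level 'discoverable' setting.
func robotsTxt(discoverable bool) string {
	if discoverable {
		// Allow crawling; individual pages still carry noindex
		// meta tags unless the user has opted in.
		return "User-agent: *\nDisallow:\n"
	}
	// Retain 'no indexing' as the default.
	return "User-agent: *\nDisallow: /\n"
}

// robotsHandler serves the generated robots.txt.
func robotsHandler(discoverable bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		fmt.Fprint(w, robotsTxt(discoverable))
	}
}

func main() {
	fmt.Print(robotsTxt(false))
}
```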
To do this, we should use targeted `noindex` meta tags instead: https://developers.google.com/search/docs/advanced/crawling/block-indexing. For users, we can use the 'discoverable' field of their account to decide whether or not to inject this meta tag into web views of their pages and statuses. For instance pages, we'll have to think of something else.
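As a sketch, the per-account decision could reduce to a small helper like the one below (the function and field names are hypothetical, not GtS's actual API). Per the linked Google docs, the tag only takes effect on pages crawlers are allowed to fetch:

```go
package main

import "fmt"

// metaTags returns the robots meta tag to inject into an account's
// web view, based on the account's 'discoverable' field.
// Hypothetical sketch; names are assumptions, not GtS's actual code.
func metaTags(discoverable bool) string {
	if discoverable {
		// Account has opted in to indexing: inject nothing.
		return ""
	}
	// Default: tell search engines not to index this page.
	return `<meta name="robots" content="noindex">`
}

func main() {
	fmt.Println(metaTags(false))
}
```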