[feature] Make `robots.txt` and `noindex` customizable

Right now, all GtS instances serve a simple hardcoded robots.txt that disallows all crawling:

User-agent: *
Disallow: /

The code for this is here: https://github.com/superseriousbusiness/gotosocial/blob/main/internal/api/security/robots.go

There are a couple of problems with this, though.

Firstly, this isn't actually enough to prevent sites from appearing in Google results; it just means that Google shows no information for that site. For example:

[Image: Screenshot from 2022-08-29 12-44-32]

Here, Google still has the site indexed; it just hasn't crawled the page to gather information, which leads to this 'stub' search result entry that is not particularly useful.

Secondly, some users and instances might actually want their profile or instance to be indexed by search engines, and by hardcoding this blanket rejection robots.txt, we're not allowing them that option.

Instead of serving this hardcoded robots.txt, we should allow instance admins and users to choose whether their stuff is indexable (and retain 'no indexing' as the default).

To do this, we should use targeted noindex meta tags instead: https://developers.google.com/search/docs/advanced/crawling/block-indexing. Note that a crawler has to be able to fetch a page in order to see its noindex tag, so this only works for pages that robots.txt doesn't block. For users, we can use the 'discoverable' field of their account to decide whether or not to inject this tag in web views of their pages and statuses.
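As a rough illustration of the per-user idea, here is a minimal Go sketch that emits a noindex meta tag unless the account is marked discoverable. The `account` struct, `wantsNoIndex`, `renderHead`, and the template are all hypothetical names made up for this example; they are not the actual GtS types or templates, and the real web views would need their own wiring.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// account is a hypothetical stand-in for a GtS account model;
// only the Discoverable field matters for this sketch.
type account struct {
	Username     string
	Discoverable bool
}

// wantsNoIndex reports whether web views for this account should carry
// a noindex tag. Not discoverable is treated as the default, so the zero
// value of account yields noindex.
func wantsNoIndex(a account) bool {
	return !a.Discoverable
}

// headTmpl is a hypothetical, heavily simplified page head.
var headTmpl = template.Must(template.New("head").Parse(
	`{{if .NoIndex}}<meta name="robots" content="noindex">{{end}}<title>{{.Username}}</title>`))

// renderHead renders the head fragment for the given account.
func renderHead(a account) (string, error) {
	var b strings.Builder
	err := headTmpl.Execute(&b, struct {
		Username string
		NoIndex  bool
	}{Username: a.Username, NoIndex: wantsNoIndex(a)})
	return b.String(), err
}

func main() {
	accounts := []account{
		{Username: "alice", Discoverable: false},
		{Username: "bob", Discoverable: true},
	}
	for _, a := range accounts {
		head, err := renderHead(a)
		if err != nil {
			panic(err)
		}
		fmt.Println(head)
	}
}
```

Keeping the decision in one small function like `wantsNoIndex` would mean profile pages and status pages can share the same rule.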

For instance pages, we'll have to think of something else.
