jonge-democraten / website

JD website
https://jongedemocraten.nl
MIT License

Do not index some pages (author decides) #48

Pi2048 closed 9 years ago

Pi2048 commented 9 years ago

This seems to be our best bet, but I haven't been able to get it running yet. I'm also not sure whether it makes it easy to exclude pages.

What I would like is for robots.txt to be generated automatically based on the metadata setting 'Show in sitemap'. If I choose not to display a Displayable or Page in the Sitemap, it should be automatically added to robots.txt as Disallow. I don't think the app I link to does that automatically.
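To make the idea concrete, here is a minimal sketch of the mapping described above, written in plain Python rather than against our actual models. `PageInfo`, `in_sitemap`, and `build_robots_txt` are hypothetical names used only for illustration; the real 'Show in sitemap' setting may be stored differently.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical stand-in for a Page or Displayable record."""
    path: str         # URL path of the page, e.g. "/zoeken/"
    in_sitemap: bool  # assumed to mirror the 'Show in sitemap' setting


def build_robots_txt(pages):
    """Return robots.txt text that disallows every page hidden from the sitemap."""
    lines = ["User-agent: *"]
    hidden = sorted(p.path for p in pages if not p.in_sitemap)
    if hidden:
        lines.extend(f"Disallow: {path}" for path in hidden)
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


# Example data (made up): only the search page is hidden from the sitemap.
pages = [
    PageInfo("/", True),
    PageInfo("/zoeken/", False),
    PageInfo("/over-ons/", True),
]
print(build_robots_txt(pages))
```

In a real setup the list of pages would come from the database and the result would be served by a view at /robots.txt; that wiring is left out here.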

Pi2048 commented 9 years ago

I have questions about this issue/requirement. I don't think we offer this functionality at our current website, as is apparent from our current robots.txt: http://jongedemocraten.nl/robots.txt.

  1. If we do not offer it now, does anybody actually want this functionality?
  2. If we do not offer it now, should we be including it in our initial requirements?

It seems like we could get away with a hardcoded list of views that crawlers should not crawl.

Pi2048 commented 9 years ago

Please see the commit message of commit cd0c5d4f80e7eeb4f9f7cfeb345c43b17930895c. I feel this commit solves this issue, but I welcome alternative views. I will leave the issue open until we have discussed it.

Note that the 'author decides' part of the issue title is not covered by the current solution.

Pi2048 commented 9 years ago

After coordination with @bashazeborg, we decided to downgrade this functional requirement, because:

We now have the requirement that the administrators should be able to change this. The idea is that it is mostly used for technical reasons (do not index search results, etc.).

Pi2048 commented 9 years ago

After this downgrade, the issue is now indeed resolved. See the demo data (Admin interface -> Robots). Check the settings on jd.local:8000/robots.txt.
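For anyone who wants to check the resulting rules from a script instead of reading the file by eye, the standard library's `urllib.robotparser` can evaluate them. This is only a sketch: the `sample` rules below are made up for illustration and are not the contents of our robots.txt, and `is_allowed` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real rules are whatever is
# configured under Admin interface -> Robots.
sample = """\
User-agent: *
Disallow: /search/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(sample, "http://jd.local:8000/search/?q=test"))
print(is_allowed(sample, "http://jd.local:8000/"))
```

To test against a running development server, `RobotFileParser.set_url()` followed by `read()` fetches the live file instead of parsing a string.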