chenejac closed this pull request 3 weeks ago
Works as intended. But keep in mind that the instructions in robots.txt cannot enforce crawler behavior; they can only suggest it 😄 To actually stop crawlers from reaching these pages, we could manually whitelist user agents on the backend, but I don't think that is needed.
I don't think we should manually detect robots and disable those pages, because we can't be 100% sure which client is a robot just from its headers. That's why we have a CAPTCHA on those pages.
VIVO GitHub issue: 3935
Linked Vitro PR
What does this pull request do?
Disallow bot access to /contact and /forgot-password (at least for bots that respect robots.txt)
What's new?
robots.txt is updated
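The updated rules presumably look something like the following sketch. This is an illustration only; the exact Disallow paths must match the routes actually served by the deployed VIVO instance (the PR description mentions /forgot-password while the testing instructions use /forgotPassword, so both are shown here):

```
User-agent: *
# Keep bots away from the contact and password-reset forms
Disallow: /contact
Disallow: /forgot-password
Disallow: /forgotPassword
```

Note that Disallow rules are prefix matches, so `Disallow: /contact` also covers sub-paths such as /contact/submit.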
How should this be tested?
Run VIVO and try to access /contact and /forgotPassword from a web browser (this should work), then test the robots.txt file using a validator such as this one. Please note that you need to run VIVO at a public address as the root application (i.e., it should be http://somedomain.com, not http://somedomain.com/vivo).
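As an alternative to an online validator, the rules can be checked locally with Python's standard-library robots.txt parser. This is a sketch under the assumption that the file contains Disallow rules for /contact and /forgotPassword; the domain somedomain.com is a placeholder from the testing instructions above:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content mirroring the rules this PR adds
rules = """
User-agent: *
Disallow: /contact
Disallow: /forgotPassword
""".strip().splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Well-behaved crawlers should be told to skip these two pages
print(rp.can_fetch("*", "http://somedomain.com/contact"))         # False
print(rp.can_fetch("*", "http://somedomain.com/forgotPassword"))  # False
# ...but the rest of the site remains crawlable
print(rp.can_fetch("*", "http://somedomain.com/"))                # True
```

This only verifies the rule syntax and matching, not that the file is actually served at http://somedomain.com/robots.txt, which is why VIVO must run as the root application for the real test.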
Interested parties
Tag (@ mention) interested parties or, if unsure, @VIVO-project/vivo-committers