vivo-project / VIVO

VIVO is an extensible semantic web application for research discovery and showcasing scholarly work
http://vivoweb.org
BSD 3-Clause "New" or "Revised" License

Disallow forms (with CAPTCHA) to bots #3936

Closed chenejac closed 3 weeks ago

chenejac commented 5 months ago

VIVO GitHub issue: 3935

Linked Vitro PR

What does this pull request do?

Disallows bots from accessing /contact and /forgot-password (at least those bots that respect robots.txt).

What's new?

robots.txt is updated

How should this be tested?

Run VIVO and try to access /contact and /forgotPassword from a web browser (this should still work), then test the robots.txt file with a robots.txt validator. Please note that VIVO must run at a public address as the root application (that is, at http://somedomain.com, not at http://somedomain.com/vivo), because crawlers only read robots.txt from the root of the host.
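If no online validator is at hand, the rules can also be checked locally. The following is only a minimal sketch using Python's standard `urllib.robotparser`; the `ROBOTS_TXT` content, the `is_allowed` helper, and the `ExampleBot` user agent are assumptions made for illustration, and the rules actually shipped with VIVO may be worded differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the file shipped with VIVO may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact
Disallow: /forgotPassword
"""


def is_allowed(robots_txt: str, path: str, user_agent: str = "*") -> bool:
    """Return True if a compliant crawler may fetch `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "/people" stands in for any page that is not listed in robots.txt.
    for path in ("/contact", "/forgotPassword", "/people"):
        print(path, is_allowed(ROBOTS_TXT, path, "ExampleBot"))
```

To check a running instance instead, `RobotFileParser.set_url()` followed by `read()` can load the live robots.txt from the root of the host.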

Interested parties

Tag (@ mention) interested parties or, if unsure, @VIVO-project/vivo-committers

milospp commented 5 months ago

Works as intended. Keep in mind, though, that the instructions in robots.txt cannot enforce crawler behavior on the site; they can only suggest it 😄 To actually stop crawlers from accessing these pages, we could manually whitelist user agents on the backend, but I don't think that is needed.

I don't think we should manually detect robots and disable those pages for them, because we cannot be 100% sure who is a robot and who is not from a request header alone. That's why we have a CAPTCHA on those pages.
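A rough illustration of why header-based detection is unreliable, as a sketch only: the marker list, the `looks_like_bot` helper, and both user-agent strings are hypothetical and are not taken from VIVO or Vitro.

```python
# Hypothetical substrings that a naive backend check might look for.
KNOWN_BOT_MARKERS = ("bot", "crawler", "spider")


def looks_like_bot(user_agent: str) -> bool:
    """Naive check: flag a request whose User-Agent contains a bot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)


# A crawler that identifies itself honestly is flagged...
honest_crawler = "ExampleBot/1.0 (+http://example.com/bot)"
# ...but the same crawler sending a browser-like header is not.
spoofed_crawler = "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"

if __name__ == "__main__":
    print(looks_like_bot(honest_crawler))
    print(looks_like_bot(spoofed_crawler))
```

Because the client chooses the User-Agent value, a check like this only catches bots that identify themselves, which is the same group that would respect robots.txt anyway.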