sul-dlss / earthworks

Geospatial discovery application for Stanford University Libraries.
https://earthworks.stanford.edu

install robots.txt on geoservers #424

Closed by drh-stanford 2 months ago

drh-stanford commented 6 years ago

We're getting bot traffic on our GeoServers. For example:

66.249.79.135 - - [15/Feb/2018:10:42:07 -0800] "GET /geoserver/gwc/rest/seed/druid:by617mn8672 HTTP/1.1" 401 1073 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
thatbudakguy commented 3 months ago

Needs investigation to see whether this is still an issue. I think robots.txt can be managed via Puppet.

edsu commented 2 months ago

We are definitely still seeing bot traffic. In the last 7 hours we've seen:

- Googlebot: 1,983
- Bingbot: 494
- GPTbot: 4
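A tally like this can be produced by scanning the access log for crawler user agents. A minimal sketch (the bot names, regex, and sample line are illustrative assumptions, not the actual tooling used):

```python
import re
from collections import Counter

# Match common crawler names in the user-agent field of an
# Apache combined-format access log line.
BOT_PATTERN = re.compile(r"(Googlebot|bingbot|GPTBot)", re.IGNORECASE)

def count_bot_hits(lines):
    """Return a Counter of crawler hits found in the given log lines."""
    counts = Counter()
    for line in lines:
        match = BOT_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample line taken from the log excerpt earlier in this issue.
sample = [
    '66.249.79.135 - - [15/Feb/2018:10:42:07 -0800] '
    '"GET /geoserver/gwc/rest/seed/druid:by617mn8672 HTTP/1.1" 401 1073 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
print(count_bot_hits(sample))  # Counter({'Googlebot': 1})
```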

I'm proceeding with the understanding that we can block all crawlers from GeoServer, since that content is already discoverable and indexable via Earthworks, SearchWorks, and PURL.

edsu commented 2 months ago

New robots.txt files are in place on:

The robots.txt contains:

User-Agent: *
Disallow: /
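One way to sanity-check that this policy blocks everything is Python's stdlib `urllib.robotparser`. A quick sketch (the `/geoserver/web/` path is just an example URL, not a specific deployed endpoint):

```python
import urllib.robotparser

# Parse the same two-line robots.txt that was deployed.
rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-Agent: *", "Disallow: /"])

# Every user agent should be denied every path.
print(rp.can_fetch("Googlebot", "/geoserver/web/"))  # False
print(rp.can_fetch("*", "/"))                        # False
```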