ualbertalib / jupiter

Jupiter is a University of Alberta Libraries-based initiative to create a sustainable and extensible digital asset management system. This is phase 2 (Digitization).
https://era.library.ualberta.ca/
MIT License

Nefarious Bots bloating the logs #1456

Open pgwillia opened 4 years ago

pgwillia commented 4 years ago

@henryzhang87 says:

I am dealing with an inflated production.log file. It is set to rotate weekly. For the current week it had reached 3.5GB by yesterday, so I forced a rotation even though the week was not yet over. Less than 24 hours later, it is already at 720MB.

I started wondering what caused the spurious growth. I found that whenever a crawler tries to access GET /catalog....., it triggers a fatal error with 39 lines of debug output, because the item doesn't exist (as shown in the attached snapshot).

So far, we have 23604 accesses to "/catalog....". Just wondering if it is necessary to log this repetitive, large volume of data, which becomes unwieldy when we try to extract meaningful information from the log file while troubleshooting.

Thanks for your attention; your input is highly appreciated.

image
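To put numbers on which clients are responsible, something like the following could tally the hits per IP. This is only a sketch: it assumes the stock Rails request line (`Started GET "/path" for 1.2.3.4 at ...`), and the method name `count_hits_by_ip` and the `/catalog` default prefix are made up for illustration.

```ruby
# Matches the standard Rails request line, with or without a logger prefix
# in front of it. Assumption: the app uses the default "Started ..." format.
STARTED_LINE = /Started (?<verb>[A-Z]+) "(?<path>[^"]*)" for (?<ip>\S+) at /

# Counts requests whose path begins with `prefix`, grouped by client IP.
# `lines` is any enumerable of log lines (an Array, File.foreach(...), etc.).
# Returns a Hash of ip => count, busiest client first.
def count_hits_by_ip(lines, prefix: "/catalog")
  counts = Hash.new(0)
  lines.each do |line|
    m = STARTED_LINE.match(line)
    next unless m && m[:path].start_with?(prefix)
    counts[m[:ip]] += 1
  end
  counts.sort_by { |_ip, n| -n }.to_h
end

# Hypothetical usage against the production log:
#   count_hits_by_ip(File.foreach("log/production.log")).first(10)
```

The busiest addresses from this tally would be the natural candidates for a block rule, if the traffic really does come from a handful of crawlers.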

pgwillia commented 4 years ago

In Slack we discussed using fail2ban.

Re: changing the log level. The example given is level FATAL -- so this is something that the development team needs to address. Changing the log level won't remove these errors.

We want at least INFO so that we can see regular traffic and trends. In Rails 4.2 the default was changed to DEBUG because if something fails we want to be able to see retroactively what happened.
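A quick way to see why the log level doesn't help here, using Ruby's standard `Logger` outside of Rails. The helper name `log_output_at` and the sample messages are invented for the sketch; in the app itself the threshold would be set with `config.log_level` in `config/environments/production.rb`.

```ruby
require "logger"
require "stringio"

# Logs one DEBUG, one INFO and one FATAL message at the given threshold
# and returns whatever actually got written.
def log_output_at(level)
  buffer = StringIO.new            # in-memory target so we can inspect it
  logger = Logger.new(buffer)
  logger.level = level

  logger.debug("query details")                         # sample message
  logger.info("Started GET request")                    # sample message
  logger.fatal("ActionController::RoutingError (...)")  # sample message
  buffer.string
end

puts log_output_at(Logger::INFO)   # INFO and FATAL lines survive
puts log_output_at(Logger::ERROR)  # only the FATAL line survives
```

Even at ERROR the FATAL entry is still written, so the routing errors would have to be handled in the application rather than filtered by level.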

Rollbar has seen this error a lot and has this suggestion: https://rollbar.com/blog/top-10-ruby-on-rails-errors/#1-actioncontrollerroutingerror

If you aren’t interested in logging 404 errors caused by ActionController::RoutingError then you can avoid them by setting a catch all route and serving the 404 yourself. This method is suggested by the lograge project. To do so, add the following at the bottom of your config/routes.rb file:

Rails.application.routes.draw do
  # all your other routes
  match '*unmatched', to: 'application#route_not_found', via: :all
end

Then add the route_not_found method to your ApplicationController:

class ApplicationController < ActionController::Base
  protect_from_forgery with: :exception

  def route_not_found
    render file: Rails.public_path.join('404.html'), status: :not_found, layout: false
  end
end

Before implementing this, you should consider whether knowing about 404 errors is important to you. You should also keep in mind that any route or engine that is mounted after the application loads won’t be reachable as they will be caught by the catch all route.