logstash-plugins / logstash-filter-geoip


Reopen database on errors #60

Open splitice opened 8 years ago

splitice commented 8 years ago

When the database is hosted on a NFS share it is possible for the handle to become stale and the database to need to be re-opened.

Currently this just results in an avalanche of errors:

:message=>"Unknown error while looking up GeoIP data", :exception=>#<IOError: Stale NFS file handle>

At a minimum, the GeoIP database should be closed (to be re-opened) in case of error.
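A minimal sketch of that close-and-reopen idea (the names @geoip, load_database, and the city lookup are illustrative, not the plugin's actual internals):

```ruby
# Illustrative sketch only: drop and lazily reopen the database handle when a
# lookup raises, instead of failing every subsequent event forever.
def lookup(ip)
  @geoip ||= load_database        # hypothetical helper; reopens after a failure
  @geoip.city(ip)
rescue IOError => e
  @logger.warn("GeoIP lookup failed, dropping database handle", :exception => e)
  @geoip.close rescue nil         # a stale handle may refuse to close cleanly
  @geoip = nil                    # the next event triggers a fresh open
  nil                             # skip enrichment for this event
end
```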

markwalkom commented 8 years ago

Out of interest, why are you storing it on an NFS share?

splitice commented 8 years ago

So that the 7 logstash servers all:

a) have the same version of the database
b) can be easily updated at the same time

markwalkom commented 8 years ago

There are many issues with doing this though. Do we add some kind of heartbeat to check the file exists if we aren't using it during processing? How do we deal with the rest of the pipeline while we retry access to the file if it disappears? What if it never returns?

The better option, in my opinion, would be to use automated deployment tools (puppet/chef/ansible/salt/etc.) to ensure it is consistent, and not have to worry about NFS at all.

Maybe someone else can comment with their thoughts as well :)

splitice commented 8 years ago

Unfortunately, we have other distributed services running on each logstash machine that don't fit well with any sort of adjustment, due to a lack of add/remove support in the distributed database, so we can't currently use that workflow.

NFS worked well for quite a while, until we added GeoIP. The IO is minimal, and it is very easy to deploy and manage.

I would be quite happy to have GeoIP skipped, or even put on a cooldown, if the database can't be opened. Right now that message is spewed out forever, at a rate of multiple GB/min in our case, filling the log storage.
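A cooldown could be as simple as remembering when the last failure happened and skipping lookups, and the repeated log line, until an interval passes. A rough sketch, with all names hypothetical:

```ruby
# Illustrative cooldown sketch: after a failure, skip GeoIP lookups (and the
# error log line) for COOLDOWN seconds instead of erroring on every event.
COOLDOWN = 60 # seconds; illustrative value

def lookup(ip)
  return nil if @failed_at && (Time.now - @failed_at) < COOLDOWN
  result = @geoip.city(ip)
  @failed_at = nil                # recovered, clear the cooldown
  result
rescue IOError => e
  # log only on the first failure of a streak, not once per event
  @logger.warn("GeoIP lookups failing, backing off", :exception => e) if @failed_at.nil?
  @failed_at = Time.now
  nil
end
```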

splitice commented 8 years ago

I think a cooldown would be a good idea, but given the current handling I still feel this simple solution is a significant step forward.

1) It produces no worse handling than the current implementation
2) It produces better handling for a bunch of file-related issues, especially on remote (nfs, sshfs, etc) or distributed filesystems (glusterfs)

I have not tested the commit yet; I'll need to figure out how to make a file handle go stale. It's simple though.

https://github.com/splitice/logstash-filter-geoip/commit/6d58ea78f7d91b25264b627896147b9381864bea

splitice commented 8 years ago

#41 could also be useful

jordansissel commented 8 years ago

I don't think we really expect the file to be changed while logstash is running, so if that's what you're doing, as a workaround for now, you may need to restart Logstash in order to update your geoip file.

I strongly discourage NFS due to its behavioral problems. However, in this case, maybe we can catch the specific "IOError: Stale NFS file handle" error and try to reopen the file in specifically that case (not all IOErrors, just the stale file handle one).
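A hedged sketch of that narrower handling; since a plain Ruby IOError exposes no errno, the message string is matched, and load_database stands in for whatever reopens the file:

```ruby
# Illustrative sketch: reopen only on the stale-NFS-handle IOError shown in
# the report; any other IOError propagates exactly as it does today.
def lookup(ip)
  attempts = 0
  begin
    @geoip.city(ip)
  rescue IOError => e
    raise unless e.message.include?("Stale NFS file handle")
    raise if (attempts += 1) > 1  # reopen once, then give up
    @logger.warn("Stale GeoIP database handle, reopening", :exception => e)
    @geoip.close rescue nil
    @geoip = load_database        # hypothetical helper that opens the file
    retry
  end
end
```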

splitice commented 8 years ago

I have never mentioned changing the file while running. I too am unsure if that would work.

No other component (logstash-core etc) has any trouble with NFS; we store logstash, our plugins and its configuration in this way.

splitice commented 8 years ago

And I am not certain how to get the errno from an IOError in Ruby. It's not a language I am overly familiar with.
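For reference: in Ruby, errno-carrying exceptions are SystemCallError subclasses under the Errno namespace (Errno::ESTALE for a stale handle), and SystemCallError#errno returns the number; a plain IOError carries no errno at all, which is why matching the message string is the usual fallback. A small illustration (the path is made up):

```ruby
# Errno-backed failures are SystemCallError subclasses; a plain IOError is not,
# so it exposes no errno and only its message can be inspected.
begin
  File.read("/mnt/nfs/GeoLiteCity.dat")   # illustrative path
rescue Errno::ESTALE => e                 # the stale-file-handle errno class
  puts "stale handle, errno #{e.errno}"   # SystemCallError#errno gives the number
rescue SystemCallError => e               # any other errno-backed failure
  puts "errno #{e.errno}: #{e.message}"
rescue IOError => e
  puts "no errno available: #{e.message}"
end
```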

ebuildy commented 8 years ago

Did you try the preload option?

Check https://github.com/logstash-plugins/logstash-filter-geoip/pull/63/files.

splitice commented 8 years ago

No, I haven't.

I ended up setting up a shell script to rsync the file from NFS to temporary in-memory storage, and loading the database from there.