logstash-plugins / logstash-filter-geoip

Apache License 2.0

Formalize the update process of the geoip2 database #95

Open ph opened 8 years ago

ph commented 8 years ago

@peterskim12 brought this article to our attention http://arstechnica.com/tech-policy/2016/08/kansas-couple-sues-ip-mapping-firm-for-turning-their-life-into-a-digital-hell/

We have no process in place to regularly update the database used by the geoip2 filter. We usually update it only when we make a change to the plugin and the tests fail because of a SHA-1 mismatch.

Even if users can easily update their IP database themselves, we might want an automatic process that keeps the bundled copy in sync with MaxMind's releases.
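For readers unfamiliar with the failure mode described above, here is a minimal sketch of a pinned-checksum check. It is written in Python for illustration only and is not the plugin's actual build code; the function names and the stand-in byte strings are hypothetical.

```python
import hashlib


def sha1_of(data: bytes) -> str:
    """Return the hex SHA-1 digest of the given bytes."""
    return hashlib.sha1(data).hexdigest()


def verify_database(data: bytes, pinned_sha1: str) -> bool:
    """Check downloaded bytes against a checksum pinned in the repository."""
    return sha1_of(data) == pinned_sha1.lower()


# Stand-in payloads (hypothetical): two successive upstream releases.
old_release = b"database snapshot, release 1"
new_release = b"database snapshot, release 2"

# The repository pins the checksum of the release it was last tested against.
pinned = sha1_of(old_release)

# The old release still verifies; a new upstream release no longer matches,
# which is the point at which a build relying on this check would fail.
old_ok = verify_database(old_release, pinned)
new_ok = verify_database(new_release, pinned)
```

Under these assumptions, the check passes until upstream publishes a new file, and then fails until someone updates the pinned value.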

jordansissel commented 8 years ago

The reason I chose (years ago) not to automatically update the geoip data is that MaxMind has some draconian policies which blackhole you if you download from them more frequently than, as I recall, about once a week.

Users who want frequent updates should subscribe to MaxMind.

As an alternative, we could stop shipping the geoip database and require users to manually provide it. I don't like this option much, though.
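If an automatic update were added despite the download limit mentioned above, it would need some kind of throttle. The following is a rough Python sketch of such a guard. The seven-day interval is taken from the recollection in this comment, not from any documented MaxMind limit, and `should_refresh` and `MIN_INTERVAL` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "about once a week", as recalled in the comment above.
MIN_INTERVAL = timedelta(days=7)


def should_refresh(
    last_download: Optional[datetime],
    now: datetime,
    min_interval: timedelta = MIN_INTERVAL,
) -> bool:
    """Return True only if enough time has passed since the last download."""
    if last_download is None:
        # Never downloaded before, so a first fetch is allowed.
        return True
    return now - last_download >= min_interval


now = datetime(2016, 8, 11)
never_fetched = should_refresh(None, now)
fetched_recently = should_refresh(now - timedelta(days=3), now)
fetched_long_ago = should_refresh(now - timedelta(days=8), now)
```

A caller would record the timestamp of each successful download and consult this guard before contacting the upstream server again.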

joewreschnig commented 7 years ago

The downside of including the data is that the plugin maintainers would have to keep the file up-to-date. This is annoying, but the current approach still involves the plugin maintainers keeping the SHA-1 up-to-date. That's nearly as annoying for the maintainers, and I think much more annoying for people who want to submit patches but don't want the CI to fail for unrelated reasons. As the latter, I'd prefer to see a copy of the database included directly in the repository.

I think most serious users probably have a MaxMind subscription they're using instead of the GeoLite DB anyway, and most non-serious users won't notice the file being out-of-date. MaxMind's own GeoIP auto-update also works for the Lite DB, and it's easy for administrators to configure on their own.

joewreschnig commented 7 years ago

The Lite DB seems to be on a monthly update schedule. As the checksum is managed now, I believe this means the build is going to break every month.

jordansissel commented 7 years ago

Updating the download checksum is a necessary step, yes. I personally do not find it annoying, but I see how others may.

Including the file in the repo is not something I agree to, because it is a 20 MB file that makes the repo balloon in size, which makes it really annoying to clone and operate on.

joewreschnig commented 7 years ago

What can you usefully do with the repo without the DB though? You need it to run the tests, build the gem, and run the plugin with its default configuration. It seems like it's just a question of whether you pay the 20MB at git time or at rake time.
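To make the "pay at build time" option concrete, here is a small Python sketch of a fetch-if-missing step. It is an illustration rather than the plugin's rake task; `ensure_database`, the injected `fetch` callable, and the file path are all hypothetical, and the downloader is faked so the example needs no network access.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def ensure_database(path: Path, fetch: Callable[[], bytes]) -> bool:
    """Download the database only if it is not already on disk.

    Returns True if a download happened, False if the cached copy was used.
    """
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(fetch())
    return True


calls: List[int] = []


def fake_fetch() -> bytes:
    """Stand-in for a real download; records each call."""
    calls.append(1)
    return b"stand-in database bytes"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "vendor" / "geoip.mmdb"  # hypothetical location
    first = ensure_database(target, fake_fetch)   # file absent: fetches
    second = ensure_database(target, fake_fetch)  # file present: skips
```

With this shape the cost is paid once per checkout at build time, whereas committing the file would move the same cost to clone time.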

Probably my least favorite thing when trying to contribute to a project is if the default build instructions don't work, or the CI fails for a reason unrelated to my pull request. I suspect a lot of people feel the same way. Right now that's guaranteed to happen monthly - what's even the point of CI checks, at that point?