Open DpoBoceka opened 5 years ago
Hey @DpoBoceka, seems like a reasonable addition.
I'm on it.
I think that at a later stage of implementing this we should have some sort of cache_size setting, like the one Logstash uses, because it would be a waste to look up the same addresses all over again. Or perhaps the Linux filesystem cache would handle that and there would be no overhead.
Any word of advice?
Don't worry for now, eventually we can add a cache field to optionally point to a cache resource.
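To make the cache_size idea concrete, here is a minimal sketch of a bounded in-process LRU cache wrapped around a slow lookup. It is not the Benthos cache resource mentioned above; the names lruCache, newLRUCache and cached are hypothetical, and the "slow lookup" is a placeholder function rather than a real GeoIP reader.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element stores, so eviction can find the map key.
type entry struct {
	key, value string
}

// lruCache is a tiny least-recently-used cache (hypothetical helper).
// It is not safe for concurrent use.
type lruCache struct {
	size  int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> element holding an entry
}

func newLRUCache(size int) *lruCache {
	return &lruCache{size: size, order: list.New(), items: map[string]*list.Element{}}
}

// get returns the cached value and marks the key as recently used.
func (c *lruCache) get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(entry).value, true
}

// put stores a value and evicts the least recently used key when over size.
func (c *lruCache) put(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value = entry{key, value}
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(entry{key, value})
	if c.order.Len() > c.size {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(entry).key)
	}
}

// cached wraps a slow lookup so that repeated addresses skip it.
func cached(c *lruCache, slow func(string) string) func(string) string {
	return func(ip string) string {
		if v, ok := c.get(ip); ok {
			return v
		}
		v := slow(ip)
		c.put(ip, v)
		return v
	}
}

func main() {
	calls := 0
	// Placeholder for a real database lookup (assumption for illustration).
	lookup := cached(newLRUCache(2), func(ip string) string {
		calls++
		return "country-for-" + ip
	})
	for _, ip := range []string{"1.1.1.1", "8.8.8.8", "1.1.1.1", "9.9.9.9", "8.8.8.8"} {
		lookup(ip)
	}
	fmt.Println("slow lookups:", calls)
}
```

Whether this is worth having depends on how expensive a single lookup is compared with a map access, which is the open question raised above.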
Do you think this is going to make its way into benthos? Is there any work I can help with?
Hey @jamesharr, my plan was to adapt the processor from the existing PR into a bloblang method as it'd make it easier to compose but it's taking me a while to get around to it. If you're interested in having a go that'd be awesome, just let me know if I can help.
Hello Jeffail, I'm struggling to get started with this one. I took a wrong turn somewhere learning the code-base and I think I need to set it down for a little bit and pick it up again.
What all do I need to do to create a bloblang method? Is there a good example I can base some work off of?
In part, it's been a long time since I've written Go, but I also think my lack of Benthos experience probably isn't helping here. Any pointers would be helpful, thanks!
Hi @Jeffail,
So I have a "hello world" bloblang functioning, but not anything super useful at the moment.
I'm wondering a few things:
On the API topic, which makes more sense to you?
root.geo_city = this.ip_address.geoip_city()
root.geo_city.country.iso_code // == "US"
root.geo_city.country.name // == 'United States'
root.geo_city.city.name // == "Minneapolis"
// other fields as noted in https://github.com/maxmind/GeoIP2-python#city-database
root.geo_asn = this.ip_address.geoip_asn()
root.geo_asn.autonomous_system_number // == "1211"
root.geo_asn.autonomous_system_organization // == "Telstra Pty Ltd"
or how about this API?
root.geo_city = geoip_city(this.ip_address)
root.geo_asn = geoip_asn(this.ip_address)
hey @jamesharr, I would suggest taking a string argument for a file path. The constructor of a bloblang function/method gets called only once when the value is static, so in the case of something like foo.bar("baz") the method bar is only created once and called many times, so you can simply read the file and not worry about caching the result or anything, similar to the file function: https://github.com/Jeffail/benthos/blob/master/internal/bloblang/query/functions.go#L320
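The "constructed once, called many times" behaviour described above can be illustrated with a plain Go closure. This is only a sketch of the pattern, not the actual Benthos method API: newGeoLookup, lookupFunc and the in-memory map standing in for the database are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupFunc is the per-message function returned by the constructor.
type lookupFunc func(ip string) (string, error)

// opens counts how many times the (fake) database was loaded.
var opens int

// newGeoLookup stands in for a method constructor that receives a static
// path argument. The expensive work happens here, exactly once.
func newGeoLookup(path string) (lookupFunc, error) {
	if path == "" {
		return nil, fmt.Errorf("path must not be empty")
	}
	opens++ // in real code: open and parse the database file at path
	// Fake in-memory "database" (assumption for illustration only).
	db := map[string]string{"2001:4860:4860::8844": "US"}

	// The returned closure is what runs for every message.
	return func(ip string) (string, error) {
		code, ok := db[strings.TrimSpace(ip)]
		if !ok {
			return "", fmt.Errorf("no record for %q", ip)
		}
		return code, nil
	}, nil
}

func main() {
	lookup, err := newGeoLookup("./something/db.mmdb")
	if err != nil {
		panic(err)
	}
	for _, ip := range []string{"2001:4860:4860::8844", " 2001:4860:4860::8844 "} {
		code, err := lookup(ip)
		fmt.Println(code, err)
	}
	fmt.Println("database loads:", opens)
}
```

Because the closure captures the loaded data, no separate caching of the file contents is needed as long as the path argument is static.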
And I think we ought to go with the method approach as it generally looks cleaner when put at the end of a long coercion/coalesce chain:
root.foo = this.(bar | baz).string().trim().geoip_city(path: "./something/db.zip")
In my opinion looks cleaner than:
root.foo = geoip_city(ip_address: this.(bar | baz).string().trim(), path: "./something/db.zip")
Having said all that, there are a few caveats that ought to be addressed. I'll take care of these myself afterwards, just noting them here for future reference:
[...] file, env, etc) and replace them with placeholders (since we only care that the mapping is valid)
[...] root.foo = this.bar.geoip_city(path: this.baz)) then there's no limit to how many files will be opened, which in this particular case is a bit of a footgun. We should lock this method down so that arguments must be static in order for it to parse.
Here's my first pass at getting a .geo_city structure.
https://github.com/Jeffail/benthos/pull/866/files
It seems to work so far, but it's missing a lot of polish. A few questions...
[...] struct2map tool to convert the structures to map[string]interface{} before it returns. Is that the correct approach?
Blobl example:
root = this
let geoip_data = this.ip.geoip_city(path: "GeoLite2-City.mmdb")
root.geoip_data = $geoip_data
root.city_name = $geoip_data.City.Names.en # this always returns null
Output (for 2001:4860:4860::8844 / dns.google):
{
"geoip_data": {
"City": {
"GeoNameID": 0,
"Names": null
},
"Continent": {
"Code": "NA",
"GeoNameID": 6255149,
"Names": {
"de": "Nordamerika",
"en": "North America",
"es": "Norteamérica",
"fr": "Amérique du Nord",
"ja": "北アメリカ",
"pt-BR": "América do Norte",
"ru": "Северная Америка",
"zh-CN": "北美洲"
}
},
"Country": {
"GeoNameID": 6252001,
"IsInEuropeanUnion": false,
"IsoCode": "US",
"Names": {
"de": "USA",
"en": "United States",
"es": "Estados Unidos",
"fr": "États-Unis",
"ja": "アメリカ合衆国",
"pt-BR": "Estados Unidos",
"ru": "США",
"zh-CN": "美国"
}
},
"Location": {
"AccuracyRadius": 100,
"Latitude": 37.751,
"Longitude": -97.822,
"MetroCode": 0,
"TimeZone": "America/Chicago"
},
"Postal": {
"Code": ""
},
"RegisteredCountry": {
"GeoNameID": 6252001,
"IsInEuropeanUnion": false,
"IsoCode": "US",
"Names": {
"de": "USA",
"en": "United States",
"es": "Estados Unidos",
"fr": "États-Unis",
"ja": "アメリカ合衆国",
"pt-BR": "Estados Unidos",
"ru": "США",
"zh-CN": "美国"
}
},
"RepresentedCountry": {
"GeoNameID": 0,
"IsInEuropeanUnion": false,
"IsoCode": "",
"Names": null,
"Type": ""
},
"Subdivisions": null,
"Traits": {
"IsAnonymousProxy": false,
"IsSatelliteProvider": false
}
},
"ip": "2001:4860:4860::8844"
}
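On the question of converting the result structures to map[string]interface{}: one common way to do this in Go is a JSON round trip, sketched below. This is not necessarily how the PR does it. The record and country types and the structToMap helper are hypothetical, and the json tags are an assumption showing how snake_case keys like those in the earlier API proposal could be produced instead of the Go field names seen in the output above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// country and record are hypothetical stand-ins for a lookup result.
// The json tags control the key names in the resulting map.
type country struct {
	IsoCode string            `json:"iso_code"`
	Names   map[string]string `json:"names"`
}

type record struct {
	Country country `json:"country"`
}

// structToMap converts a JSON-serialisable struct into a generic map by
// marshalling it and unmarshalling the bytes again.
func structToMap(v interface{}) (map[string]interface{}, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	var out map[string]interface{}
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	rec := record{Country: country{
		IsoCode: "US",
		Names:   map[string]string{"en": "United States"},
	}}
	m, err := structToMap(rec)
	if err != nil {
		panic(err)
	}
	fmt.Println(m["country"])
}
```

The round trip costs an extra encode and decode per lookup, so a hand-written conversion may be preferable if this ends up on a hot path.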
Sometimes, if we have IP addresses in our messages (especially if we are triaging web server logs), we want them to be enriched with data from a GeoIP database, like this one:
And here is a reader for it:
What do you think, should we expand benthos with such functionality? Of course, we could insert all that data into some cache or SQL database and use the processors we already have, but that would be more of a workaround. Implementing this would mean one more advantage taken away from Logstash.