leev / ngx_http_geoip2_module

Nginx GeoIP2 module
BSD 2-Clause "Simplified" License

Add uriencode feature flag to encode values set? #125

Open erikbos opened 10 months ago

erikbos commented 10 months ago

We're using the geoip2 module to do IP lookups against a MaxMind database and insert the results into requests going upstream.

MaxMind recently started introducing more non-ASCII content in their dataset. See https://dev.maxmind.com/geoip/release-notes/2023#more-non-ascii-characters-in-english-place-names-in-geoip-products-and-services

The geoip2 module takes MaxMind's database values, which are UTF-8 strings, and inserts them as header(s) as configured. That works well in nginx, but it can cause issues upstream: some web frameworks are strict about string encoding and do not accept UTF-8 in an HTTP header value.

An example: the value for the Japanese city Ōbu ends up in the header value as the bytes "C5 8C 62 75". In ISO 8859-1 / Latin-1 the range 0x80 - 0x9F is not defined. Some web frameworks, or more likely the libraries they use to validate HTTP requests and strings, are strict about seeing an undefined byte value (in this case 8C) and throw an error because of that :(
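
To make the byte values above concrete, here is a small Python sketch that reproduces them. It only illustrates the encoding problem; it is not code from the module or from any particular framework, and the variable names are just for illustration.

```python
# Illustration only: how "Ōbu" looks on the wire when sent as UTF-8.
city = "Ōbu"

raw = city.encode("utf-8")
hex_bytes = raw.hex(" ").upper()
print(hex_bytes)  # C5 8C 62 75

# Bytes that fall in 0x80-0x9F, the range that a strict
# ISO 8859-1 / Latin-1 validator may reject.
undefined = [b for b in raw if 0x80 <= b <= 0x9F]
print([hex(b) for b in undefined])  # ['0x8c']
```

The first byte (C5) is a valid Latin-1 character, so it passes; only the second byte (8C) lands in the undefined range.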

Is there any appetite for adding a feature flag that encodes the string values set by the geoip2 module to prevent this? E.g. adding, say, geoip_string_uriencode on; to apply URI encoding to strings? (Any encoding that is ISO 8859-1 / Latin-1 "safe" would be OK.)
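
For reference, this is roughly what I'd expect the encoded value to look like, sketched in Python with the standard library's urllib.parse. The function name uriencode_header_value is made up for this example, and geoip_string_uriencode is only the directive name proposed above, not something the module has today. The sketch assumes plain percent-encoding of the UTF-8 bytes.

```python
from urllib.parse import quote, unquote


def uriencode_header_value(value: str) -> str:
    """Percent-encode a string so the result is pure ASCII.

    Hypothetical helper: shows the output the proposed flag could
    produce, not how the module would implement it.
    """
    # safe="" encodes everything except unreserved characters
    # (letters, digits, "_", ".", "-", "~").
    return quote(value, safe="")


encoded = uriencode_header_value("Ōbu")
print(encoded)           # %C5%8Cbu
print(unquote(encoded))  # Ōbu
```

The upstream application can then decode the header with its usual URL-decoding routine (unquote here) whenever it needs the original place name.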