We're using the geoip2 module to do IP lookups against a MaxMind database and insert the results as headers in requests going upstream.
MaxMind recently started introducing more non-ASCII content in its datasets. See https://dev.maxmind.com/geoip/release-notes/2023#more-non-ascii-characters-in-english-place-names-in-geoip-products-and-services
The geoip2 module takes MaxMind's database values, which are UTF-8 strings, and inserts them as headers as configured. That works well in nginx itself, but it can cause issues upstream: some web frameworks are strict about string encoding and reject UTF-8 in an HTTP header value.
An example: the value for the Japanese city `Ōbu` is sent as the header bytes `C5 8C 62 75`. In ISO 8859-1 / Latin-1 the range `0x80`-`0x9F` is not defined. Some web frameworks, or more likely the libraries they use to validate HTTP requests and strings, are strict about seeing an undefined byte value (in this case `8C`) and throw an error because of that :(
Is there any appetite to add a feature flag to encode string values set by the geoip module to prevent this? E.g. adding, say, `geoip_string_uriencode on;` to apply URI encoding (percent-encoding) to strings? (Any encoding that is ISO 8859-1 / Latin-1 "safe" would be OK.)
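To make the request concrete, here is a rough Python sketch of the byte-level problem and of what a percent-encoded value might look like. This is only an illustration, not nginx code: `geoip_string_uriencode` is a proposed directive that does not exist today, and the variable names below are made up for the example.

```python
from urllib.parse import quote, unquote

city = "Ōbu"

# UTF-8 bytes that currently end up verbatim in the header value.
raw = city.encode("utf-8")
print(raw.hex(" ").upper())  # C5 8C 62 75

# Bytes that fall in 0x80-0x9F, the range ISO 8859-1 leaves undefined.
undefined = [f"{b:02X}" for b in raw if 0x80 <= b <= 0x9F]
print(undefined)  # ['8C']

# Percent-encoding (what the hypothetical flag could emit) is pure ASCII.
encoded = quote(city, safe="")
print(encoded)  # %C5%8Cbu

# An upstream application can recover the original string losslessly.
print(unquote(encoded) == city)  # True
```

With something like this, the upstream would receive `%C5%8Cbu` instead of raw UTF-8 bytes and could decode it on its side if it needs the original name.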