grnet / djnro

DjNRO hits the decks of eduroam database management
http://djnro.grnet.gr/

Fix attribute encoding when using Shibboleth #102

Open ghalse opened 2 months ago

ghalse commented 2 months ago

RFC 2616 states that HTTP headers are encoded in latin1 (iso-8859-1), and the Python/Django request.META (correctly) assumes that incoming headers will be encoded in this way.

However, by default, Shibboleth ignores the iso-8859-1 restriction: with ShibUseHeaders, it puts the UTF-8 encoded values from SAML into its request headers without transliteration [ref]. This results in incorrectly decoded characters when non-ASCII / accented characters are used in e.g. the first or last name.

There are two ways we could fix this. The approach used here is to simply acknowledge the incorrect encoding and fix it (i.e. force the string to be interpreted as UTF-8 rather than Latin1). This is backwards compatible and will be invisible to any sites that don't already have incorrectly encoded names.
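
To illustrate the idea (this is a minimal sketch, not the code in this PR), the re-interpretation could look roughly like the following. It assumes Python 3, where header values arrive as `str` decoded from latin1; the helper name `fix_shib_encoding` is hypothetical.

```python
def fix_shib_encoding(value):
    """Re-interpret a latin1-decoded header value as UTF-8 where possible.

    Hypothetical helper: recover the raw bytes by encoding back to latin1,
    then decode them as UTF-8. If either step fails, the value was not
    mis-decoded UTF-8, so it is returned unchanged.
    """
    try:
        return value.encode("latin1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value


# Simulate what arrives when UTF-8 bytes are decoded as latin1.
mangled = "José".encode("utf-8").decode("latin1")  # "JosÃ©"
print(fix_shib_encoding(mangled))
```

Pure ASCII values round-trip unchanged, and a value that is not valid UTF-8 once re-encoded is left alone, which is what keeps the fix invisible to sites without mis-encoded names.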

The alternative would be to make use of Shibboleth's ShibRequestSetting encoding URL option in the Apache config to force Shibboleth to URL-encode the string. We would then have to decode it when we consume it. This approach is arguably more correct since the headers would be RFC compliant, but it involves much more work and requires users to change their webserver config. It's not backwards compatible.
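
For comparison, the consuming side of that alternative might be sketched as below. This assumes the attribute values arrive percent-encoded from UTF-8 bytes; the helper name `decode_shib_url_encoded` is hypothetical and nothing like it exists in this PR.

```python
from urllib.parse import unquote


def decode_shib_url_encoded(value):
    """Decode a percent-encoded attribute value (assumed UTF-8 underneath).

    Hypothetical helper: errors="strict" makes malformed input raise
    UnicodeDecodeError instead of being silently replaced.
    """
    return unquote(value, encoding="utf-8", errors="strict")


print(decode_shib_url_encoded("Jos%C3%A9"))
```

Unlike the re-interpretation approach, this would change values for every site unless the webserver config is updated at the same time, which is why it is not backwards compatible.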