OIDC returns claims with wrong encoding

danijelt commented 4 years ago

My application creates users with UTF-8 encoding via SCIM2 API. The users are properly sent, received and correctly stored to UTF-8 MySQL database. When I try to get claims via OIDC, non-ASCII characters are returned incorrectly. Example:

Sent via SCIM2: ŠĐČĆŽšđčćž (UTF-8)
Stored in database: ŠĐČĆŽšđčćž (UTF-8)
Returned via OIDC (in JWT token): ÃÂ Ãï¿½ÃÅÃ†ÃÂ½ÃÂ¡Ã‘Ãï¿½Ã‡ÃÂ¾ (???)
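
For readers unfamiliar with this symptom, here is a minimal sketch of how correctly stored UTF-8 text becomes garbled when some layer decodes the bytes with the wrong charset. Latin-1 is only an assumed stand-in for the charset the faulty layer applied; the exact output reported above suggests more than one such step, so this is not a reproduction of it.

```python
# Sketch of how UTF-8 text turns into mojibake when its bytes are decoded
# with the wrong charset. Latin-1 is an assumption used for illustration.
original = "ŠĐČĆŽšđčćž"

utf8_bytes = original.encode("utf-8")    # the bytes as stored correctly
mojibake = utf8_bytes.decode("latin-1")  # wrong charset applied on read

# Each of these characters takes two bytes in UTF-8, so the garbled
# string is twice as long as the original.
print(len(original), len(mojibake))

# A single wrong decode is reversible as long as no bytes were replaced.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired == original)
```

Once a replacement character (shown as `ï¿½` in the output above) appears, the original bytes have been lost and the text can no longer be repaired this way.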

mefarazath commented 4 years ago

Hi @danijelt

Thanks for reporting the issue. On which version of Identity Server are you observing this?

danijelt commented 4 years ago

I use IS 5.9.0.

mefarazath commented 4 years ago

Hi @danijelt

I tried reproducing the observed behaviour as follows:

Create a user via SCIM2 whose username is ŠĐČĆŽšđčćž

curl --location --request POST 'https://localhost:9443/scim2/Users' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic YWRtaW46YWRtaW4=' \
--data-raw '{
    "schemas": [],
    "name": {
        "familyName": "jackson",
        "givenName": "kim"
    },
    "userName": "ŠĐČĆŽšđčćž",
    "password": "kimwso2",
    "emails": [
        {
            "primary": true,
            "value": "kim.jackson@gmail.com",
            "type": "home"
        }
    ]
}'

Obtained an id_token using an authorization grant. The id_token contained the username in the sub claim in the expected format (the decoded id_token is shown below):

{
"at_hash": "o8wWQ5rlOYJNUBjGfGgRZw",
"aud": "6vnZ_1rcYYpnG4D7y7Pg6iBf11Ma",
"c_hash": "D8q3tJIGNJvaaSZ6KxqsxA",
"sub": "ŠĐČĆŽšđčćž",
"nbf": 1579186925,
"azp": "6vnZ_1rcYYpnG4D7y7Pg6iBf11Ma",
"amr": [
"BasicAuthenticator"
],
"iss": "https://localhost:9443/oauth2/token",
"exp": 1579190525,
"iat": 1579186925
}
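
To check the claims the same way, the payload of an id_token can be decoded locally. The following is a rough sketch that does not verify the signature; the helper names and the unsigned sample token are hypothetical and exist only for this illustration.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    # base64url encoding omits padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    raw = base64.urlsafe_b64decode(payload_b64)
    # JWT payloads are UTF-8 encoded JSON.
    return json.loads(raw.decode("utf-8"))


def _b64url(data: dict) -> str:
    """Encode a dict as unpadded base64url JSON (demo helper only)."""
    raw = json.dumps(data, ensure_ascii=False).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Hypothetical unsigned token built only for this demonstration.
sample_token = ".".join(
    [_b64url({"alg": "none"}), _b64url({"sub": "ŠĐČĆŽšđčćž"}), ""]
)

claims = decode_jwt_payload(sample_token)
print(claims["sub"])
```

If the sub claim is already garbled at this point, the corruption happened before the token was issued rather than in the client that displays it.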

I used the default LDAP user store for testing. Can you give us more details so we can see whether there could be an issue?

danijelt commented 4 years ago

I use JDBC user store. OS is CentOS 7, database is MariaDB 10.3. It's configured for UTF-8 as default, charset in the DB is UTF-8 and collation is utf8_croatian_ci. UTF-8 is also defined in the JDBC connection string and I've set -Dfile.encoding=UTF-8 in wso2server.sh.

"ŠĐČĆŽšđčćž" is properly stored in the database. Only the representation is malformed when retrieved through WSO2, either through the Carbon admin panel, or SCIM2 API.

I tested on both Chrome and Firefox, Linux and Windows, and SCIM2 through Postman and curl.

mefarazath commented 4 years ago

Can you confirm that

<Valve className="org.wso2.carbon.tomcat.ext.valves.RequestEncodingValve" encoding="UTF-8"/>

is present in the repository/conf/tomcat/catalina-server.xml file

and

all data sources in repository/conf/datasources/master-datasources.xml file have the 'characterEncoding=UTF-8' query param added.

e.g. jdbc:mysql://localhost:3306/user_db?characterEncoding=UTF-8
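
A quick way to check a connection string for this parameter is sketched below. The function name is hypothetical, and treating only "UTF-8" and "utf8" as acceptable values is an assumption made for this example.

```python
from urllib.parse import parse_qs


def jdbc_url_uses_utf8(jdbc_url: str) -> bool:
    """Return True if the JDBC URL sets characterEncoding to UTF-8."""
    # Everything after the first '?' is treated as the query string.
    _, _, query = jdbc_url.partition("?")
    values = parse_qs(query).get("characterEncoding", [])
    # Assumption: only these two spellings are accepted here.
    return any(v.upper() in ("UTF-8", "UTF8") for v in values)


print(jdbc_url_uses_utf8(
    "jdbc:mysql://localhost:3306/user_db?characterEncoding=UTF-8"
))
print(jdbc_url_uses_utf8("jdbc:mysql://localhost:3306/user_db"))
```

When the URL is copied from an XML file, any `&amp;` separators need to be converted back to `&` before running a check like this.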

danijelt commented 4 years ago

I use the deployment.toml file, so manual changes to those files are overwritten on restart. However, even if I remove deployment.toml and add characterEncoding, nothing changes, because the users are in a secondary user store added through the GUI, and its connection string already contains the following: ?autoReconnect=true&connectTimeout=5&socketTimeout=5&characterEncoding=UTF-8&useUnicode=true.

Also, RequestEncodingValve is already present in catalina-server.xml file.

danijelt commented 4 years ago

I resolved it. The problem is that the MySQL driver (both 5.x and 8.x from Oracle) doesn't handle the "utf8_croatian_ci" collation correctly. utf8_unicode_ci works fine.

I suggest that you document this for other people.
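
For anyone who hits the same problem, one possible way to apply the workaround is to convert the affected tables to utf8_unicode_ci. The sketch below only generates the SQL statements; the function name is hypothetical and the table names are assumptions, so substitute the tables of your own user store and back up the database before running anything.

```python
def collation_fix_statements(tables, charset="utf8",
                             collation="utf8_unicode_ci"):
    """Build ALTER TABLE statements that switch tables to a new collation."""
    return [
        f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} "
        f"COLLATE {collation};"
        for table in tables
    ]


# Assumed table names, used only for illustration.
for statement in collation_fix_statements(["UM_USER", "UM_USER_ATTRIBUTE"]):
    print(statement)
```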

mefarazath commented 4 years ago

Moving to docs-is to document information related to UTF-8 encoding.

wso2 / docs-is

OIDC returns claims with wrong encoding #1075