danijelt opened this issue 4 years ago.
Hi @danijelt
Thanks for reporting the issue. In which version of Identity Server are you observing this issue?
I use IS 5.9.0.
Hi @danijelt
I tried reproducing the observed behaviour as follows:
curl --location --request POST 'https://localhost:9443/scim2/Users' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic YWRtaW46YWRtaW4=' \
--data-raw '{
"schemas": [],
"name": {
"familyName": "jackson",
"givenName": "kim"
},
"userName": "ŠĐČĆŽšđčćž",
"password": "kimwso2",
"emails": [
{
"primary": true,
"value": "kim.jackson@gmail.com",
"type": "home"
}
]
}'
After signing in to an OIDC application as this user, the decoded ID token shows the username intact in the sub claim:
{
"at_hash": "o8wWQ5rlOYJNUBjGfGgRZw",
"aud": "6vnZ_1rcYYpnG4D7y7Pg6iBf11Ma",
"c_hash": "D8q3tJIGNJvaaSZ6KxqsxA",
"sub": "ŠĐČĆŽšđčćž",
"nbf": 1579186925,
"azp": "6vnZ_1rcYYpnG4D7y7Pg6iBf11Ma",
"amr": [
"BasicAuthenticator"
],
"iss": "https://localhost:9443/oauth2/token",
"exp": 1579190525,
"iat": 1579186925
}
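One way to check exactly what the server put into a token is to decode the JWT payload yourself and read it explicitly as UTF-8, so the browser or terminal cannot be blamed. The following is a minimal Java sketch; the class and method names are hypothetical, and it only base64url-decodes the middle segment of the token without verifying the signature, so it is for inspection only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical diagnostic helper, not part of WSO2 Identity Server.
public class JwtPayloadPeek {

    // Returns the JSON payload (second dot-separated segment) of a compact JWT,
    // decoded from base64url and interpreted as UTF-8.
    // The signature is NOT verified; use this only to look at the claims.
    public static String payloadJson(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a compact JWT");
        }
        // Java's URL-safe decoder accepts unpadded input, which is how JWT segments are encoded.
        byte[] bytes = Base64.getUrlDecoder().decode(parts[1]);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java JwtPayloadPeek <id_token>");
            return;
        }
        System.out.println(payloadJson(args[0]));
    }
}
```

If the printed sub claim is already garbled here, the corruption happened on the server side; if it is correct, the problem lies in how the client displays it. Note that the terminal's own encoding still affects what System.out shows.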
I used the default LDAP user store for testing. Can you give us more details so we can see whether there could be an issue?
I use a JDBC user store. The OS is CentOS 7 and the database is MariaDB 10.3. It is configured for UTF-8 by default: the charset in the DB is UTF-8 and the collation is utf8_croatian_ci. UTF-8 is also defined in the JDBC connection string, and I've set -Dfile.encoding=UTF-8 in wso2server.sh.
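To confirm that the JVM actually picked up -Dfile.encoding=UTF-8, a tiny Java program can print the relevant values when launched with the same JVM options as the server. This is a hypothetical diagnostic, not part of WSO2. On JDK versions before 18, Charset.defaultCharset() is derived from file.encoding at startup; from JDK 18 onward the default is UTF-8 regardless.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic: reports the JVM's encoding-related settings.
public class EncodingCheck {

    // One-line summary of the file.encoding property and the effective default charset.
    public static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    // True when the JVM's default charset is UTF-8.
    public static boolean defaultIsUtf8() {
        return Charset.defaultCharset().equals(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(report());
        System.out.println(defaultIsUtf8()
                ? "default charset is UTF-8"
                : "default charset is NOT UTF-8");
    }
}
```

Usage: java -Dfile.encoding=UTF-8 EncodingCheck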
"ŠĐČĆŽšđčćž" is stored correctly in the database. Only its representation is malformed when it is retrieved through WSO2, either through the Carbon admin panel or through the SCIM2 API.
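The difference between correct storage and a broken representation can be illustrated in plain Java: the UTF-8 bytes stay intact, and only decoding them with the wrong charset garbles the text. This minimal sketch uses ISO-8859-1 as the wrong charset purely for illustration; which charset was actually applied in this report is not known, so the output will not match the garbled string character for character.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: shows how correctly stored UTF-8 bytes turn into mojibake.
public class MojibakeDemo {

    // "ŠĐČĆŽšđčćž", written with escapes so the source file's encoding does not matter.
    public static final String ORIGINAL =
            "\u0160\u0110\u010C\u0106\u017D\u0161\u0111\u010D\u0107\u017E";

    // Simulates a reader that decodes UTF-8 bytes with a single-byte charset.
    public static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps each byte to exactly one char, so the original bytes
    // (and therefore the original text) can be recovered.
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = misdecode(ORIGINAL);
        System.out.println("chars: " + ORIGINAL.length() + " -> " + garbled.length());
        System.out.println("round trip intact: " + repair(garbled).equals(ORIGINAL));
    }
}
```

Each of these ten letters takes two bytes in UTF-8, so the misdecoded string is twice as long, yet no information has been lost. That matches the observation that the database content is fine and only the reading side is wrong.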
I tested on both Chrome and Firefox, Linux and Windows, and SCIM2 through Postman and curl.
Can you confirm that
<Valve className="org.wso2.carbon.tomcat.ext.valves.RequestEncodingValve" encoding="UTF-8"/>
is present in the repository/conf/tomcat/catalina-server.xml file
and
all data sources in the repository/conf/datasources/master-datasources.xml file have the 'characterEncoding=UTF-8' query parameter added,
e.g. jdbc:mysql://localhost:3306/user_db?characterEncoding=UTF-8
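When checking several data source URLs, a small helper can append the missing parameters. This Java sketch is hypothetical (it is not a WSO2 or Connector/J API) and only does string manipulation; the parameter names characterEncoding and useUnicode are the ones already used in this thread.

```java
// Hypothetical helper for auditing JDBC URLs; plain string manipulation only.
public class JdbcUrlUtf8 {

    // Appends useUnicode=true and characterEncoding=UTF-8 if they are absent.
    // An existing value (e.g. characterEncoding=latin1) is left untouched,
    // and the rest of the URL is not parsed or validated.
    public static String ensureUtf8(String url) {
        StringBuilder sb = new StringBuilder(url);
        if (!url.contains("useUnicode=")) {
            appendParam(sb, "useUnicode=true");
        }
        if (!url.contains("characterEncoding=")) {
            appendParam(sb, "characterEncoding=UTF-8");
        }
        return sb.toString();
    }

    // Adds '?' before the first parameter and '&' between parameters.
    private static void appendParam(StringBuilder sb, String param) {
        if (sb.indexOf("?") < 0) {
            sb.append('?');
        } else {
            char last = sb.charAt(sb.length() - 1);
            if (last != '?' && last != '&') {
                sb.append('&');
            }
        }
        sb.append(param);
    }

    public static void main(String[] args) {
        System.out.println(ensureUtf8("jdbc:mysql://localhost:3306/user_db"));
    }
}
```

Keep in mind that inside master-datasources.xml the & separator has to be written as &amp; because the URL sits in an XML element.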
I use the deployment.toml file, so changes to these files are overwritten on restart. However, even if I remove deployment.toml and add characterEncoding, nothing changes, because the users are in a secondary user store added through the GUI, and its connection string already contains the following: ?autoReconnect=true&connectTimeout=5&socketTimeout=5&characterEncoding=UTF-8&useUnicode=true
Also, RequestEncodingValve is already present in the catalina-server.xml file.
I resolved it. The problem is that the MySQL driver (both the 5.x and 8.x versions from Oracle) doesn't handle the utf8_croatian_ci collation correctly. utf8_unicode_ci works fine.
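For anyone hitting the same problem, one possible way to apply this fix is to find the tables whose columns use utf8_croatian_ci and convert them to utf8_unicode_ci. The Java/JDBC sketch below is an assumption, not what the reporter ran: the class and method names are hypothetical, the schema name is supplied by the caller, and ALTER TABLE ... CONVERT TO rewrites each table, so try it on a backup first. Switching away from the Croatian collation also changes how Croatian letters sort and compare.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical migration helper for MySQL/MariaDB; run against a backup first.
public class CollationFix {

    static final String FIND_TABLES =
            "SELECT DISTINCT TABLE_NAME FROM information_schema.COLUMNS "
          + "WHERE TABLE_SCHEMA = ? AND COLLATION_NAME = 'utf8_croatian_ci'";

    // Builds the conversion statement for one table. Table names cannot be bound
    // as parameters, so they are backtick-quoted and backticks are rejected.
    public static String convertSql(String table) {
        if (table.indexOf('`') >= 0) {
            throw new IllegalArgumentException("unexpected backtick in table name");
        }
        return "ALTER TABLE `" + table
                + "` CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci";
    }

    // Lists tables in the given schema that still have utf8_croatian_ci columns.
    public static List<String> affectedTables(Connection conn, String schema) throws SQLException {
        List<String> tables = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(FIND_TABLES)) {
            ps.setString(1, schema);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    tables.add(rs.getString(1));
                }
            }
        }
        return tables;
    }

    // Converts every affected table in the schema.
    public static void convertAll(Connection conn, String schema) throws SQLException {
        List<String> tables = affectedTables(conn, schema);
        try (Statement st = conn.createStatement()) {
            for (String table : tables) {
                st.executeUpdate(convertSql(table));
            }
        }
    }
}
```

New tables still inherit the database default collation, so that default would need to be changed as well (ALTER DATABASE ... COLLATE utf8_unicode_ci) to keep the problem from coming back.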
I suggest that you document this for other people.
Moving to doc-is to document information related to UTF-8 encoding
My application creates users with UTF-8 encoding via the SCIM2 API. The users are sent, received, and stored correctly in a UTF-8 MySQL database. When I try to get claims via OIDC, non-ASCII characters are returned incorrectly. Example:
ŠĐČĆŽšđčćž (UTF-8)
ŠĐČĆŽšđčćž (UTF-8)
à  Ã�ÃÅÆà ½à ¡ÑÃ�Çà ¾ (???)