ucphhpc / migrid-sync

MiGrid workspace where master branch is kept strictly in sync with SF upstream svn repo. Any development or experiments should use a branch. You probably want to fork your own clone or work e.g. on the edge branch if you wish to contribute.
GNU General Public License v2.0
3 stars 3 forks source link

User names containing special characters are not allowed to use the system #35

Open Bjarke42 opened 3 months ago

Bjarke42 commented 3 months ago

We have a user with the name O'Dwyer and because of apostrophe sign (0x27 in iso8859-1), then the user cannot login. This is because migrid does not allow apostrophe in names.

We need real user names to be in the system so that regular users and administrators know who the person is, both when it comes to authentication and security.

We can agree that persons should not be allowed to choose anything, but name the comes from CPR in Denmark are legal names, and should all be supported. More info here on legal chacters in CPR.

jonasbardino commented 3 months ago

Thanks for the input. Unfortunately we cannot simply allow various characters like hyphen without introducing potential bugs or security issues at the moment. We'll have to wait for the on-going User ID restructuring in milestone 3 to settle first.

Bjarke42 commented 1 month ago

You also need to support full characters for iceland and faro island.

jonasbardino commented 1 month ago

We already supported these common accented chars https://practicaltypography.com/common-accented-characters.html and added those remaining Northern Atlantic ones explicitly for name fields.

Bjarke42 commented 1 month ago

Yes but that list you just gave does not contain all characters that are in the icelandic and faroese character set. You need them all. A customer has characters that are not included in the list you gave.

jonasbardino commented 1 month ago

Please have a look at my commit earlier today e.g. in https://github.com/ucphhpc/migrid-sync/commit/c7d5528676f0f4b9fe13326c7e669577858d51a0 and let me know if I missed any from the alphabets you linked to.

Bjarke42 commented 1 month ago

Signup with new included characters does not work, here are som errors when trying to sign-up:

f1

Retrying again a second time:

f2

Trying on experimental verision: c7d5528676f0f4b9fe13326c7e669577858d51a0

jonasbardino commented 1 month ago

Right, thanks for testing. While it is a valid bug report that's not actually the accented characters or the particular user ID triggering input validation errors. The 1st one should indeed have been accepted. I have a hunch it's a regression from the recent WAYF integration work. Their sub claim contains an at-sign (@) but that should of course not change the validation into always requiring an email-like value. We'll take another look at the sub validator.

The 2nd one looks like a broken request as openid.(sreg|ns) (openid 2.0) end oidc.claim (openid connect) args should not be mixed. Can you explain or remind me what the retry consisted of - e.g. did it simply re-post the page or actually involve e.g. another sign up button click and/or a repeat sign in with your ID provider?

Bjarke42 commented 1 month ago

The last retry picture looks excatly the error you get when you first login as an non existing user on the system, then afterwards push sign-up, you will get a similar error. The problem described in issue 3

jonasbardino commented 1 month ago

okay, but as I also commented on that issue it looks like a broken request once it reaches apache. I'm trying to understand exactly how those sreg value enter the equation. If it's a direct result of clicking the 'Go back to try again' link itself or if it takes you back to your public site front page and the sign up button (i.e. the html form) there then sends those sreg values along.

In any case I've changed the sub validation to allow both ascii string and email format again, so hopefully the sign up should now succeed. Please let me know when you have had time to test it.

jonasbardino commented 1 month ago

For the record the fix for the oidc.claim.sub validation is in https://github.com/ucphhpc/migrid-sync/commit/1ea553b48091051a1e7074f3c346a4a0386b5537

Bjarke42 commented 1 month ago

1ea553b48091051a1e7074f3c346a4a0386b5537 has been tested now, and its working with special characters.

jonasbardino commented 1 month ago

Great, thanks!

We'll keep this issue open for now in order to not forget the original apostrophe problem.

One detail I noticed in relation to the most recent WAYF-integration was that we actually pre-filter the full name field when users sign up with similar external OpenID 2.0 ID providers through autocreate.py. In effect this pre-filter simply strips any unsupported characters from the name so that O'Dwyer would become ODwyer there. I suppose we should at least consistently implement the same handling for the OpenID Connect sign up. Maybe expose exactly what to do in general as a configuration option: fail, replace or strip such unsupported characters. While it's not perfect to throw away or replace unsupported characters at least it would let all users sign up and rarely show anywhere. Input to that is of course welcome.

Bjarke42 commented 1 month ago

I think it could be used as a workaround for sometime. But there are several special chacters that could be problematic, because it could result in two people becomming the same user account?

If the account is always unique, no matter the special characters in the name, then it could be used, but what about when its fixed, then there need to be a way to update all the users with missing characters.

I will have to ask if this can be accepted as a real workaround.

jonasbardino commented 1 month ago

You still have e.g. email as part of the user ID, too, and another user signing up with same email will always be rejected. So dropping chars in full name is sub-optimal because it's a one-way operation, but at least doesn't add collisions.

I've talked to a colleague about adding a proper translation instead of just dropping unsupported chars. We'll look at that as a better solution and consider exposing it as the 3rd configuration option.

In practice e.g. translating ' to some variation of the corresponding utf8 codepoint U0027 should work and be fully reversible.

jonasbardino commented 4 weeks ago

We've reworked the handling of unsupported characters in sign up to be consistent for all auth types and exposed the option to either skip or hex-encode any such characters. This basically adds a workaround/fix for the original issue. However, while at it we will investigate better reversible encoding formats. The simple hex encoding acts differently on python2 and 3 due to the inherent unicode vs byte string difference and thus might make migration more cumbersome.

Feel free to test in the meantime with the new auto_add_filter_method and auto_add_filter_fields variables also exposed in the docker-migrid version.