sillsdev / chorus

End-user collaboration library via 3-way xml merging and hg dvcs under the hood
6 stars 26 forks source link

Password encoding is wrong in ServerSettingsModel.LogIn #324

Open rmunn opened 1 year ago

rmunn commented 1 year ago

The LogIn() method in ServerSettingsModel currently contains the following code:

https://github.com/sillsdev/chorus/blob/271b640d6ba47b0b6d6dc2bed049d10934f7ee02/src/LibChorus/Model/ServerSettingsModel.cs#L338-L342

Notice how it's declaring the content type as "application/x-www-form-urlencoded" but not actually URL-encoding the password. This violates the spec which says:

Forms submitted with this content type must be encoded as follows: Control names and values are escaped. Space characters are replaced by +, and then reserved characters are escaped as described in RFC1738, section 2.2: Non-alphanumeric characters are replaced by %HH, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., %0D%0A).

This is the root cause of issues such as https://github.com/sillsdev/languageforge-lexbox/issues/255, because it is easy to generate passwords with a password generator that contain characters such as & and + (that's not an exhaustive list), which will end up being sent incorrectly by Chorus and then rejected with a password error on the other end (when "a+b" gets URL-decoded into "a b", for example).

The long-term solution to this bug is to fix how Chorus sends passwords via HTTP POST, by URL-encoding them before writing them into the POST request body. The short-term solution will have to be telling users not to put special characters into their passwords. Some special characters make it through unscathed, due to not having any special meaning when they are URL-decoded, but it's far easier to explain "no punctuation in your password" in a choose-your-password UI than to explain "the & character, and these other characters, will fail because we messed up our code".

rmunn commented 11 months ago

I've done pretty extensive testing, and the characters that cause problems are + (always), & (always), and % (sometimes). A % at the end of a password works, but a % that just happens to be followed by two hex digits (0-9 or a-f) will end up decoded as if it was a character code and end up not letting the user log in (because the password that's being verified isn't actually the password they typed anymore).

Every other punctuation that can be typed on a US keyboard works. I.e., the following characters do not cause trouble:

!@#$^*()-_=/"'?<>;:\|[]{}`~

The only characters that will cause trouble are &, %, and +.