veracrypt / VeraCrypt

Disk encryption with strong security based on TrueCrypt
https://www.veracrypt.fr
Other
6.8k stars 940 forks source link

Command line password prompt fails when password includes french character. #540

Closed sunknudsen closed 4 years ago

sunknudsen commented 4 years ago

To reproduce the issue on macOS, create a volume using GUI and choose a password that includes a french character (é for example).

Then, try mounting it using the following.

veracrypt --text --keyfiles="" --mount --pim=0 --protect-hidden=no /Users/admin/test /Volumes/test

I confirm the issue is present on versions 1.23 and 1.24-Hotfix1.

Bug or config issue?

alt3r-3go commented 4 years ago

I can see that on Linux as well, using different non-English keyboard layout. The same password works ok when using GUI. Looking at the text mode commit history, it was adapted to work with UTF-8 symbols as well, I'll look into that more and report back if/when I find anything.

alt3r-3go commented 4 years ago

All right, I've found the reason and I believe I have a fix.

The reason is that our TextInputStream (which reads the password) doesn't decode the UTF-8 input properly even though the wxWidgets docs say that it will first try to find the BOM (which the password naturally doesn't have), then try UTF-8 and if that still doesn't work, fall back to using the "default encoding" (which is wxFONTENCODING_ISO8859_1, not IMHO what typical Linux console would have these days). That second step, that would have converted it properly, apparently doesn't happen/work per my tests (I have console encoding set to UTF-8 explicitly).

I haven't tried to instrument wxWidgets lib itself to see what exactly is happening there, because I believe it's less relevant and a proper way of action here for us - given that we are using this stream only for reading the user passwords and not e.g. files - is to explicitly set the fallback encoding for the TextIputStream's wxMBConv object to the user's system one (wxFONTENCODING_SYSTEM). After that everything starts to work just fine and should work even if user has something else than UTF-8. Below is the illustration of that. @idrassi, any objections for fixing it that way (see diff below), any other thoughts?

Before the fix, here's how the input stream treats the password with a UTF-8 multibyte character (I used a Polish keyboard layout and ę symbol instead of e in a password tęst:

[user@host Main]$ ./veracrypt --text --keyfiles="" --mount --pim=0 --protect-hidden=no ~/dev/vc/testunicode.hc ~/dev/vc/testmount
Enter password for /home/user/dev/vc/testunicode.hc: 
In TextUserInterface::ReadInputStreamLine, line=tÄst <<<<<<< this is my debug output, the password is apparently incorrect

Here's the proposed fix:

diff --git a/src/Main/TextUserInterface.cpp b/src/Main/TextUserInterface.cpp
index 69d45af4..afe2503d 100644
--- a/src/Main/TextUserInterface.cpp
+++ b/src/Main/TextUserInterface.cpp
@@ -41,6 +41,9 @@ namespace VeraCrypt
 #endif
                {
                        FInputStream.reset (new wxFFileInputStream (stdin));
+                       // Set fallback encoding of the stream reader to the user's system one
+                       // to make sure we interpret multibyte symbols properly
+                       wxConvAuto::SetFallbackEncoding(wxFONTENCODING_SYSTEM);
                        TextInputStream.reset (new wxTextInputStream (*FInputStream));
                }
        }

And here's the output of a fixed version - now it properly falls back to a system encoding (whatever it is) and decodes the string properly. I've checked the code paths where the TextInputStream is used and it's only for such passwords, so I don't expect the change to cause any adverse side effects.

[user@host Main]$ ./veracrypt --text --keyfiles="" --mount --pim=0 --protect-hidden=no ~/dev/vc/testunicode.hc ~/dev/vc/testmount
Enter password for /home/user/dev/vc/testunicode.hc: 
In TextUserInterface::ReadInputStreamLine, line=tęst <<<< the input is properly converted
idrassi commented 4 years ago

@alt3r-3go Thank you for the deep analysis and the fix proposal. It bothers me that wxWidgets doesn’t behave as documented. The modification you proposed modifies global behavior of wxWidgets which should be OK in our case but I would prefer to modify only the behavior of wxTextInputStream instance that we are using.

Can you please check if the fix still holds if you create an instance of wxConvAuto with parameter wxFONTENCODING_SYSTEM and then pass this wxConvAuto instance to the constructor of wxTextInputStream as the second parameter?

Also I will need to check how this behaves under macOS since this is where OP is encountering the issue and Terminal under macOS is known to have issues with international characters.

sunknudsen commented 4 years ago

Thanks so much for looking into this issue. Sounds promising!

alt3r-3go commented 4 years ago

@idrassi, yeah, it did look a bit strange to me too. Okay, for the sake of getting to the bottom of it, let me actually do a deeper dive and compile wxWidgets from source to see what's going on. I've just analyzed the library code yesterday and they do seem to invoke that UTF-8 based conversion (around here), but it must return an error, as they eventually go for a fallback eoncoding set by SetFallbackEncoding().

As for constructing the wxConvAuto instance with the specific encoding - yes, that does work as well and I've actually tried that yesterday, it's just that the approach I proposed looked a bit cleaner to me, as there's no need to specify that "separator" parameter for the wxConvAuto constructor explicitly, and we don't use this stream for anything else anyway. But I agree that localizing it could be a better strategy long-term, it usually is. Let me check the full flow and we'll discuss further.

This works as well:

diff --git a/src/Main/TextUserInterface.cpp b/src/Main/TextUserInterface.cpp
index 69d45af4..5750b07c 100644
--- a/src/Main/TextUserInterface.cpp
+++ b/src/Main/TextUserInterface.cpp
@@ -41,7 +41,7 @@ namespace VeraCrypt
 #endif
                {
                        FInputStream.reset (new wxFFileInputStream (stdin));
-                       TextInputStream.reset (new wxTextInputStream (*FInputStream));
+                       TextInputStream.reset (new wxTextInputStream (*FInputStream, wxT(" \t"), wxConvAuto(wxFONTENCODING_SYSTEM)));
                }
        }

Finally, yeah, it would be nice to check this in the original setting, as the fact that I observe something similar on Linux may be just a coincidence. I unfortunately don't have access to a Mac, so we'll have to rely on you :)

alt3r-3go commented 4 years ago

Ok, looks like they meant something else with that "we'll try UTF-8 before fallback" - they actually will, but they'll need a valid UTF-8 BOM there - as it looks like from getting through some commits referenced in the git blame for the relevant wxWidgets source file and some bugs I found in their Trac. It looks like they are still not able to properly decode UTF-8 without BOM when it's fed byte-by-byte (that's mentioned as an open problem, see references below), that would explain the behavior we observe (because the password we type of course doesn't contain the BOM). Let me try the code anyway, but this looks like a plausible hypothesis to me right now.

Relevant references:

  1. https://trac.wxwidgets.org/ticket/11570
  2. https://trac.wxwidgets.org/ticket/14720
  3. https://github.com/wxWidgets/wxWidgets/pull/1304
alt3r-3go commented 4 years ago

...a couple of experiments later...

Yes, my previous comment is correct. The below two simple experiments confirm it. When I inject a valid UTF-8 BOM before the password, wxConvAuto does decode it properly as UTF-8 without the fix I proposed earlier. When I remove it, it fails.

First, with the BOM:

[user@host Main]$ echo -e '\xEF\xBB\xBFtęst' |./veracrypt --text --keyfiles="" --mount --pim=0 --protect-hidden=no ~/dev/vc/testunicode.hc ~/dev/vc/testmount
Enter password for /home/user/dev/vc/testunicode.hc: 
In TextUserInterface::ReadInputStreamLine, line=tęst

Then without:

[user@host Main]$ echo -e 'tęst' |./veracrypt --text --keyfiles="" --mount --pim=0 --protect-hidden=no ~/dev/vc/testunicode.hc ~/dev/vc/testmount
Enter password for /home/user/dev/vc/testunicode.hc: 
In TextUserInterface::ReadInputStreamLine, line=tÄst

So indeed we need to set it explicitly and unless anyone has any objections, I think the way mentioned by @idrassi (by means of wxTextInputStream/wxConvAuto constructor parameter) is better long-term than the one I proposed initially (globally), so Mounir, please confirm if that fixes it on Mac and I'll prepare a PR. Alternatively, @sunknudsen, I can also prepare a debug binary for you to test this, if you're willing.

idrassi commented 4 years ago

@alt3r-3go I did tests under macOS and unfortunately the proposed fix (both your original one and after my proposal) doesn’t work when using wxFONTENCODING_SYSTEM. It works only if we specify wxFONTENCODING_UTF8.

I guess specifying wxFONTENCODING_UTF8 will also work on Linux but it will defeat the purpose of automatic detection of current system encoding and it may cause issues of terminal is configured with other encoding.

I’m not sure what is the best approach here. It would have been better if wxWidgets took care of all conversion details under the hood as it is supposed to do.

idrassi commented 4 years ago

@alt3r-3go after some thinking, using wxFONTENCODING_UTF8 seems to be the only correct choice since VeraCrypt UI only creates password using UTF-8 regardless of the system encoding configuration (the UI Text Entry gives us the password as UTF-16 and we explicitly convers it to UTF-8).

Can you check that it works this way also on Linux? Do you see any issues with my reasoning above? If things are OK for you, a PR with this would be helpful.

alt3r-3go commented 4 years ago

It's interesting that only a specific one works on MacOS. But yeah, I've checked a bit more and those wx code comments/bugs are a bit confusing (to me, anyway), as some say they are fixing that, some - that it's an open, etc.

Anyway, yes, I think as long as we specifically state in the docs that we use UTF-8 for all passwords and users should take care of setting their locales to that (or input passwords non-interactively using that encoding), I think we can use that approach. And [I guesstimate here] UTF-8 is probably by far a default setting.

I've checked and using wxFONTENCODING_UTF8 on Linux works fine as well. Let me prepare a PR then.

idrassi commented 4 years ago

@sunknudsen I have prepared a macOS installation package that contains this fix and I'm attaching a zip file that contains it alongside its PGP signature using official VeraCrypt key. Can you check on your side that it indeed fixes problem for you too? Thank you.

VeraCrypt_1.24-Hotfix2-Preview_macOSX.zip

sunknudsen commented 4 years ago

Great work guys. I confirm the hotfix works. When are you planning on making it available on https://www.veracrypt.fr/en/Downloads.html?