keepassxreboot / keepassxc

KeePassXC is a cross-platform community-driven port of the Windows application “Keepass Password Safe”.
https://keepassxc.org/

when LANG="" file with "ö" in name cannot be opened #11214

Open ChristianS99 opened 2 months ago

ChristianS99 commented 2 months ago

Overview

When LANG is set to "", the file "Passwörter.kdbx" cannot be opened.

Steps to Reproduce

$ keepassxc-cli ls Passwörter.kdbx
Passwort zum Entsperren von Passwörter.kdbx eingeben:
<...>
$ LANG= keepassxc-cli ls Passwörter.kdbx
Failed to open database file Passw?rter.kdbx: not found

Expected Behavior

The file can be opened.

Context

I'm not 100% sure, but in my experience LANG should only affect the output language and should not affect the encoding used to interpret the given arguments, e.g.:

$ cat föo
dasd
$ LANG= cat föo
dasd
$ LANG= keepassxc-cli --debug-info
KeePassXC - Version 2.7.9
Revision: 8f6dd13

Qt 5.15.14
Debugging mode is disabled.

Operating system: Gentoo Linux
CPU architecture: x86_64
Kernel: linux 6.10.3-gentoo-dist

Enabled extensions:
- Browser Integration
- Passkeys
- SSH Agent
- KeeShare
- YubiKey
- Secret Service Integration

Cryptographic libraries:
- Botan 3.2.0

The GUI is affected in the same way.

droidmonkey commented 2 months ago

This is definitely a Qt issue. There isn't much we can do about this. The obvious easy fix is to not use non-ASCII characters in your file name, or to make sure your LANG is set.

phoerious commented 2 months ago

Turns out, this is a Qt issue, but the problem is in fact a bit more complex.

"Passw?rter.kdbx" means a Latin1 string is being interpreted as UTF-8 here. However, simply running

LANG= keepassxc-cli ls Passwörter.kdbx

isn't a good problem demonstration. keepassxc-cli will use a default locale, but the parameters are in whatever your terminal's own input encoding is, so this could be anything.

The following Python snippet is a more stable test case:

import subprocess
subprocess.Popen([b'keepassxc-cli', b'ls', 'Passwörter.kdbx'.encode('iso-8859-1')],
                  env={'LANG': ''}).wait()
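
To see why Latin-1 bytes like the ones in this test case cannot be read back as UTF-8, here is a minimal Python sketch. It is only an illustration of the byte-level mismatch, not KeePassXC or Qt code, and the variable names are made up for this example.

```python
# Illustration only: how "ö" looks in the two encodings discussed here.
name = "Passwörter.kdbx"

latin1_bytes = name.encode("iso-8859-1")  # "ö" becomes the single byte 0xF6
utf8_bytes = name.encode("utf-8")         # "ö" becomes the two bytes 0xC3 0xB6

# 0xF6 is not valid UTF-8. With errors="replace" the decoder substitutes
# U+FFFD, which many terminals render as "?" or a similar placeholder.
misread = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_bytes)    # b'Passw\xf6rter.kdbx'
print(utf8_bytes)      # b'Passw\xc3\xb6rter.kdbx'
print(ascii(misread))  # 'Passw\ufffdrter.kdbx'
```

The file on disk is found only if the bytes handed to the file system match the stored name byte for byte, so a name that was decoded or re-encoded with the wrong codec no longer matches.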

This is where we parse the command line arguments: https://github.com/keepassxreboot/keepassxc/blob/develop/src/cli/keepassxc-cli.cpp#L194

I believe that

    for (int i = 0; i < argc; ++i) {
        arguments << QString(argv[i]);
    }

is indeed wrong. This should at least be QString::fromLocal8Bit(argv[i]), but according to the docs, this is equivalent to QString::fromUtf8() on Linux, which is obviously wrong when your system locale isn't Unicode-based.

I think defaulting to UTF-8 is a sane assumption for most Linux systems, but if your input encoding is something else, this will obviously fail. To fix this, we'd need to parse LANG or LC_ALL ourselves, but even with those variables set, we could only guess what the actual input encoding of the command line parameters is.
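
As a rough idea of what "parsing LANG or LC_ALL ourselves" could look like, here is a small Python sketch. The helper name guess_arg_encoding and the UTF-8 fallback are assumptions made for this example; LC_CTYPE is included because POSIX gives it precedence over LANG for character handling, although it is not mentioned above. This is not how KeePassXC or Qt do it.

```python
import os


def guess_arg_encoding(env=None, default="UTF-8"):
    """Hypothetical helper: guess an encoding from POSIX locale variables."""
    env = os.environ if env is None else env
    # POSIX precedence for character handling: LC_ALL, then LC_CTYPE, then LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if not value:
            continue  # unset or empty variables are skipped
        # Locale names look like language[_territory][.codeset][@modifier].
        codeset = value.partition("@")[0].partition(".")[2]
        # The first non-empty variable decides, even if it names no codeset.
        return codeset or default
    return default


print(guess_arg_encoding({"LANG": "de_DE.UTF-8"}))  # UTF-8
print(guess_arg_encoding({"LANG": ""}))             # UTF-8 (fallback)
print(guess_arg_encoding({"LC_ALL": "de_DE.ISO-8859-1", "LANG": "de_DE.UTF-8"}))  # ISO-8859-1
```

Even with such a helper, the result only describes what the environment claims. The bytes in argv may still have been produced in a different encoding, so this remains a guess.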

ChristianS99 commented 2 months ago

> Turns out, this is a Qt issue, but the problem is in fact a bit more complex.
>
> "Passw?rter.kdbx" means a Latin1 string is being interpreted as UTF-8 here. However, simply running

Yeah, agreed. It looks like the output is Latin1 where the terminal expects UTF-8.

> LANG= keepassxc-cli ls Passwörter.kdbx
>
> isn't a good problem demonstration. keepassxc-cli will use a default locale, but the parameters are in whatever your terminal's own input encoding is, so this could be anything.

The terminal's input encoding is UTF-8, even with LANG="", because setting the variable in front of a command changes it only for that command, not for the terminal.

> The following Python snippet is a more stable test case:
>
> import subprocess
> subprocess.Popen([b'keepassxc-cli', b'ls', 'Passwörter.kdbx'.encode('iso-8859-1')],
>                   env={'LANG': ''}).wait()
>
> This is where we parse the command line arguments: https://github.com/keepassxreboot/keepassxc/blob/develop/src/cli/keepassxc-cli.cpp#L194
>
> I believe that
>
>     for (int i = 0; i < argc; ++i) {
>         arguments << QString(argv[i]);
>     }
>
> is indeed wrong. This should at least be QString::fromLocal8Bit(argv[i]), but according to the docs, this is equivalent to QString::fromUtf8() on Linux, which is obviously wrong when your system locale isn't Unicode-based.

Mind that these are the Qt 6 docs. Qt 5 is different, and there fromLocal8Bit and fromUtf8 actually do different things.

> I think defaulting to UTF-8 is a sane assumption for most Linux systems, but if your input encoding is something else, this will obviously fail. To fix this, we'd need to parse LANG or LC_ALL ourselves, but even with those variables set, we could only guess what the actual input encoding of the command line parameters is.

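#include <QString>
#include <QTextStream>
#include <cstdio>
#include <iostream>

int main(int argc, char *argv[])
{
    if (argc > 1) {
        QTextStream out(stdout);

        // Line 1: the raw argument bytes, written back unchanged.
        std::cout << argv[1] << std::endl;

        // Line 2: hex dump of the argument bytes.
        for (int p = 0; argv[1][p] != 0; ++p) {
            printf("%x ", (unsigned char)argv[1][p]);
        }
        printf("\n");

        // Line 3: QString(const char *) decodes the bytes as UTF-8 in Qt 5.
        QString s1 = QString(argv[1]);
        out << s1 << Qt::endl;

        // Line 4: decoded with the locale codec (QTextCodec::codecForLocale() in Qt 5).
        QString s2 = QString::fromLocal8Bit(argv[1]);
        out << s2 << Qt::endl;

        // Line 5: decoded explicitly as UTF-8.
        QString s3 = QString::fromUtf8(argv[1]);
        out << s3 << Qt::endl;
    }
    return 0;
}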

This is a small test program to try a few things. Running it with LANG= ./qttest aäböc gives this output:

aäböc
61 c3 a4 62 c3 b6 63 
a?b?c
a??b??c
a?b?c

Line 1: the terminal is consistent about the encoding of the input given and the output expected.
Line 2: the encoding actually is UTF-8.
Line 3: QString obviously converts the byte sequence somehow.
Lines 4 and 5: fromLocal8Bit and fromUtf8 behave differently (on Qt 5).
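
The byte dump and the two "?" patterns can be mimicked in a few lines of Python. This is only a sketch of one possible explanation, assuming that the locale codec treats each input byte as one character (as Latin-1 does) and that the output side can only emit ASCII and substitutes "?" for everything else. It does not reproduce Qt's actual code path.

```python
raw = "aäböc".encode("utf-8")         # the bytes the shell passed as the argument

print(raw.hex(" "))                   # 61 c3 a4 62 c3 b6 63

as_utf8 = raw.decode("utf-8")         # 5 characters: a ä b ö c
as_latin1 = raw.decode("iso-8859-1")  # 7 characters, one per input byte

# Assumption: output is limited to ASCII, other characters become "?".
print(as_utf8.encode("ascii", errors="replace").decode("ascii"))    # a?b?c
print(as_latin1.encode("ascii", errors="replace").decode("ascii"))  # a??b??c
```

Under these assumptions the UTF-8 decode yields one "?" per umlaut, while the byte-per-character decode yields two, which matches the difference between the fromUtf8 and fromLocal8Bit lines in the output.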

phoerious commented 2 months ago

> Yeah, agreed. It looks like the output is Latin1 where the terminal expects UTF-8.

This is not just the terminal output, but first and foremost the file name. File names are always UTF-8 on Linux, so using a Latin1 string is wrong in any case.

When I qDebug() my QLocale, it always says "Latin1", even when it's actually UTF-8. I also couldn't find any difference in behaviour between QString(argv[i]) and QString::fromLocal8Bit(argv[i]). However, looking at the Qt source code for QCommandLineParser::process(const QCoreApplication &), I figure that QString::fromLocal8Bit(argv[i]) is indeed the correct way.

AugustoMagalhaes commented 2 months ago

Christian, try running the executable like this:

LANG=de_DE.UTF-8 executable args

Tell me if it works. I had the same problem with Qt and, in another situation, fixed it by using LANG=C. Good luck, mate.

ChristianS99 commented 1 month ago

> LANG=de_DE.UTF-8 executable args

This works, and it is my default. I just stumbled over the problem by accident and thought I could report it.