kazurayam / MockFtpServer

Apache License 2.0

MockFtpServer should support UTF-8 #5

Open kazurayam opened 1 week ago

kazurayam commented 1 week ago

Using MockFtpServer, I want to simulate an FTP server that encodes file names in UTF-8.

MockFtpServer v3.2.0 does not support encoding file names (Unicode text) into UTF-8.

So I want to change MockFtpServer so that it encodes file names, which are Java Strings holding Unicode text, into UTF-8.

As for the RFC, see the following sources:

https://wiki.filezilla-project.org/Character_Encoding

https://filezilla-project.org/specs/rfc2640.txt

kazurayam commented 1 week ago

In v3.2.0, org.mockftpserver.fake.command.ListCommandHandler has the following code fragment:

        String result = StringUtil.join(lines, endOfLine());
        result += result.length() > 0 ? endOfLine() : "";

        sendReply(session, ReplyCodes.TRANSFER_DATA_INITIAL_OK);

        session.openDataConnection();
        LOG.info("Sending [" + result + "]");
        session.sendData(result.getBytes(), result.length());

The result variable is the reply to the LIST command. It is a java.lang.String instance and looks something like

-rwxrwxrwx  1 none     none                  51 Nov 19 21:59 a日本語で遊ぼう.txt
-rwxrwxrwx  1 none     none                  16 Nov 19 21:59 foobar.txt

When the result variable contains non-Latin-1 characters like "日本語で遊ぼう", the call result.getBytes() encodes the String with the JVM's default charset, which is not necessarily UTF-8. This means the FTP server replies with a byte array whose encoding depends on the platform the server runs on, not on a charset the FTP client can rely on.
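To make the point about getBytes() concrete, here is a minimal sketch, not taken from MockFtpServer. The class name ListReplyByteCount and the helper encodedLength are hypothetical names used only for illustration. It shows that the number of bytes produced for the file name "a日本語で遊ぼう.txt" depends on the charset, and that neither count has to match String.length(), which counts UTF-16 chars. This may matter for the fragment above, because it passes result.length() as the second argument of sendData; if that argument is a byte count, it would be too small for multi-byte text. The file name is written with Unicode escapes so the source compiles regardless of the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of MockFtpServer.
final class ListReplyByteCount {

    /** Returns how many bytes the text occupies when encoded in the given charset. */
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // The file name from the LIST reply above, written with Unicode escapes.
        String name = "a\u65E5\u672C\u8A9E\u3067\u904A\u307C\u3046.txt";

        // The no-argument getBytes() uses this charset, which varies by platform.
        System.out.println("default charset : " + Charset.defaultCharset());

        // Number of UTF-16 chars in the String.
        System.out.println("String.length() : " + name.length());

        // Byte counts differ per charset and from the char count.
        System.out.println("UTF-8 bytes     : " + encodedLength(name, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE bytes  : " + encodedLength(name, StandardCharsets.UTF_16BE));
    }
}
```

The design point is that a byte count should be taken from the encoded array (for example ba.length), never from the String.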

Now the FTP client gets a byte array and has to convert it back to a Java String. To do that correctly, it would have to know which charset the server used. This is not what we usually do.

What we usually, and naively, assume is that the byte array represents a String encoded in UTF-8.

So the FTP client decodes the byte array as UTF-8 even though the server did not necessarily encode it as UTF-8. Then we see garbled characters.
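The garbling described above can be reproduced without any FTP code. The following is a minimal sketch under stated assumptions: the class CharsetMismatchDemo and the method roundTrip are hypothetical, and ISO_8859_1 merely stands in for "some server default charset that is not UTF-8" (the real default depends on the platform). The server side is modeled by String.getBytes(Charset) and the client side by new String(byte[], Charset).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of MockFtpServer.
final class CharsetMismatchDemo {

    /** Encodes with the server's charset, then decodes with the client's charset. */
    public static String roundTrip(String text, Charset serverCharset, Charset clientCharset) {
        byte[] wire = text.getBytes(serverCharset); // what the server sends
        return new String(wire, clientCharset);     // what the client reconstructs
    }

    public static void main(String[] args) {
        // "a" + seven Japanese characters + ".txt", written with Unicode escapes.
        String name = "a\u65E5\u672C\u8A9E\u3067\u904A\u307C\u3046.txt";

        // Both sides agree on UTF-8: the file name survives.
        String matched = roundTrip(name, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Server uses another charset, client assumes UTF-8: the file name is damaged.
        String mismatched = roundTrip(name, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println("matched survives    : " + name.equals(matched));
        System.out.println("mismatched survives : " + name.equals(mismatched));
        System.out.println("mismatched text     : " + mismatched);
    }
}
```

The file name only survives when both sides use the same charset, which is why an explicit UTF-8 option on the server side is useful.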


I want FakeFtpServer to optionally encode file names, which are Java Strings holding Unicode text, into a byte array using UTF-8. That is, I want to change

        session.sendData(result.getBytes(), result.length());

to

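        if (allowUTF8) {
            // requires: import java.nio.charset.StandardCharsets;
            byte[] ba = result.getBytes(StandardCharsets.UTF_8);
            session.sendData(ba, ba.length);  // byte count, not char count
        } else {
            // option off: keep the existing behavior unchanged
            session.sendData(result.getBytes(), result.length());
        }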