psychoone7 / pyftpdlib

Automatically exported from code.google.com/p/pyftpdlib
Other
0 stars 0 forks source link

Transparent filename encoding transform? #257

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi there.

I've been working with your library for quite a while, and it was just so 
simple yet worked like a charm.

I'm here to suggest some enhancements. There is a common problem to deal with 
FTP servers that is about encoding. While the client file name encoding is 
different from server file name encoding, we would have a lot of problems 
dealing with that. This is quite common here for Windows' default Chinese file 
name encoding is GBK yet Linux's is UTF-8. So I wish to add some transparent 
encoding transform function in my ftp server. And I think that it would need 
some implementation in the library source...

Also, if this function can simply be implemented by subclassing FTPHandler or 
something, please let me know.

Cheers.

Original issue reported on code.google.com by huangkan...@gmail.com on 1 May 2013 at 3:05

GoogleCodeExporter commented 9 years ago
Hi there.

I solved this problem by inspecting the code and subclass FTPHandler myself.

I'll paste the code for anyone who needs it.

#code starts
from asynchat import async_chat

class EncodedProducer:
    def __init__(self, producer):
        self.producer = producer
    def more(self):
        return self.producer.more().decode("utf8").encode(encoding)

class EncodedHandler(FTPHandler):

    def push(self, s):
        async_chat.push(self, s.encode(encoding))

    def push_dtp_data(self, data, isproducer=False, file=None, cmd=None):
        if file==None:
            if isproducer:
                data=EncodedProducer(data)
            else:
                data=data.decode("utf8").encode(encoding)

        FTPHandler.push_dtp_data(self, data, isproducer, file, cmd)

    def decode(self, bytes):
        return bytes.decode(encoding, self.unicode_errors)
#code ends

encoding stands for the target encoding you wish to transform to.

Using EncodedHandler instead of FTPHandler would help solve this problem

Original comment by huangkan...@gmail.com on 1 May 2013 at 6:08

GoogleCodeExporter commented 9 years ago
The problem here is that (AFAIK) the FTP protocol has absolutely *no* means of 
specifying or querying which character-encoding is in use - you just have to 
'hope' that the client and server are using the same encoding :-(
RFC2640 ( https://tools.ietf.org/html/rfc2640 ) specifies that the character 
encoding SHOULD be UTF-8, and pyftpdlib is now Unicode / RFC2640 compliant.
http://code.google.com/p/pyftpdlib/issues/list?can=1&q=unicode

So I guess your code serves as an example of how you could _force_ a different 
encoding if you can't use UTF-8, but IMHO it shouldn't be built into the 
library... of course Giampaolo may disagree ;-)

Original comment by gc...@loowis.durge.org on 1 May 2013 at 11:54

GoogleCodeExporter commented 9 years ago
Yes Andrew is right. I didn't make the server encoding configurable exactly for 
this reason: as per RFC guideline client and server have no way to agree on a 
specific encoding, therefore I thought it was better to just stick with UTF-8 
as dictated by RFC and be done with it.

If on one hand this is "the right thing to do", on the other hand perhaps there 
are cases where changing the default server encoding in order to support 
misbehaving clients might be desirable (note: at the cost of 'breaking' 
compliant ones). If this is the case I'd like to hear more about the scenario 
the OP is facing (in detail the FTP client used and what happens by using 
UTF-8).

That said, the code shown above changes the encoding of the control connection 
(and that might be "right") but also applies an encoding for the data exchanged 
through the data connection, and that is something which should be done only 
for the listing commands (LIST, MLSD, etc), not when transmitting files. 
What you want to do instead is override AbstractedFS's format_list() and 
format_mlsx() methods and leave FTPHandler.push_dtp_data alone:

class CustomFS(AbstractedFS):

    def format_list(self, *args, **kwargs):
        generator = AbstractedFS.format_mlst(self, *args, **kwargs)
        for item in generator:
             yield item.decode("utf8").encode(YOUR_ENCODING)

     # same for format_mlsx()

If we decide to make server encoding configurable we can avoid to go through 
all these troubles, but I'd like to hear OP's scenario first in order to figure 
out if it's actually worth the effort.

Original comment by g.rodola on 1 May 2013 at 11:31

GoogleCodeExporter commented 9 years ago
Just a quick note - Giampaolo's defintely right that you don't want to mess 
about with the 'encoding' for the actual file data (would give corrupted 
files), but presumably _if_ the encoding of filenames for the LIST and MLSD 
commands is being altered, then the encoding of the filenames for the 
STOR/RETR/DELE/etc. commands would need to be altered too?

Original comment by gc...@loowis.durge.org on 2 May 2013 at 12:29

GoogleCodeExporter commented 9 years ago
Yes, but apparently he did that already by overriding FTPHandler.decode().

Original comment by g.rodola on 2 May 2013 at 12:38

GoogleCodeExporter commented 9 years ago
Hi there.

Thanks for all your responses.

Well, I understood that these "functions" would not be included in the library 
source cause it is not really a function that should be considered...

But let's talk about the codes I pasted, hmmmmm.... I know it may sound like a 
dirty hack, but shouldn't push_dtp_data always be called with a non-None file 
argument if it is transmitting files? So would it be nice to distinguish list 
commands from file data by checking the file argument in push_dtp_data? Does 
this method have any kind of limitations ?

Cheers.

Original comment by huangkan...@gmail.com on 2 May 2013 at 1:05

GoogleCodeExporter commented 9 years ago
push_dtp_data() is called with a "producer" argument also for listing commands, 
not only for files-related ones. That aside, a 'cmd' argument is also passed, 
so you might want to inspect that.

Original comment by g.rodola on 2 May 2013 at 1:09