Closed gazpachoking closed 9 years ago
Downstream bug report, for reference. http://flexget.com/ticket/2704
I don't think there's anything path.py can do to address this issue. It's quite possible the issue is a limitation of the warnings module.
At the very least, we'll need an example how to recreate the error.
It happens whenever path.py tries to warn about a file/directory that is incorrectly encoded, which essentially turns 'warn' mode into 'strict' mode, albeit with a more obtuse error. I'll do some experiments and see if I can dig deeper into exactly what the issue is.
So, the issue is that the warning message path.py creates is a unicode
object which includes the invalid directory name. When warnings tries to create a message (which on python 2 is a str
) it tries to decode the unicode warning message, which is of course not possible with the ascii codec.
I'm thinking to fix this on python 2, rather than using child
in the warning message, we can use child.encode('unicode_escape')
or something of that nature to give the warnings module the bytes it desires. This of course won't be that useful to the user, since the name was encoded wrong and we didn't know how to interpret it, but it at least won't crash.
Or maybe there is a way to round trip bytes->unicode->bytes properly on python 2 with the replacement surrogateescape. I suspect the answer is in here somewhere. https://github.com/PythonCharmers/python-future/blob/master/future/utils/surrogateescape.py
I think the nicest solution would be if we could have a (python 2) __str__
method, which encoded using the filesystem encoding, and supported turning the surrogates into the proper source bytes. That way warnings module, and any modules dealing with the actual path (like here) would work.
Hmm, now that I think about it, maybe we wouldn't even need to warn anymore, and it would just handle the invalid encoded paths.
I did some testing with the encodefilename
and decodefilename
from future.utils.surrogateescape
on python 2 and it seemed to work well (tested with utf-8 paths and filesystemencoding forced to ascii.). It would be super cool if path.py transparently encoded the filename back to bytes when calling the system functions whenever on python 2 and os.path.supports_unicode_filenames
was false. If there is interest in this I can give a go at a pull request.
What do you think of the latest patch, which allows the caller of path.walk
to supply an arbitrary callable for the error handler? In that case, create an error handler that takes a unicode 'msg' and encode it suitably before passing it to warnings.warn.
Here is my quick test to make paths with invalid encodings still work on python 2. I haven't given it too much testing, but at first glance it seems to work. I imagine you don't want to add any deps, so the stuff from future
could be pulled in directly. Also, it would probably be nice to make _native_fs_string
something public, probably with a better name (I suck at naming.)
https://github.com/gazpachoking/path.py/compare/py2_invalid_encoding
What do you think of the latest patch, which allows the caller of path.walk to supply an arbitrary callable for the error handler? In that case, create an error handler that takes a unicode 'msg' and encode it suitably before passing it to warnings.warn.
Oh, didn't see this till I finished up my test branch. That would work fine for me too, in that case I'd probably just use logging
in a custom error handler rather than warnings
. I think it would be cooler though if it just supported invalid encodings like python 3 does (it does doesn't it? haven't tested, but I imagine surrogateescape handler can go back to original bytstring when encoded there), which is what my branch is doing.
I'll do some testing on python 3 to see the behavior there, but it seems to me that if the 'warn' mode stays in, it's still a bug that it causes an exception on python 2, even if there is an alternative to handle the warning yourself with a custom handler.
I haven't looked in to how to implement, but it would also be cool also to have some tests run with LC_ALL
environment variable set to something like en_US.ascii
to test that any warning/exceptions are done properly for invalid encoded paths.
Yeah, it seems like on python 3 invalid encoded paths are handled, both with os.path
functions and with path.py. It'd be nice to bring python 2 into parity when using path.py. The 'warn' mode (and other error modes) of walk would then become superfluous.
I separated out the idea of improving invalid filename support on python 2 to #73, since that is really a separate idea than this bug.
Just to clarify what the bug is here, on python 2, warnings.warn
expects a byte string for the message.
I think we're looking at a wontfix
resolution on this, but I'm happy to consider another fix if appropriate.
Seems there can be an issue creating the warning text when using the
walk
method in warn mode. I suspect this might only happen on python 2.