Open uranusjr opened 6 years ago
I think this makes a lot of sense, does anything break if we implement it right now?
Yes, if you already have a platform-specific encoded Pipfile with non-ASCII stuff in it. But this is easily amendable by re-saving the Pipfile with UTF-8.
How can we gracefully warn people about that when we encounter it?
The parsing would almost always fail with UnicodeDecodeError
if the encoding is not UTF-8, so it would probably be enough to catch that and emit a prettier error.
I doubt that would be necessary though. This would happen mainly on 1. Windows 2. POSIX systems with non-UTF-8 encoding. The latter is extremely unconventional these days (and I believe is not supported anyway). The former is probably next to non-existent, based on how many fundamental issues brettcannon (avoiding an @ here) was able to file when he adopted Pipenv through VS Code.
Can we fall back to their default system encoding if we handle that exception?
python 2 may be a bit hairy for this 😭
@jtratner Yeah it needs careful testing, but it can be pulled off. Many projects already did.
@techalchemy It should be. I’m just not sure about the necessity…
@uranusjr @jtratner relevant link from Victor Stinner from yesterday regarding 3.7 which will by default open things in utf-8 mode: https://vstinner.github.io/python37-new-utf8-mode.html
Yes, encoding shall be agreed and not dependent on current default encoding. Otherwise we will repeat errors as in distutils or in pbr package.
For Python 2.x compatibility, this ConfigParser snippet from PyMoTW could help.
@ncoghlan if you have a sec and a general comment on this one also I’d be super interested
I'd missed that we weren't already enforcing UTF-8 everywhere - we definitely should be.
As @uranusjr notes, that's straightforward for new files and for ASCII-only files, which means the main cases to handle are going to be existing locale encoded files on *nix systems and code page encoding files on Windows.
We're helped here by the fact that PyPI itself restricts package names to ASCII-only identifiers, so non-ASCII characters can only appear in comments, and in non-PyPI package names and URLs.
I figure that just fixing the encoding (for files pipenv
was about to edit anyway) is going to be easier than remembering to preserve the original encoding when writing them back, so I think transparently resolving the problem will be straightforward enough to be worth doing.
(@techalchemy Regarding the UTF-8 mode change in 3.7: that's only the default in the POSIX or C locale, although I think it may be the default on Windows in 3.6+)
Thanks for the input @ncoghlan I know you have a lot of projects to juggle so I appreciate it. I agree we are handling pretty much universally ascii characters and before yesterday I’m not sure we actually preserved all comments to begin with. So I agree in light of that it would seem worth pursuing (the risk seems low)
This reminds me, I also discovered we’re writing CR/LF line endings on Windows. This will need to be normalised as well.
Spun off from #1842.
Currently Pipenv does not handle non-ASCII characters very well. Pipfile is parsed with the platform’s default encoding; this becomes a problem for cross-platform projects if there are non-ASCII paths in the project, especially for the
scripts
section, but also for local dependencies’path
value.A straightforward solution would be to strictly require Pipfiles to be encoded with UTF-8. This is in line with the best practice of the Python community in general, including core development. Pipenv would then use
io.open
instead ofopen
to read Pipfiles instead (inproject.py
), so contents, including paths, are always read as text in the correct encoding.