rdmurphy closed this 7 years ago
+1 indeed. I'd forgotten this came up last year until we ran into it again this year.
According to Serdar, who is the one who pounded this into @esagara and me, nothing unexpected happens if you read/write a regular file with the binary option, but if you try to read/write a binary file without the binary option, it causes Bad Things. So it's safer just to always use "rb" and "wb." (Note that I haven't researched this myself, so perhaps there's something else to it, but I tend to trust anyone who smokes cigars with me.)
The binary flag for reads and writes helps avoid corrupting binary files (e.g. JPEGs) on Windows machines, but I've found that it also helps avoid cross-platform headaches when transferring text files between *nix and Windows. I'll admit I can't explain the precise reasons (I'm guessing it's because of differences in the newline character), but I've noticed the binary flag occasionally resolves problems on Windows when shuffling text files between operating systems. @hbillings That's why we decided a while back it'd just be safer to always use it across the board (and pound it into the heads of all our poor unfortunate students :)
That said, this might be causing more headaches than it's worth in a teaching context (I know we got questions about it every year as well). You might consider dropping it. If and when folks get bit by this in their Python careers, no doubt they'll be able to sort through it by pinging you all or PyJournos :)
Just be sure to test those scripts and data ahead of class if you wind up using Windows machines again!
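For anyone who wants to poke at the newline guess above, here's a minimal sketch in Python 3 (not the Python 2 setup we taught with, so treat it as an illustration rather than a reproduction). It writes a tiny file with Windows-style `\r\n` line endings, then reads it back both ways. The `newline_demo` helper and the sample contents are made up for the example.

```python
import os
import tempfile


def newline_demo():
    """Hypothetical helper: compare binary-mode and text-mode reads of one file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Write raw bytes so the Windows-style line endings land on disk untouched.
        with open(path, "wb") as f:
            f.write(b"name,city\r\nalpha,beta\r\n")

        # Binary mode hands back exactly the bytes that are on disk.
        with open(path, "rb") as f:
            raw = f.read()

        # Text mode (default newline=None) translates "\r\n" to "\n" on read.
        with open(path, "r", encoding="ascii") as f:
            text = f.read()
    finally:
        os.remove(path)
    return raw, text


raw, text = newline_demo()
print(repr(raw))
print(repr(text))
```

The binary read keeps the `\r\n` pairs and the text read quietly drops the `\r`, which is the kind of silent translation that can mangle a JPEG if it happens to real binary data.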
So in porting the exercises to Python3, I found that trying to read the text files with "rb" was throwing an error. I had to remove the "b" to get it to work. Does anyone know if Python3 handles the binary flag differently for text files? Is this no longer a concern?
@tommeagher I got bit by this too. The "short" answer is that text vs. binary data handling is saner in Python3 -- in a way that now requires the usage of the binary mode only when you actually need bytes of data (rather than encoded characters). In olden 2.x days, the 'b' was often used to ensure file reads worked properly on certain platforms such as Windows, although the flag was generally ignored on Unix-like systems.
A more thorough explanation is here: http://python3porting.com/preparing.html#separate-binary-data-and-strings
This is also helpful imho: https://docs.python.org/3/library/functions.html#open
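Here's a rough sketch of what that looks like in practice on Python 3. I'm assuming the exercises read CSVs with the `csv` module (the thread doesn't say which reader threw the error), and the temp file and sample rows are invented for the example.

```python
import csv
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
try:
    # The csv docs recommend newline="" so the module controls line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["name", "city"], ["alpha", "beta"]])

    # "rb" now returns bytes, not str.
    with open(path, "rb") as f:
        raw = f.read()

    # The csv reader wants strings, so feeding it a binary file raises csv.Error.
    with open(path, "rb") as f:
        try:
            next(csv.reader(f))
            binary_error = None
        except csv.Error as exc:
            binary_error = exc

    # Text mode decodes the bytes into str and the reader is happy.
    with open(path, "r", newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
finally:
    os.remove(path)

print(type(raw).__name__)
print(binary_error)
print(rows)
```

So in Python 3 the rule of thumb flips: text mode for anything you treat as characters, "rb"/"wb" only when you really want raw bytes.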
@zstumgoren got it. This is really helpful. So it seems then this discussion is now moot. Thanks for the advice!
We had this come up in our session: people wanted to know why using the "b" was safer. It'd be nice for us to actually be able to explain that beyond just saying, "Everybody does it! Trust us." :smile: (Which is basically what we did.)