mtkennerly / poetry-dynamic-versioning

Plugin for Poetry to enable dynamic versioning based on VCS tags
MIT License
588 stars 36 forks source link

Change write_bytes to write_text to preserve newlines #157

Closed nardi closed 7 months ago

nardi commented 7 months ago

Hello, I had a problem using this plugin on Windows (Python 3.11). I noticed that after running a command that uses version substitution, pyproject.toml and other files were back to their original contents, but the newlines had been changed (from CRLF to LF), which causes Git to see it as modified. Running git add on them detects that they are otherwise unchanged and doesn't stage them, but it is still a bit confusing.

I was able to fix this by changing calls to write_bytes to instead use write_text, which writes newlines using a platform specific convention ("universal newlines"), which I think is usually the preferred behavior. It could be that the calls to write_bytes were made for a reason that I'm not aware of, but if this is an acceptable change it would improve the user experience on Windows.

Thanks!

mtkennerly commented 7 months ago

Hi! Thanks for your PR. Someone else reported the same issue before (https://github.com/mtkennerly/poetry-dynamic-versioning/issues/105), but in that case, changing from write_text to write_bytes was the solution 😅 (also see v0.21.2)

My understanding is that write_text normalizes the line endings based on the OS, which may or may not be what you want depending on your Git core.autocrlf and core.eol settings. write_bytes is supposed to preserve whichever line endings the file originally had, so it's not clear to me why they would be changing here.

For what it's worth, I'm on Windows, and I used to see this issue with write_text, but I haven't been seeing it with write_bytes. I'm not sure what environment/config difference might account for that.

nardi commented 7 months ago

Ah, thanks for your reply! I guess the behavior makes sense, actually. Python does this newline normalization, which means it reads both LF and CRLF as LF. Then, it writes LF as CRLF or LF depending on the platform. So when reading and then writing text, LF will be preserved on Linux and CRLF on Windows, but CRLF will be converted to LF on Linux, and LF to CRLF on Windows. This is what was happening in issue #105.

Currently, the usage is asymmetric: read_text and write_bytes are used. This means they are normalized to LF on read and then written as LF no matter the platform. Instead, if we really want to preserve the original newlines, we have two options:

  1. Use both read_bytes and write_bytes
  2. Use open(..., newline="") for reading and write_text(..., newline="") for writing

I would say using the newline="" argument is the option that conveys intent better, but unfortunately there is no newline argument to read_text. Also, the one for write_text only exists since Python 3.10. So I think the first would be the best option, replacing calls to read_text with read_bytes(...).decode("utf8"). Do you agree? :)

mtkennerly commented 7 months ago

Ah, that's a great point about read_text. I agree that read_bytes would be a good solution :)