mlange-42 / yarner

Literate Programming command line tool for Markdown
https://mlange-42.github.io/yarner
MIT License
32 stars 1 forks source link

[Fix] Handle Windows line endings (esp. in reverse mode) #119

Closed mlange-42 closed 3 years ago

mlange-42 commented 3 years ago

Internally, Yarner works with Unix line endings (LF / \n). Thus, all output has LF line endings. In reverse mode, this results in Windows line endings (CRLF / \r\n) being converted to Unix line endings.

E.g., Git may then show all source files as changed, but no diffs, or a lot of warnings.

Possible solutions:

  1. Respect line endings, based on either
    • Configurable in the Yarner.toml (see #120), or
    • The user's system (see #120, too), or
    • Existing line endings
  2. Recommend the use of Unix line endings, and provide the required Git configuration
adamreichold commented 3 years ago

Recommend the use of Unix line endings, and provide the required Git configuration

Personally, I find this to be the preferable option. Basically all Windows programs handle Unix line endings by now. Despite the name, they are also "Internet line endings" by now. And they are fewer bytes to shuffle around. Thinking of the trees and such...

So while it definitely is a usability hazard to set things up initially, I think it would be better in the long run to nudge people towards Unix line endings than to add significant complexity to handle CRLF idiosyncrasies.

mlange-42 commented 3 years ago

Hm, it is quite inconvenient in its current state on Windows. Editors support LF endings, but by default, all new files have CRLF. Also, not all editors support easy switching (like available in NP++).

My current implementation in #120 adds 80 lines of code (thereof 30 lines tests), but mainly due to the fact that I made 2 lines out of each writeln!(...) to make newlines more obvious for the reader. It works quite well and allows both strategies - System newlines + let Git handle the conversion (option crlf_newline not given), and doing everything with LF (crlf_newline = false).

However, I start do doubt whether crlf_newline = true makes any sense. I.e. why should a non-Windows user want to use CRLF...

In case I settle on the user choice (at least between system newline vs. unix), I would definitely add a respective section to the guide.

adamreichold commented 3 years ago

Hm, it is quite inconvenient in its current state on Windows.

Isn't this a one time issue, i.e. the problem will go away after the first usage of reverse mode?

but mainly due to the fact that I made 2 lines out of each writeln!(...) to make newlines more obvious for the reader.

I think the complexity is more a mental measure than the number of lines. The Rust standard library is good data point here IMHO: things like .lines() handle both CRLF and LF, but writeln! will only ever produce LF.

However, I start do doubt whether crlf_newline = true makes any sense. I.e. why should a non-Windows user want to use CRLF...

This might happen if a Unix user interacts with a project that is mainly developed on Windows.

If you do decide to invest into handling this, I would suggest detecting the choice of line endings automatically to avoid adding yet another setting: You can always write out code and document files using Unix line endings, but if you merge code back into the source files, then use the line endings of that target file.

adamreichold commented 3 years ago

The Rust standard library is good data point here IMHO: things like .lines() handle both CRLF and LF, but writeln! will only ever produce LF.

Another thing to note here is that you also committed to using UTF-8 text encoding via the standard library even though Microsoft only started endorsing this encoding in 2019 and plain text files on Windows often use UTF-16 or even legacy encodings like Windows-1250.

mlange-42 commented 3 years ago

I would suggest detecting the choice of line endings automatically to avoid adding yet another setting: You can always write out code and document files using Unix line endings, but if you merge code back into the source files, then use the line endings of that target file.

At first glance, I really like this approach. However, as with the current status, it conflicts with the default Git setting on Windows autocrlf = true (with your suggestion only when committing code files), as on checkout, line endings are converted to CRLF. Thus, both version restrict users respective their Git settings. The version in #120 allows any Git setting, and matches the default settings when the option is not given in the toml file (use native newline + autocrlf = true on win).

things like .lines() handle both CRLF and LF, but writeln! will only ever produce LF.

Yes, that's how it is.

So currently, I am a bit undecided...

adamreichold commented 3 years ago

However, as with the current status, it conflicts with the default Git setting on Windows autocrlf = true (with your suggestion only when committing code files).

I am not sure I follow: You mean if one commits code and documentation into Git? But even if one does, if it is generated by Yarner wouldn't it always have Unix line endings and hence no spurious diffs?

mlange-42 commented 3 years ago

Say autocrlf = true. Code would have LF line endings after generating it. During commit/staging(?), line endings are converted to LF (i.e. no effect). When checking out again, line endings are converted to CRLF. Re-generating code would change line endings to LF again, and that would cause diffs al over the place.

I think this is what caused the diffs for me in reverse mode. I will try in forward mode to confirm...

adamreichold commented 3 years ago

When checking out again, line endings are converted to CRLF.

While committing generated files like code and documents is probably an issue itself, this indeed unfortunate. But then again, this could be handled automatically as well by "inheriting" the line endings from the human-written source files.

mlange-42 commented 3 years ago
  1. Generate code (LF) with git status I get a warning for every file with LF.
  2. Commit
  3. Switch to a branch without the code files, and then switch back. I.e. code is "re-generated" from the repository. Line endings in code are now CRLF
  4. Re-generate code with Yarner (now again, LF)

Git says:

C:\Projects\yarner-test>git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   code/Cargo.toml
        modified:   code/src/main.rs
        modified:   docs/README.md

no changes added to commit (use "git add" and/or "git commit -a")

C:\Projects\yarner-test>git diff
warning: LF will be replaced by CRLF in code/Cargo.toml.
The file will have its original line endings in your working directory
warning: LF will be replaced by CRLF in code/src/main.rs.
The file will have its original line endings in your working directory
warning: LF will be replaced by CRLF in docs/README.md.
The file will have its original line endings in your working directory
mlange-42 commented 3 years ago

handled automatically as well by "inheriting" the line endings from the human-written source files.

Yes, but then for everything, not only reverse MD writing. I think I will give this a try.

mlange-42 commented 3 years ago

While committing generated files like code and documents is probably an issue itself, this indeed unfortunate.

For code indeed, while for documentation output it is not that clear. See Project structure - top-level docs.

mlange-42 commented 3 years ago

Implemented through detection of line endings in source documents, in #122.