Closed chrissimpkins closed 3 years ago
From Colin in https://github.com/linebender/norad/pull/161#issuecomment-899644633
I'd be most interested in knowing if there is an accepted convention for this. In the general case I would expect a tool to use the correct line endings for the given platform, unless there is an established convention to do otherwise?
I am not familiar with the ufonormalizer source so my quick read through may have missed something. It looks like it may use Unix style line endings across platforms?
Impl:
Test:
I think so, too. So, one more writer option to add :grimacing:
What is the optimal default approach?
By default, use platform line ending. By request, use LF.
By default, use platform line ending. By request, use LF.
@cmyr this is the current consensus opinion on platform-specific line endings. No one else has chimed in to date.
okay, great. Then the next step is to see what the current behaviour is. Are we writing out CRLF on windows, anywhere? (.glif
, .plist
?) we'll also need to audit anywhere where we currently write \n
directly, and we'll need custom rewriting for features.fea
.
I can confirm that glif files are serialized with platform-specific line endings based on the unit tests in https://github.com/source-foundry/ufofmt/pull/25
It looks like plist files serialize with platform-specific line endings too.
Win test fail on glif XML file with expected '\n'
line ending: https://github.com/source-foundry/ufofmt/runs/3347607373#step:4:147
Win test fail on fontinfo.plist file with expected '\n'
line ending: https://github.com/source-foundry/ufofmt/pull/25/checks?check_run_id=3373761003#step:4:147
This is based on norad serialization with the lib @ rev 1a20be6
I was tinkering with this last night and haven't landed on a good solution yet. The glif and plist data appear to be byte arrays at the stage where we are manipulating the data in the write.rs
module. We could just brute force search every char and replace the entire byte array with a new u8 collection type (I suppose allocation of a Vec\r
char filtered out.
I think that there should be a more optimal approach. For starters, I think that we can assume that the first line of the file (defined as everything before the first \n
across platforms) is all that we need to test to understand if \r\n
line endings are used. If there is a mix of line endings in a file, the user's serializer is bugged and I would consider those invalid UFO files in need of repair. So there can be a single test of "is Win style line ending present?" per file and further find/replace or filter in the file is avoided if the test is negative.
Once we know that \r\n
is used, what is the most optimal approach to remove the \r
line ending bytes throughout the file? I'm not aware of an approach that does not traverse the full file char set through splits/re-joins or indexed finds with filters and allocation of a new mutable data collection type. Is it possible to create an iterator over the byte array that filters on a (|b| b != '\r')
test and pass that to the XML serializers?
I found this piece of code: https://github.com/derekdreery/normalize-line-endings/blob/master/src/lib.rs
I found this piece of code: https://github.com/derekdreery/normalize-line-endings/blob/master/src/lib.rs
This source was perfect Nikolaus. I refactored it to use byte arrays and worked it into the File writer / serialization options here. Since we will make Win-platform only \r\n
-> \n
line ending changes, it is only executed on the Win platform. I'll push a PR so that we can have a look on the Win test runner. I don't have a local Win + Rust system to test. But it looks like it will work and compiles. (Famous last words...)
The work in #170 identified inconsistent plist and glif line endings on Win. Looking into a way to make Win serialize with \r\n
and non-Win to serialize with \n
across all UFO file serialization approaches.
By default, use platform line ending. By request, use LF.
Please please don’t do this. It is 2021 already, use LF everywhere and be done with it. It is not like Windows applications will crash if you feed it to them (heck, even notepad supports LF now).
I don't feel strongly one way or the other on this issue but I think that we should use a consistent approach across all files. Right now there is a mixed approach across plist (\r\n on Win, \n everywhere else), glif (\n everywhere), and fea (defined by whatever was used to write the fea file) files. I see this as less of an issue with Windows reads than one of consistency with other serializers in use out there. FWIW, whatever the approach, we are working towards a simple way to customize the source serialization format so that you can minimize unnecessary XML declaration quote style, indent style, and line ending style diffs when norad is used with other UFO serializers.
I think LF should always be used, I wouldn’t even provide an option to save in a different line ending, but I don’t feel strongly about it as long as it is not the default. Otherwise, one hit dependencies, mysteriously failing CI builds and other stuff that is unnecessary since pretty much anything that one would ever care about will handle LF just fine.
It doesn't work around those issues. We would need to modify plist serialization (instead of glif serialization) to follow that format. The libs used to serialize have different opinions on the matter. :)
@khaledhosny do you know if ufonormalizer uses line feeds across platforms?
I'm ok with forcing LF everywhere. Feature files would need to be read and written through the LF normalizer iterator, I guess plist
would need to be changed?
Yeah, we can use the approach in #170 and swap the replacement strategy. Let's decide on whether it is ok to introduce the new Unicode-free regex and lazy_static crate deps to achieve it. We should be able to do it slower with the iterator approach and no additional dependencies as a first pass. When someone has the time we can dig into how the regex crate does it in the bytes module and transition to that approach if performance without new dependencies are goals here.
Closed #170 Opened https://github.com/linebender/norad/pull/172
Thoughts on how to approach cross-platform line endings? Should norad write consistent line endings on all platforms (limits diff when norad serialization is used x-platform)? Keep line ending format that it reads in on subsequent serialization (maybe limits diff when norad serialization is used with other serializers)? Allow custom line ending support (flexible and maybe limits diffs across serializers and platforms)?
@belluzj mentioned today that he likes the idea of a runebender config panel with Unix vs. Win line ending style options on UFO serialization. I think that we can expand the approaches used for spacing and XML declaration quote customization to address it if this is desirable. All of these serialization format options might be useful in such a config panel.
Originally posted by @chrissimpkins in https://github.com/linebender/norad/issues/161#issuecomment-899638531