Open benkiel opened 4 years ago
Did anyone volunteer to document what ufoNormalizer
does?
Not to my knowledge. I'll take it on in a couple of weeks if no one else wants to.
I took a quick look at ufoNormalizer and I sort of remember how it works, so I'll take this on.
Random finding: ufoLib and ufoNormalizer both indent the first <dict>
element, macOS plist files don't.
Another: ufonormalizer likes the form <array/>
instead of <array />
.
Also: ufonormalizer does not contract <array></array>
into <array/>
.
The indentation thing alone is worth a lot. A formatting recommendation would save us all from 90% whitespace diffs replacing almost every line in your font. In the wild I saw this today: https://github.com/eliheuer/baby-shark/pull/3/files
There's this discussion going on over at the norad project (UFO library in Rust) https://github.com/linebender/norad/issues/101
Having an official recommendation would help settle this sort of question.
My 0.02USD: it's better for the recommendation to reflect the norms for authoring tools, than the norms for normalizers. plists in the wild have two-space indents starting at the second level of nesting; and the norm for XML formatting is two-space indents starting at the first level of nesting (default for libxml2 and probably some others). Far as I can tell, most mature authoring tools respect these norms.
Ignoring the last six years of the usage of ufonormalizer
in workflows isn't the best idea either. A recommendation that breaks reasonable expectations for normalization based on history isn't great.
I looked into why tabs were chosen over spaces for the ufonormalizer
: it was like that from the start six years ago, I'm not sure why that choice was made, but if I had to venture a guess it would be that ElementTree at that point used tabs and not spaces by default.
Personal opinion: it would be a great waste to spend effort on a recommendation if it is going to devolve into a tabs vs spaces war.
I looked into why tabs were chosen over spaces for the ufonormalizer: it was like that from the start six years ago, I'm not sure why that choice was made, but if I had to venture a guess it would be that ElementTree at that point used tabs and not spaces by default.
I made that decision and it was because… I don't know. Maybe it was ElementTree, maybe I was thinking about file size, maybe I had some other reason. It doesn't matter. It's the decision that was made and it is what is out there in real world use.
Personal opinion: it would be a great waste to spend effort on a recommendation if it is going to devolve into a tabs vs spaces war.
Yes x 100000000000000000000.
I think that's why we need a recommendation, to avoid the space vs tabs war (or more accurately: to have it once and for all). As to which is the correct choice now, I would also vote for whatever ufonormalizer does because at Dalton Maag we're normalizing everything with ufonormalizer. However, if the community finds that spaces are better for some reason, then we'll do spaces, as long as all the tools align with the new decision.
FDBP https://silnrsi.github.io/FDBP/en-US/UFO.html, for example, still recommends the psfnormalize
format, which uses the default formats for plists and XML that I mentioned in my comment above.
Yes, as far as I can tell, psfnormalize
came a bit after ufonormalizer
. They have different ways of normalizing, neither is bad or wrong, just different. A recommendation is just that, a recommendation. A project is free to choose a normalizer that isn't the recommendation.
The UFO isn't going to ever mandate a certain style, as sometimes there are limits placed by different packages, programming languages, etc. on what is possible for XML formatting (again, we faced this when adding lxml as an option for ufoLib).
There can't be a recommendation that makes 100% of everyone happy, and that is ok.
The UFO isn't going to ever mandate a certain style, as sometimes there are limits placed by different packages, programming languages, etc. on what is possible for XML formatting (again, we faced this when adding lxml as an option for ufoLib).
Note that lxml can use XSLT 1.0 to format XML whichever way one wants, it's just painful.
A project is free to choose a normalizer that isn't the recommendation.
The ideal would be having normalized output coming directly out of tools. That seems to be one goal of Norad. Running external normalizers is not that inconvenient, and if they're installed on the target machine, you can include them in repository hooks... but it's not what happens by default when you save a UFO.
The ideal would be having normalized output coming directly out of tools.
Not all XML writers allow specifying the details of the output. We can write a recommendation, but we can't make it a requirement. As far as writing that recommendation goes, UFO is a self-funded project and we implement things when our time allows and/or our needs overlap. If someone wants to volunteer to write a recommendation as we've detailed, we'll be more than happy to review a PR.
Running external normalizers is not that inconvenient, and if they're installed on the target machine, you can include them in repository hooks... but it's not what happens by default when you save a UFO.
That's a workflow problem, right? If Project X is putting UFOs in a repository and they need the UFOs to share the same format to minimize their diffs, Project X should specify that at the project level and all contributors to Project X are required to use the same format. That's not the file format spec's problem to solve.
It also should be noted that noisy diffs when committing to a Git repository is not an issue faced by all UFO users.
Note that lxml can use XSLT 1.0 to format XML whichever way one wants, it's just painful.
I wonder how tough it would be to control the whitespace in the XSLT, along with the ordering. If it works well, it only has to be painful once.
If anyone wants to take on making a XSLT to format the XML, we would love that.
We open-sourced the XSLT that we've been using at Dalton Maag, here: https://github.com/daltonmaag/ufo-normalizer-xslt
It produces the same output as ufoNormalizer, at least when it comes to tabs and spaces and trivial things like that.
The main developer was @moyogo when he was working at Dalton Maag, credits to him!
@belluzj Thank you for doing this, will look into it in the coming weeks
While at it, single vs. double quotes would be nice to resolve too. :)
I think the single quotes are an unfortunate historical accident in lxml.
As discussed, add a recommendation (not requirement) for XML formatting to the specification.