unified-font-object / ufo-spec

The official Unified Font Object specification source files.
http://unifiedfontobject.org
174 stars 30 forks source link

Document recommendation for normalization #144

Open benkiel opened 4 years ago

benkiel commented 4 years ago

As discussed, add a recommendation (not requirement) for XML formatting to the specification.

typesupply commented 4 years ago

Did anyone volunteer to document what ufoNormalizer does?

benkiel commented 4 years ago

Not to my knowledge. I'll take it on in a couple of weeks if no one else wants to.

typesupply commented 4 years ago

I took a quick look at ufoNormalizer and I sort of remember how it works, so I'll take this on.

madig commented 3 years ago

Random finding: ufoLib and ufoNormalizer both indent the first <dict> element, macOS plist files don't.

Another: ufonormalizer likes the form <array/> instead of <array />.

Also: ufonormalizer does not contract <array></array> into <array/>.

xorgy commented 3 years ago

The indentation thing alone is worth a lot. A formatting recommendation would save us all from 90% whitespace diffs replacing almost every line in your font. In the wild I saw this today: https://github.com/eliheuer/baby-shark/pull/3/files

belluzj commented 3 years ago

There's this discussion going on over at the norad project (UFO library in Rust) https://github.com/linebender/norad/issues/101

Having an official recommendation would help settle this sort of question.

xorgy commented 3 years ago

My 0.02USD: it's better for the recommendation to reflect the norms for authoring tools, than the norms for normalizers. plists in the wild have two-space indents starting at the second level of nesting; and the norm for XML formatting is two-space indents starting at the first level of nesting (default for libxml2 and probably some others). Far as I can tell, most mature authoring tools respect these norms.

benkiel commented 3 years ago

Ignoring the last six years of the usage of ufonormalizer in workflows isn't the best idea either. A recommendation that breaks reasonable expectations for normalization based on history isn't great.

I looked into why tabs were chosen over spaces for the ufonormalizer: it was like that from the start six years ago, I'm not sure why that choice was made, but if I had to venture a guess it would be that ElementTree at that point used tabs and not spaces by default.

Personal opinion: it would be a great waste to spend effort on a recommendation if it is going to devolve into a tabs vs spaces war.

typesupply commented 3 years ago

I looked into why tabs were chosen over spaces for the ufonormalizer: it was like that from the start six years ago, I'm not sure why that choice was made, but if I had to venture a guess it would be that ElementTree at that point used tabs and not spaces by default.

I made that decision and it was because… I don't know. Maybe it was ElementTree, maybe I was thinking about file size, maybe I had some other reason. It doesn't matter. It's the decision that was made and it is what is out there in real world use.

Personal opinion: it would be a great waste to spend effort on a recommendation if it is going to devolve into a tabs vs spaces war.

Yes x 100000000000000000000.

belluzj commented 3 years ago

I think that's why we need a recommendation, to avoid the space vs tabs war (or more accurately: to have it once and for all). As to which is the correct choice now, I would also vote for whatever ufonormalizer does because at Dalton Maag we're normalizing everything with ufonormalizer. However, if the community finds that spaces are better for some reason, then we'll do spaces, as long as all the tools align with the new decision.

xorgy commented 3 years ago

FDBP https://silnrsi.github.io/FDBP/en-US/UFO.html, for example, still recommends the psfnormalize format, which uses the default formats for plists and XML that I mentioned in my comment above.

benkiel commented 3 years ago

Yes, as far as I can tell, psfnormalize came a bit after ufonormalizer. They have different ways of normalizing, neither is bad or wrong, just different. A recommendation is just that, a recommendation. A project is free to choose a normalizer that isn't the recommendation.

The UFO isn't going to ever mandate a certain style, as sometimes there are limits placed by different packages, programming languages, etc. on what is possible for XML formatting (again, we faced this when adding lxml as an option for ufoLib).

There can't be a recommendation that makes 100% of everyone happy, and that is ok.

moyogo commented 3 years ago

The UFO isn't going to ever mandate a certain style, as sometimes there are limits placed by different packages, programming languages, etc. on what is possible for XML formatting (again, we faced this when adding lxml as an option for ufoLib).

Note that lxml can use XSLT 1.0 to format XML whichever way one wants, it's just painful.

xorgy commented 3 years ago

A project is free to choose a normalizer that isn't the recommendation.

The ideal would be having normalized output coming directly out of tools. That seems to be one goal of Norad. Running external normalizers is not that inconvenient, and if they're installed on the target machine, you can include them in repository hooks... but it's not what happens by default when you save a UFO.

typesupply commented 3 years ago

The ideal would be having normalized output coming directly out of tools.

Not all XML writers allow specifying the details of the output. We can write a recommendation, but we can't make it a requirement. As far as writing that recommendation goes, UFO is a self-funded project and we implement things when our time allows and/or our needs overlap. If someone wants to volunteer to write a recommendation as we've detailed, we'll be more than happy to review a PR.

Running external normalizers is not that inconvenient, and if they're installed on the target machine, you can include them in repository hooks... but it's not what happens by default when you save a UFO.

That's a workflow problem, right? If Project X is putting UFOs in a repository and they need the UFOs to share the same format to minimize their diffs, Project X should specify that at the project level and all contributors to Project X are required to use the same format. That's not the file format spec's problem to solve.

It also should be noted that noisy diffs when committing to a Git repository is not an issue faced by all UFO users.

xorgy commented 3 years ago

Note that lxml can use XSLT 1.0 to format XML whichever way one wants, it's just painful.

I wonder how tough it would be to control the whitespace in the XSLT, along with the ordering. If it works well, it only has to be painful once.

benkiel commented 3 years ago

If anyone wants to take on making a XSLT to format the XML, we would love that.

belluzj commented 3 years ago

We open-sourced the XSLT that we've been using at Dalton Maag, here: https://github.com/daltonmaag/ufo-normalizer-xslt

It produces the same output as ufoNormalizer, at least when it comes to tabs and spaces and trivial things like that.

The main developer was @moyogo when he was working at Dalton Maag, credits to him!

benkiel commented 3 years ago

@belluzj Thank you for doing this, will look into it in the coming weeks

chrissimpkins commented 3 years ago

While at it, single vs. double quotes would be nice to resolve too. :)

madig commented 3 years ago

I think the single quotes are an unfortunate historical accident in lxml.