ufonormalizer and psfnormalize produce different results

unified-font-object / ufoNormalizer

A tool that will normalize the XML and other data inside of a UFO.

ufonormalizer and psfnormalize produce different results #70

Closed: probonopd closed this issue 4 years ago

probonopd commented 4 years ago

ufonormalizer and psfnormalize produce different results.

This kinda defeats the purpose, doesn't it?

To reproduce:

In a local test git repository, normalize a font with psfnormalize, then git commit, then normalize the already normalized font with ufonormalizer. You would expect no changes, but git diff will show plenty of them.

These include trivial ones such as (psfnormalize):

<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">

vs. (ufonormalizer)

<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
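To show that this particular difference is cosmetic, here is a minimal Python sketch using only the standard-library `plistlib`. It parses two plist documents that differ only in the DOCTYPE public identifier. The plist content is a made-up sample, not output copied from either tool. The raw bytes differ, which is what git sees, but the decoded data is identical:

```python
import plistlib

# Made-up fontinfo.plist sample using the "Apple Computer" public identifier
# (the psfnormalize form quoted above).
APPLE_COMPUTER = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>familyName</key>
  <string>Example Sans</string>
  <key>unitsPerEm</key>
  <integer>1000</integer>
</dict>
</plist>
"""

# The same document with the "Apple" public identifier (the ufonormalizer form).
APPLE = APPLE_COMPUTER.replace(b"-//Apple Computer//", b"-//Apple//")


def same_data(a: bytes, b: bytes) -> bool:
    """True if two plist documents decode to equal Python objects."""
    return plistlib.loads(a) == plistlib.loads(b)


print(APPLE_COMPUTER == APPLE)            # False: the bytes differ, so git reports a change
print(same_data(APPLE_COMPUTER, APPLE))   # True: the decoded data is the same
```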

I think both tools should either produce 100% the same "normalization", or else it is not a "normalization". Maybe having 2 such tools just complicates matters and one should be retired?

Reference: https://github.com/silnrsi/pysilfont/issues/69

jvgaultney commented 4 years ago

psfnormalizer long predates ufonormalizer, and applies a pysilfont-specific definition of what is 'normal' based on our best guess at common, reasonable usage. There is, however, no published standard for what a 'normal' UFO3 ought to look like in the format definition. It is not enough to say that whatever ufonormalizer produces is the standard for normal.

We'd love to see a published and accepted standard for UFO3 normal form, but despite our pleas that has never been defined. Once there is a standard we'd certainly look at making our toolkit support it. Having that definition clearly in place, with all implementations expected to follow it, would make it possible to unify the tools.

IOW, the problem is not with the tools, it is with the lack of a standard.

https://github.com/silnrsi/pysilfont/issues/69

benkiel commented 4 years ago

@jvgaultney, hard disagree here. The issue is with how different xml libraries write out XML (whitespace, self closing tags, quote marks, etc). There isn't a standard here. Picking one library to make the standard would make tools/platforms that don't use that library jump through a lot of hoops to match a formatting standard (see ufoLib when lxml became an option for speed).

Would you want to implement a specification that was so finely detailed that it specified whitespace? This is an issue with XML, not with the UFO spec.

ufoNormalizer made different choices than psfnormalizer. That's ok. Pick the one you like and use it.
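The point that whitespace, self-closing tags, and quote marks are writer choices rather than data can be illustrated with W3C Canonical XML (C14N 2.0), which Python exposes as `xml.etree.ElementTree.canonicalize` (Python 3.8+). This is only an illustration: nothing in this thread says either normalizer uses C14N, and canonical form is not what either tool writes. The `.glif` snippets are made-up samples:

```python
import xml.etree.ElementTree as ET

# The same made-up glyph as two different XML writers might emit it:
# attribute order, quote style, self-closing vs. explicit end tags and
# indentation all differ.
WRITER_A = '<glyph name="a" format="2"><advance width="500"/><unicode hex="0061"/></glyph>'
WRITER_B = """<glyph format='2' name='a'>
\t<advance width='500'></advance>
\t<unicode hex='0061'></unicode>
</glyph>"""


def canonical(xml_text: str) -> str:
    # C14N 2.0 sorts attributes, always uses double quotes and writes empty
    # elements as start/end tag pairs; strip_text=True drops the
    # whitespace-only text between tags.
    return ET.canonicalize(xml_text, strip_text=True)


print(WRITER_A == WRITER_B)                        # False
print(canonical(WRITER_A) == canonical(WRITER_B))  # True
```

Two files can therefore be "the same UFO" and still produce a noisy diff; a normalizer's job is to pick one of the many valid serializations and stick to it.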

jvgaultney commented 4 years ago

I'd agree that a purely XML format does not specify such things. However, in practice that's a real pain, hence the work we've both done on normalizers. If someone wanted to be pedantic, they could say that normalizers for UFO were philosophically out of scope; that's the argument I heard for years.

In real practice, when you have a format that is read and written by many, many different tools, and you care about source control, you have to lay down some standards, such as how to handle floating-point and integer values, how to handle empty elements, closing tags, etc. (although specifying whitespace seems a bit far even for me). Yes, ideally that shouldn't be necessary, and it isn't technically appropriate for an XML format, but it is a real help to users. The existence of this tool proves that point, and means you can't just accept differences in XML libraries. If a tool wants to truly support a friendly, easily usable UFO format, I'd argue it should do a little extra work to leave the UFO in a normalized state. (BTW, any speed argument here is dubious: the tool might be faster at writing an unnormalized UFO, but then the user has to spend time somewhere in the process normalizing it, so there's little effective speed saving.)
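As one example of the kind of rule such a standard would have to pin down, here is a sketch that writes integral values without a decimal point and rounds everything else to a fixed number of decimal places with trailing zeros removed. The function name and the default precision of 10 are assumptions for illustration; this is not claimed to be the rule ufonormalizer or psfnormalize actually applies:

```python
def format_number(value: float, precision: int = 10) -> str:
    """Hypothetical normalization rule for numeric values in UFO XML.

    Integral values are written without a decimal point ("500", not "500.0");
    other values are rounded to `precision` decimal places and trailing zeros
    are dropped ("0.5", not "0.5000000000").
    """
    rounded = round(float(value), precision)
    if rounded.is_integer():
        # int() also turns -0.0 into 0, so negative zero is written as "0".
        return str(int(rounded))
    return f"{rounded:.{precision}f}".rstrip("0").rstrip(".")


print(format_number(500.0), format_number(500))  # 500 500
print(format_number(0.50))                       # 0.5
print(format_number(1 / 3, precision=3))         # 0.333
```

With a rule like this, one tool writing `500.0` and another writing `500` would no longer produce a diff; the hard part the thread describes is getting every tool to agree on the same rule.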

I do feel sympathy towards @probonopd and understand his frustration with multiple normalizers, but don't see a way forward to change that unless we can agree semi-formally about what our UFOs should look like.

Our working philosophy is to pick a UFO normalization style (ufonormalizer, psfnormalizer, or whatever) for the individual project, then stick with that and run the appropriate normalizer anytime you've touched the UFO with some tool. That works for us, but does place an extra technical burden on designers.
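The "pick one normalizer and always run it" workflow can be automated. Below is a hypothetical git pre-commit hook in Python. It assumes the chosen tool is on `PATH` and takes a UFO path as its argument, normalizing that UFO in place; check your tool's documentation before relying on that. The hook normalizes every UFO with staged changes and stops the commit if normalizing rewrote anything, so the results can be reviewed and re-staged:

```python
import subprocess
from pathlib import PurePosixPath

# Assumption: the project has standardized on ufonormalizer and it is on PATH.
# Use ["psfnormalize"] instead if that is the project's chosen style.
NORMALIZER = ["ufonormalizer"]


def ufos_touched(paths):
    """Return the sorted .ufo directories that contain any of the given repo-relative paths."""
    found = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        for i, part in enumerate(parts):
            if part.endswith(".ufo"):
                found.add(PurePosixPath(*parts[: i + 1]).as_posix())
                break
    return sorted(found)


def git_lines(*args):
    """Run a git command and return its stdout as a list of lines."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def main():
    # Added, copied, modified or renamed files only; deleted UFOs need no normalizing.
    ufos = ufos_touched(git_lines("diff", "--cached", "--name-only", "--diff-filter=ACMR"))
    if not ufos:
        return 0
    for ufo in ufos:
        subprocess.run([*NORMALIZER, ufo], check=True)
    # Any unstaged change inside those UFOs now means the staged snapshot is stale.
    changed = git_lines("diff", "--name-only", "--", *ufos)
    if changed:
        print("The normalizer changed these files; review and re-stage them:")
        print("\n".join(changed))
        return 1
    return 0
```

To try it, save it as `.git/hooks/pre-commit`, add `#!/usr/bin/env python3` as the first line and `raise SystemExit(main())` as the last, and make the file executable. Failing the commit instead of silently re-staging keeps the designer aware of what the normalizer changed.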

benkiel commented 4 years ago

I would say that specifying formatting isn't philosophically out of scope, but practically so. Should the format penalize valid XML if it's not formatted according to the specification? That seems a bit too far.

Easy to use is related to easy to implement. Making developers jump through extra hoops to implement a format to spec makes adoption harder. The lxml example was a case of that: there were some formatting things that it simply will not do as part of its design, so if a developer wanted to use it for its speed boosts in reading/writing, they would have to code around how it formats output.

That said, I don't disagree at all that not having it specified is an issue for source control, and that's why there are normalizers for that use case. But source control is a use case, not a format. And getting standards in place for source control is a project problem (see every project that has to pick how documentation will be handled, tabs vs. spaces, etc.), not a format problem.

jvgaultney commented 4 years ago

I'd agree that balancing user ease and developer adoption is tricky. While I'd still suggest that the definition of a 'normal' form does have a place in a spec, it may need to be a style recommendation rather than a requirement (think PEP 8 for Python).

This also points out that the two normalizers have different purposes and use case scenarios:

ufonormalizer seems to be ideal for basic, behind-the-scenes normalization, such as being built into RoboFont or automated with git hooks. It doesn't prescribe what needs to be there; it just keeps things tidy. It is meant to be quiet and low-friction. It seems best if you're using a limited set of tools that agree on what should and shouldn't be in the UFO. (Hope that's a fair description.)

psfnormalizer is good if you're using a wide variety of tools and need the normalizer to fix the UFOs that those tools produce. The -p checkfix=fix parameter will fix up the UFO to meet the style recommendations we've determined from a variety of sources, and also check the contents of the UFOs for what is expected to be there (or not to be there). See the pysilfont docs on normalization. It's more of an active, verbose tool.

Even if there were to be a generally accepted 'style' for UFOs, there might still be a need for two different tools.

xorgy commented 3 years ago

I will say that the whitespace choice made by ufoNormalizer is strange in this context. ufoNormalizer and Norad are the only tools I have seen that prefer tab indents to two-space indents, and the latter only uses tabs because ufoNormalizer does. The specification has examples in it, and all of them use two-space indents; most authoring tools and most processing tools use two-space indents as well. I can tell you personally that I will always prefer two-space indents when editing in a text editor (enough that I will run xmllint on everything before an editing session if there are tabs).

Indentation affects almost every text line in your font and disagreements over it can cause massive, unnecessary diffs that only developers/power users know how to avoid.
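To make the scale of an indentation switch concrete, here is a pair of hypothetical Python helpers: one reports a file's indentation style, the other rewrites leading tabs as two spaces while leaving the rest of each line untouched. On a made-up eight-line `.glif`, six of the eight lines change. Running a tab-indenting normalizer afterwards would undo the conversion, so this only helps once a project has agreed on one style:

```python
import re


def indent_style(text: str) -> str:
    """Report whether indented lines start with tabs, spaces, a mix, or neither."""
    tabs = len(re.findall(r"(?m)^\t", text))
    spaces = len(re.findall(r"(?m)^ ", text))
    if tabs and spaces:
        return "mixed"
    if tabs:
        return "tabs"
    if spaces:
        return "spaces"
    return "none"


def tabs_to_spaces(text: str, width: int = 2) -> str:
    """Replace each run of leading tabs with `width` spaces per tab; the rest of each line is untouched."""
    return re.sub(r"(?m)^\t+", lambda m: " " * (width * len(m.group(0))), text)


# Made-up tab-indented glyph file.
GLIF = (
    '<glyph name="a" format="2">\n'
    '\t<advance width="500"/>\n'
    "\t<outline>\n"
    "\t\t<contour>\n"
    '\t\t\t<point x="0" y="0" type="line"/>\n'
    "\t\t</contour>\n"
    "\t</outline>\n"
    "</glyph>\n"
)

converted = tabs_to_spaces(GLIF)
changed = sum(a != b for a, b in zip(GLIF.splitlines(), converted.splitlines()))
print(indent_style(GLIF), indent_style(converted), changed)  # tabs spaces 6
```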