unified-font-object / ufoNormalizer

A tool that will normalize the XML and other data inside of a UFO.
Other
51 stars 19 forks source link

Proposal for formatting contents of `<note>` field #85

Closed cmyr closed 3 years ago

cmyr commented 3 years ago

I would like to acknowledge that this particular issue is likely of little significance, and I am not hugely invested in my position. Ultimately, to me, this is about something like "correctness".

Breaking this out from #84, now that I have had some time to dig into the code and have a more informed opinion.

current status

The current situation: currently, ufonormalizer does a bunch of opinionated formatting of the contents of <note>. Essentially it breaks the text into 72 character lines, and then indents each of these lines to the level of the parent <note> element, plus 1. This current text, so formatted, looks like:

<note>
    currently, `ufonormalizer` does a bunch of opinionated formatting of
    the contents of `&gt;note&lt;'`. Essentially it breaks the text into 
    72 character lines, and then indents each of these lines to the level 
    of the parent `&gt;note&lt;` element, plus 1. This current text, so 
    formatted, looks like:
</note>

This has the advantage that it looks nice, but it has the disadvantage that it is destructive: the actual text of the note will change, and other tools don't have any way of telling, for a given UFO, whether or not the contents of the file have been normalized (and so should be "denormalized"?) or if they are the original intended data.

The problem with this approach is that it modifies the user's data. Although this may not matter often, it can certainly matter sometimes. A trivial example of where this would matter is if the note includes something with significant whitespace, like a bit of python code, or a boxart diagram, or a poem.

Proposal

Basically: if the note contains only whitespace, we discard it; this is the current behaviour. Otherwise we escape the string as usual, and just write a "simple element".

This does mean that the XML will be "uglier" in the case where there are long unbroken lines in the note, but there are a bunch of advantages:

In this world, the XML for our sample paragraph looks like (with the parent tags for context),

<glif>
    <note>currently, `ufonormalizer` does a bunch of opinionated formatting of the contents of `&lt;note&gt;`. Essentially it breaks the text into 72 character lines, and then indents each of these lines to the level of the parent `&lt;note&gt;` element, plus 1. This current text, so formatted, looks like:</note>
</glif>

And if it contained some manual linebreaks, those lines would not respect the outer indent level, like

<glif>
    <note>currently, `ufonormalizer` does a bunch of opinionated formatting of the contents of `&lt;note&gt;`.
Essentially it breaks the text into 72 character lines, and then indents each of these lines to the level of the parent `&lt;note&gt;` element, plus 1.
This current text, so formatted, looks like:</note>
</glif>

An alternative I had considered is stripping leading and trailing whitespace and then always putting the note body on the next line, with a new intent level, but ultimately I think this is unnecessary fanciness and extra logic for minimal concrete gain.

... okay, I'm done. If anyone has any thoughts, feel free to chime in :)

benkiel commented 3 years ago

I am ok with this change. Will wait for others to chime in (@miguelsousa @typesupply), but if ok, would you be able to PR the changes (and tests)?

typesupply commented 3 years ago

I'm fine with this, but I have another idea that would retain the current behavior and enable the proposed behavior:

The line length is governed by the xmlTextMaxLineLength variable in the module. We could allow this to be overridden during the public normalization functions and those would pass the setting to XMLWriter. From there, if the desired xmlTextMaxLineLength value is None or 0 the text wouldn't be passed to textwrap for wrapping and wpould be completely unchanged.

I have no strong feeling about this so I have no objection to the proposal above to not pass any text to textwrap like we currently do.

cmyr commented 3 years ago

I am ok with this change. Will wait for others to chime in (@miguelsousa @typesupply), but if ok, would you be able to PR the changes (and tests)?

If we agree I am happy to write the patch.

miguelsousa commented 3 years ago

No objections from me.

benkiel commented 3 years ago

Alright, @cmyr, however you want to write the PR (do see what @typesupply proposes above) is cool. I think we have enough agreement on this to make the change.

cmyr commented 3 years ago

Okay, I'll get this ready shortly. I'm not totally sure about the change to allow specifying a custom wrap width; it feels like in practice that would only possibly be used for this case? it also means that there are two codepaths to maintain.

So for now I'll start prepare a patch that does the dumbest thing possible (no line formatting, ever) and if there is a strong desire for this to be an option I can make that change during review.

Thanks all!