Attempting to prevent multiple indentions of notes

unified-font-object / ufoNormalizer

A tool that will normalize the XML and other data inside of a UFO.

Other

51 stars 19 forks source link

Attempting to prevent multiple indentions of notes #72

Closed colinmford closed 4 years ago

colinmford commented 4 years ago

This should fix #71 if approved.

It uses regex to remove the first tab in each line of a multi-paragraph text block.

Test added to make sure

<note>
    Line1
    Line2
    Line3
</note>

does not change when it goes through _normalizeGlifNote into something like

<note>
    Line1
        Line2
        Line3
</note>

etc.

colinmford commented 4 years ago

Update:

Instead of using regex to remove the first tab, instead the text is split on this regex:

r"[\n|\r|\r\n]\t*"

Should match cross-platform newlines, plus a tab character if it's present. Results in a nice clean list:

NEWLINE_RE = re.compile(r"[\n|\r|\r\n]\t*")
text = "List1\n\tList2\n\tList3"
NEWLINE_RE.split(text)
>>> ("List1", "List2", "List3")

It replaces a .splitlines() function, however, which technically matches all of these below. I thought it was unlikely, but if we need to support any or all of the additional characters we could add them to the regex.

Representation	Description
\n	Line Feed
\r	Carriage Return
\r\n	Carriage Return + Line Feed
\v or \x0b	Line Tabulation
\f or \x0c	Form Feed
\x1c	File Separator
\x1d	Group Separator
\x1e	Record Separator
\x85	Next Line (C1 Control Code)
\u2028	Line Separator
\u2029	Paragraph Separator

colinmford commented 4 years ago

Update based on @alerque suggestion:

Removes Regex and uses textwrap.dedent() to fix the multiple indention issue, but breaks this test

https://github.com/unified-font-object/ufoNormalizer/blob/5b67c7aa760ef66748badc5f208a7d1aab9a0c69/tests/test_ufonormalizer.py#L831-L837

colinmford commented 4 years ago

I made some revisions, following the advice that @benkiel made here. Now all tests are passing.

I added a dedent_tabs method that works like textwrap.dedent, but it will only dedent tabs and 4-space indentions. https://github.com/colinmford/ufoNormalizer/blob/440f6a53f1e30be597393f76967f8ac1e528da27/src/ufonormalizer.py#L1367-L1416

This and a few extra lines of stripping allow all the tests to pass: https://github.com/colinmford/ufoNormalizer/blob/440f6a53f1e30be597393f76967f8ac1e528da27/src/ufonormalizer.py#L1179-L1182

Finally, added additional tests to test for the case @benkiel brought up and other related cases: https://github.com/colinmford/ufoNormalizer/blob/440f6a53f1e30be597393f76967f8ac1e528da27/tests/test_ufonormalizer.py#L848-L884

moyogo commented 4 years ago

Thanks @colinmford!

colinmford commented 4 years ago

Thanks @benkiel and @moyogo!