Closed colinmford closed 4 years ago
Update:
Instead of using regex to remove the first tab, instead the text is split on this regex:
r"[\n|\r|\r\n]\t*"
Should match cross-platform newlines, plus a tab character if it's present. Results in a nice clean list:
NEWLINE_RE = re.compile(r"[\n|\r|\r\n]\t*")
text = "List1\n\tList2\n\tList3"
NEWLINE_RE.split(text)
>>> ("List1", "List2", "List3")
It replaces a .splitlines()
function, however, which technically matches all of these below. I thought it was unlikely, but if we need to support any or all of the additional characters we could add them to the regex.
Representation | Description |
---|---|
\n | Line Feed |
\r | Carriage Return |
\r\n | Carriage Return + Line Feed |
\v or \x0b | Line Tabulation |
\f or \x0c | Form Feed |
\x1c | File Separator |
\x1d | Group Separator |
\x1e | Record Separator |
\x85 | Next Line (C1 Control Code) |
\u2028 | Line Separator |
\u2029 | Paragraph Separator |
Update based on @alerque suggestion:
Removes Regex and uses textwrap.dedent()
to fix the multiple indention issue, but breaks this test
I made some revisions, following the advice that @benkiel made here. Now all tests are passing.
I added a dedent_tabs
method that works like textwrap.dedent
, but it will only dedent tabs and 4-space indentions.
https://github.com/colinmford/ufoNormalizer/blob/440f6a53f1e30be597393f76967f8ac1e528da27/src/ufonormalizer.py#L1367-L1416
This and a few extra lines of stripping allow all the tests to pass: https://github.com/colinmford/ufoNormalizer/blob/440f6a53f1e30be597393f76967f8ac1e528da27/src/ufonormalizer.py#L1179-L1182
Finally, added additional tests to test for the case @benkiel brought up and other related cases: https://github.com/colinmford/ufoNormalizer/blob/440f6a53f1e30be597393f76967f8ac1e528da27/tests/test_ufonormalizer.py#L848-L884
Thanks @colinmford!
Thanks @benkiel and @moyogo!
This should fix #71 if approved.
It uses regex to remove the first tab in each line of a multi-paragraph text block.
Test added to make sure
does not change when it goes through
_normalizeGlifNote
into something likeetc.