Closed colinmford closed 4 years ago
I think I tracked it down to XMLWriter.text()
and XMLWriter.raw()
https://github.com/unified-font-object/ufoNormalizer/blob/5b67c7aa760ef66748badc5f208a7d1aab9a0c69/src/ufonormalizer.py#L1180-L1199
https://github.com/unified-font-object/ufoNormalizer/blob/5b67c7aa760ef66748badc5f208a7d1aab9a0c69/src/ufonormalizer.py#L1170-L1174
if the text starts out as something like this
<note>
Line1
Line2
Line3
</note>
it enters the XMLWriter.text()
like this
"\n\tLine1\n\tLine2\n\tLine3\n"
it's immediately stripped on line 1181, making it
"Line1\n\tLine2\n\tLine3"
then, it's split into lines on line 1184, making the lines
("Line1", "\tLine2", "\tLine3")
At the end of the function, on line 1199, each line goes to XMLWriter.raw()
and gets re-indented on line 1172. Making the self._lines
list this:
["\tLine1", "\t\tLine2", "\t\tLine3"]
So, in #72 I use re.split()
to split the text on this regex: r"[\n|\r|\r\n]\t*"
instead of using .splitlines()
on line 1184. It results in a nice clean list:
("Line1", "Line2", "Line3")
If there's a better way I would be open to trying again, but I think we can agree that the current behavior is not correct.
Why is the case of one tab being treated specially here? Why is whitespace internal to a free-form content string being munged at all? If this is to normalize indentation shouldn't all strings be parsed to find the shortest whitespace prefix first and then replace all occurrences of that whitespace string at the start of each line with the desired normalized indent?
@alerque This test appears to making sure that white space isn't removed here, and I didn't want to contradict it.
If the decision is made that this is no longer expected behavior, then I would be totally ok with removing all whitespace prefixes. As a first-time contributor I didn't want to make that assumption though.
@alerque Ah, ok I think I take your meaning.
textwrap.dedent()
does what you're talking about, and fixes my issue, but it seems to interfere with the test I posted above... (there's 3 spaces* at the start of Line1, and 4 at the start of Line3, so it leaves Line3 with an indent of 1 space, instead of all 4 in the test):
textwrap.dedent(" Line1 \t\n\n Line3\t ")
>>> 'Line1 \t\n\n Line3\t '
Maybe that test needs to be reconsidered? Or we might need a custom function based on dedent
.
Since @moyogo wrote that test (according to Blame), maybe he can weigh in on what that test is specifically testing for?
This was checking the behaviour of the ufoNormalizer at that time. If the behaviour changes, so should the test.
@moyogo I don't really want to change other Normalizer behavior so I would like to keep all tests as-is as much as possible, but if you think that test is outdated we should discuss how it should change... but to rephrase my question:
I would really like to come up with an answer to this issue that works for everyone!
@colinmford The behaviour of the normalizer needs to change since it changes notes that have already been normalized. I think the difference in the number of space was supposed to replicate the case where a user adds 3 spaces at the beginning of their note.
My opinion is that it should keep those three spaces, as they may have semantic meaning to the designer, so I would do the least invasive thing —that you did first— or managing the extra tabs, I think that's the right solution to this, as it's a narrow, specific issue. Does that make sense @moyogo?
@colinmford I would just test for the case of a line with a user added tab, so something like:
<note>
Line1
Line2
Line3
</note>
The normalizer should keep the tab of Line2
Hi @benkiel @moyogo
I updated the method and now all tests are passing (including the additional test case you mentioned @benkiel, and a couple other related ones).
More info in #72.
Let me know if you think it's suitable!
Hi @benkiel @moyogo please don't forget to review #72, this is a really annoying issue I would like to fix (or is there someone else that should review it?)
Hi,
We've been noticing that if a glyph has line returns in a note, the Normalizer has some trouble formatting it.
For instance, here is how a note started out after it was added, and its first Normalization:
and with each round of normalization, all the lines after the first get indented an extra level
Once:
Twice:
Thrice:
etc.