Regression in 1.11 in split_text

jabdoa2 commented 5 years ago

Describe the bug split_text no longer concats text. This seems to be a regression from a fix to prevent replacing long strings with "...".

In https://github.com/peterbrittain/asciimatics/blob/master/asciimatics/widgets.py#L227:

    for token in tokens:
        for i, line_token in enumerate(token.split("\n")):
            if string_len(current_line + line_token) > width or i > 0:
                # Don't bother inserting completely blank lines - which should only happen on the very first
                # line (as the rest will inject whitespace/newlines)
                if len(current_line) > 0:
                    result.append(current_line.rstrip())
                current_line = line_token + " "
            else:
current_line += line_token + " "

Essentially, you could remove that block and just use token because it will never concat words.

To Reproduce

print(_split_text("very\nlong\ntext\nvery\nlong\ntext\nvery\nlong\ntext\nvery\nlong\ntext\n", 20, 5))

Will output:

['very', 'long', 'text', 'very', 'long...']

Expected behavior

['very long text very', 'long text very long', 'text very long text']

System details (please complete the following information):

OS and version: any
Asciimatics version: 1.11.0. In 1.10 the function has other bugs.

Additional context Fix for this would be this:

    for token in tokens:
        for i, line_token in enumerate(token.split("\n")):
            if string_len(current_line + line_token) > width and i > 0:
                result.append(current_line.rstrip())
                current_line = line_token + " "
            else:
                current_line += line_token + " "

Notice the "and i > 0" which would otherwise always be true except in the first run which the new introduced condition prevented.

peterbrittain commented 5 years ago

Hmmm... That's not quite right. The intention of this function was always that it would respect newlines in text when reflowing for width. If you use spaces instead of new lines it should concatenate as you expect.

Other people definitely rely on the use of newlines to force line breaks in their UI, so I can't just remove that function. In fact, when I run your example against the 1.10.0 tagged code, I get exactly the same result, so I don't think there is a regression here. Have I missed something?

jabdoa2 commented 5 years ago

Then maybe just remove the for block if this is intended behavior?

peterbrittain commented 5 years ago

The logic is to split any text up into multiple lines, breaking on space and newlines (respecting breaks for the latter). We need the external for loop to break out the words (on spaces) and the internal loop to break out new lines that might be hiding inside those words. I don't think either can be removed.

It would be posible to have a single loop if I used something like a regexp to tokenize and return the matching delimiter, but I didn't want to use that sledgehammer here.

jabdoa2 commented 5 years ago

But why does that loop split on newlines then? I see exactly no inputs which would not trigger the else branch. Except in the first iteration which is an empty string and has the same functionality as the else branch

peterbrittain commented 5 years ago

Consider the case where you split "a b\nc\nd e f". Just splitting on space gives you 4 tokens. The 1st token ("a") is fine and will hit the check and be processed purely on resulting new string length (because i == 0).

However the 2nd is "b\nc\nd". This must be split up. The first time through the inner loop, it will have a line_token of "b", which again will be processed purely on the resulting new string length (because i==0 again). However, when you come through the 2nd and 3rd times on the inner loop (for "c" and "d") it must force a line break no matter what the new string length will be. That's why it needs to split for non-zero values of i.

jabdoa2 commented 5 years ago

Makes sense. Sorry to bother!

peterbrittain / asciimatics

Regression in 1.11 in split_text #211