Bar soft wraps (wide unicode chars are tricky, the birth of the Cell Architecture)

igoro00 commented 4 years ago

As in the title: when the terminal window is too narrow, the progress bar stops animating and prints itself line by line.

I'm using code from README:

from alive_progress import alive_bar

with alive_bar(len(text)) as bar:
    for i in range(len(text)):
        # do some stuff
        bar(text="xd", incr=1)

Here's the output: [screenshot 2019-12-08-14-08-43]

rsalmei commented 4 years ago

Hey @igoro00, thank you.

Unfortunately this is a well-known problem, but I'm not sure how to fix it. I'm going to study how I could get the current terminal width and, if the bar would use more than that, do one of two things:

truncate the output
shorten the bar length

But I can not guarantee it can be done, only that I'll try... 👍

igoro00 commented 4 years ago

ok, so i found something simple that would work, but only for versions >= 3.3 (python2 is EOL either way, i.e. next version of Fedora won't ship with python2 at all)

i haven't tested it yet because I don't have access to my pc rn. Maybe tomorrow.

from sys import version_info
if version_info >= (3, 3):
    from os import get_terminal_size
    ts = get_terminal_size()
    print(ts.lines)    # number of lines (int)
    print(ts.columns)  # number of columns (int)

igoro00 commented 4 years ago

it works on pydroid3 with python 3.7.2

Also get_terminal_size() gets current terminal size so if you put it inside a loop and resize the window it'll update properly.

InfernalPlank commented 4 years ago

@igoro00 how would you encapsulate the loop for the alive_bar inside of another loop to dynamically update the column count as needed? I'm working with something that takes two inputs and scans between them. 0-100 will work fine, but 0-2500 will cause the "running down the screen bar" issue that you're having. I'm looking to use the terminal columns counter to dynamically scale the bar depending on how big the values passed are to avoid this issue.

You can see a snippet of my code in Issue #30!

rsalmei commented 4 years ago

As I've said in the other issue, you can counter this behaviour with a simple argument length=N, N being less than 40, which is the default size. Something like length=30 or even length=20 should give you plenty of space to put some text in the bar. I think this should greatly mitigate the problem while there is no support for auto-detecting the screen width.

rsalmei commented 4 years ago

Hello,

I've been trying to implement this in the new, unreleased 2.0 version, but it is way trickier than anticipated.

I just wrote a small snippet that should show what to expect, could you try it and report back?

import time
from itertools import cycle
from shutil import get_terminal_size

asd = cycle('12345')
while 1:
    x = get_terminal_size().columns
    print('\033[2K\r' + (next(asd) * x) + '\r', end='')
    time.sleep(1/60.)

In my experience, there's absolutely no way to keep the terminal from filling the screen with garbage. Try it out, both slowly shrinking the terminal window and slowly enlarging it... What do you think?

MartinHammarstedt commented 4 years ago

That indeed produces garbage when narrowing the terminal window. Maybe the resizing problem is impossible to solve, but currently alive-progress produces garbage even with a fixed terminal size, if it happens to be too narrow. Truncating the output to the terminal width would at least solve that problem.
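Truncating to the terminal width, as suggested here, could look roughly like the sketch below. `fit_line` is a hypothetical helper, not part of alive-progress, and it measures with `len()`, so it assumes every character occupies exactly one column.

```python
from shutil import get_terminal_size


def fit_line(line, columns=None):
    """Cut `line` so it never exceeds the terminal width (hypothetical helper)."""
    if columns is None:
        # the fallback applies when the size cannot be detected, e.g. piped output.
        columns = get_terminal_size(fallback=(80, 24)).columns
    return line[:columns]


print(fit_line('|' + '#' * 40 + '| 50% [50/100] a fairly long status message'))
```

Calling `fit_line` on every refresh would also pick up a resized window, at the cost of one size query per frame.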

rsalmei commented 4 years ago

Humm, you're right. It really would be better to try to start with a solution that gets the terminal size only at init time, not inside the running thread, and get to resolve just that. Gonna work on that 👍

Unfortunately it should come only in the v2, which supports python 3 only.

MartinHammarstedt commented 4 years ago

Great! However I think checking the size continuously might still have its uses. Resizing the terminal would still produce garbage, but that garbage would only be produced while resizing. Without continuous checking the garbage would continue to be produced forever.

Possibly the most important advantage of continuous checking would be when you've started a process and realize that the terminal window is too narrow to read all the messages, so you make the window bigger. If we're only checking the size once, the messages would then continue to be truncated, not using the new space available.

rsalmei commented 4 years ago

You're right again, excellent insights! Indeed it would be helpful to continuously check the terminal, even with garbage being unavoidable...

Actually this is somewhat tricky to implement, as the bar and spinner precalculate several positions and strings to be able to animate that fast, and any change in screen requires recreating them. So they need to be created inside an inner function, not as local variables, and correctly lock the animation thread while doing that, since it is using those variables...
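One possible shape for "recreate the precalculated strings while the animation thread keeps running" is sketched below. `FrameCache` is a hypothetical name and this is not the library's actual code: it just assumes the frames depend on a width, builds the new set first, and swaps it in under a lock.

```python
import threading


class FrameCache:
    """Precomputed bar frames for one width, rebuilt when the width changes."""

    def __init__(self, width):
        self._lock = threading.Lock()
        self._width = 0
        self._frames = ()
        self.resize(width)

    def resize(self, width):
        # build outside the lock and swap inside it, so the animation thread
        # never sees a half-built set of frames.
        frames = tuple('#' * n + '.' * (width - n) for n in range(width + 1))
        with self._lock:
            self._width, self._frames = width, frames

    def frame(self, percent):
        # called by the animation thread on every refresh.
        with self._lock:
            return self._frames[round(percent * self._width)]
```

The refresh path then stays a plain tuple lookup, and only a resize pays for regenerating the strings.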

Fortunately, I'm at this moment implementing another feature that will also require recreating them! In v2 I'm creating the concept of entering the bar context in a "precalculate mode", so one can calculate the needed total to send to the very same bar, and get an elapsed time and spinner while doing that. Then it will be possible to restart the running bar with the actual total, resetting the timer and getting an accurate throughput. Anyway, I can use the same reconfiguration ability for this.

rsalmei commented 4 years ago

Oh, forgot to say, very important to grasp my comment: I'm not only truncating the output, I'm actually adapting the bar to the available space. So, the bar will be the most elastic one, which gets the remaining space. There's now an actual title, that appears on the left, fixed, and situational messages on the right, fixed. If the space is too narrow even with the smallest bar, then I'll truncate. I'm also considering putting some scrolling spinners to good use! If the title or the messages happen to be too big, I could put a scrolling signboard to roll them...
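The elastic layout described in this comment can be sketched as follows. The function names, the `|...|` borders and the minimum of 5 are assumptions made for illustration; only the idea (fixed title and message, bar takes the remaining space, truncate as a last resort) comes from the comment.

```python
def bar_length_for(title, message, columns, preferred=40, smallest=5):
    """Give the bar whatever space the fixed parts leave, within limits."""
    fixed = len(title) + len(message) + 4  # ' |' and '| ' around the bar
    return max(smallest, min(preferred, columns - fixed))


def render(title, message, percent, columns):
    length = bar_length_for(title, message, columns)
    filled = round(percent * length)
    line = '{} |{}{}| {}'.format(
        title, '#' * filled, '.' * (length - filled), message)
    # last resort: truncate when even the smallest bar does not fit.
    return line[:columns]


print(render('dl', 'ok', 0.5, 20))
```

With 20 columns the bar shrinks to 12 cells; with 10 columns the smallest bar is used and the message is cut off.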

rsalmei commented 4 years ago

Hello @MartinHammarstedt, how are you?

Good news, this is ready and working! 🎉 After realizing that v2 was getting way too big, I've decided to split this implementation in two: now we'll get a static soft wrap behavior right in this major (so including python 2 support), and someday that adaptive one I've talked about.

This new version will include:

soft wrapping support
hiding cursor support
logging support
exponential smoothing of ETA data
proper bar title
enhanced elapsed time representation

Wow, that will make a very mature and feature complete progress bar for python 2, ready to expand for python 3 only.

rsalmei commented 4 years ago

Here it is: [animation: wrap_support]

MartinHammarstedt commented 4 years ago

That's great news, thanks for letting me know! I'm looking forward to the release!

rsalmei commented 4 years ago

The technical part of the PR is ready, gonna update its documentation now! ==> https://github.com/rsalmei/alive-progress/pull/46

rsalmei commented 4 years ago

It's released!! 🥳 Care to test it and share, please?

MartinHammarstedt commented 4 years ago

Nice work! It seems to work well, aside from one thing. The width of the output is calculated using len(), which only counts characters and not the space they actually take up on the screen. Wide Unicode characters like "ＷＩＤＥ" or emojis like "😺" often take up two columns, and as a result the problem with wrapping persists if you use such characters in the title or status message.

I have no experience with this, but maybe you could use unicodedata.east_asian_width() or the wcwidth library to help calculating actual screen width used.

unicodedata.east_asian_width("😺") -> 'W' (as in "wide")
wcwidth.wcswidth("😺") -> 2

In my case it's a single emoji that causes the trouble, but I found I can work around the problem by adding a zero width space character as well, as it won't take up any space on the screen but will be counted by len(), making the length match the space used. Not a very nice solution, but it works!
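A minimal sketch of the difference between `len()` and the columns used, based only on the standard library's `unicodedata.east_asian_width()`. `display_width` is a hypothetical helper, and treating only Wide and Fullwidth as two columns is a simplifying assumption; the results also depend on the Unicode database bundled with the running Python.

```python
import unicodedata

ZERO_WIDTH = {'\u200b', '\u200c', '\u200d'}  # ZWSP, ZWNJ, ZWJ


def display_width(text):
    """Estimate the columns used on screen, which len() does not do."""
    width = 0
    for char in text:
        if char in ZERO_WIDTH:
            continue  # takes no space on screen
        wide = unicodedata.east_asian_width(char) in ('W', 'F')
        width += 2 if wide else 1
    return width


cat = '\U0001f63a'  # the cat emoji from the comment above
print(len(cat), display_width(cat))
print(len(cat + '\u200b'), display_width(cat + '\u200b'))
```

The second line shows why the workaround helps: with a zero-width space appended, `len()` and the estimated width agree.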

rsalmei commented 4 years ago

Thanks man! Wow, unicode is hard... I didn't know about those wide characters! But thank you about both the analysis and your possible solution, a zero width space could work for the general case too, if I manage to insert those automatically, matching the number of wide chars... I'm going to try that, and get back to you! 👍

rsalmei commented 4 years ago

Hey @MartinHammarstedt, it's a little trickier than anticipated.

I've counted those characters known as Wide in both title and text. Then discounted its sum from the len(), which should be enough to not use any ZWSP. But when the line is truncated to the needed length, part of these characters may be stripped off the string...... Argh, so that sum is no longer valid...

Well, I could iterate the strings and find out how many characters fit in the needed length, but I would have to do that on every refresh, which I definitely don't want. So I think the ZWSP is not a bad solution at all! If I insert one right before each of those characters, I can do this operation just once. It has to go right before each one so the boundaries also work: where including one more character would actually add two columns, the cut falls between the ZWSP and the wide char, so only the ZWSP stays in. That may not align things perfectly, but at least the line will not overflow the screen.
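The idea can be sketched like this. `mark_wide` is a hypothetical stand-in, not the library's `sanitize_text_marking_wide_chars`, and it only looks at the 'W' category, as in the comment.

```python
import unicodedata

ZWSP = '\u200b'  # zero-width space


def mark_wide(text):
    """Put a zero-width char before each Wide char, so len() matches columns."""
    return ''.join(
        ZWSP + char if unicodedata.east_asian_width(char) == 'W' else char
        for char in text)


marked = mark_wide('ab\U0001f63acd')  # 'ab', the cat emoji, 'cd': 6 columns
print(len(marked))       # now 6 as well
print(repr(marked[:3]))  # the cut keeps the marker and drops the emoji
```

Because the marking is done once up front, each refresh can truncate with a plain slice and the result never uses more columns than its `len()`.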

Which char did you use?

zero-width space (ZWSP): U+200B
zero-width non-joiner (ZWNJ): U+200C
zero-width joiner (ZWJ): U+200D

I'm asking because in my macOS iTerm2 terminal, the ZWSP seems to be the only one that does not work... 🤔

rsalmei commented 4 years ago

Hey, you haven't answered yet, but I have good news, I've made it work! 🎉 But there are a few quirks, look at that!

The common case, the one you actually reported, worked from 3.6 onward:

The width is indeed 'W', and my new sanitize_text_marking_wide_chars does:

And all works as expected in the terminal, the line does not wrap!

Curiously, in 2.7 unicodedata returns it wrong:

So my sanitize doesn't work, but magically it does work (the line is not wrapped)!

Yeah, the unicode version 8.0 in python 2.7 appears to encode that with 2 bytes already.

But in 3.5 specifically:

My sanitize doesn't work either, and the unicode version 9.0 encodes it differently:

So the line wraps, and I can do nothing about it... 😞

Tricky humm?

rsalmei commented 4 years ago

Hey I just released it!! Can you test it again?

MartinHammarstedt commented 4 years ago

I just tried it and it seems to be working great (in Python 3.7.5 at least)! However, in sanitize_text_marking_wide_chars you should check not only for "W", but also "F" and "A". "F" is "Fullwidth" (like these characters: "ＷＩＤＥ") and "A" is "Ambiguous". I don't know what the difference is between Wide and Fullwidth, but both take up two columns in my terminal, and Ambiguous are characters that may be wide, so it's safest to count them as wide.

As for which character I used in my workaround, I used the zero-width space. All three characters you listed work identically in my Ubuntu terminal, so the zero-width joiner you're using works just as well.

rsalmei commented 4 years ago

I'm glad it worked! It was harder than I initially thought... 😅

But I'm not sure about the 'F' and 'A'... I know the latter seems to not be treated as wide. According to Unicode East Asian Width - Section 5 Recommendations:

When processing or displaying data

Wide characters behave like ideographs in important ways, such as layout. Except for certain punctuation characters, they are not rotated when appearing in vertical text runs. In fixed-pitch fonts, they take up one Em of space.

Halfwidth characters behave like ideographs in some ways, however, they are rotated like narrow characters when appearing in vertical text runs. In fixed-pitch fonts, they take up 1/2 Em of space.

Narrow characters behave like Western characters, for example, in line breaking. They are rotated sideways, when appearing in vertical text. In fixed-pitch East Asian fonts, they take up 1/2 Em of space, but in rendering, a non-East Asian, proportional font is often substituted.

Ambiguous characters behave like wide or narrow characters depending on the context (language tag, script identification, associated font, source of data, or explicit markup; all can provide the context). If the context cannot be established reliably, they should be treated as narrow characters by default.

Emoji style standardized variation sequences behave as though they were East Asian Wide, regardless of their assigned East_Asian_Width property value.

Note the 'F' Fullwidth is not even cited in this text. And in fact, according to the images I've posted about Python 2.7 above, if I include Fullwidth I'll break it in Python 2.7, making a string of length 3 instead of 2 (it already had that length in the old unicode ucd 8.0). So I can't do that.

EDIT: In Python 3.5, that double char is 'N', or Neutral... 😕

That is beyond tricky... 😓

rsalmei commented 4 years ago

Well, I've decided to include 'F' indeed, but only in the next major, which will drop python 2.7 support. That way I can maintain its compatibility, without losing much in the other versions. Thank you for all your insights again!

MartinHammarstedt commented 4 years ago

Sounds like a good decision! 👍

rsalmei commented 4 years ago

Nice, I hope it's working nicely now, with no little hacks whatsoever!

rsalmei commented 4 years ago

Hey @MartinHammarstedt, how are you?

Fun fact, guess what I'm struggling with now to release #51? Yes, wide unicode chars........ 😂 I realized I included support for the title and situational message, which is nice, but forgot about the spinners! If you create a customized one and use the widies, it will all break again.... 😓

And now it is a little more complex, if you can believe that... See: titles are fixed, and messages are truncated only, but spinners have animations, which can make these characters go in and out of the display area!

frame spinners can have frames with or without them, so I need to equalize their lengths;
scrolling spinners need to make them smoothly enter and exit the screen, so I need to place actual spaces in place if the ZWJ would be displayed or if it has just left the screen, both at either side.....
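For the first case, equalizing frames, a minimal sketch could pad every frame to the widest one, measured in columns rather than in `len()`. Both function names are hypothetical, and the width estimate is the crude East Asian Width check discussed earlier in the thread.

```python
import unicodedata


def columns(frame):
    """Crude column estimate: Wide/Fullwidth chars count as two."""
    return sum(2 if unicodedata.east_asian_width(c) in ('W', 'F') else 1
               for c in frame)


def equalize(frames):
    """Right-pad all frames with spaces to the same width on screen."""
    widest = max(columns(frame) for frame in frames)
    return tuple(frame + ' ' * (widest - columns(frame)) for frame in frames)


print(equalize(('-', '\U0001f63a', 'ab')))
```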

The scrolling is hard to implement efficiently, but the frames are done!

Regards! 👍

rsalmei commented 4 years ago

Yeah, after I've made the frame spinners work, it took 25 more days to make all the others work too! 😅 Well, I'm going to keep this open for now, to see what will enter in the next milestone.

rsalmei commented 4 years ago

Hey, another chapter in the wide chars saga... 😞 I just found Variation Selectors! https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)

It seems some characters already have length=2 in python, the ones that have a VS immediately following them... This is for characters that have multiple renditions, like a text and a glyph one. For those, I think I should maintain the VS, but this would break everything I did until now, argh.... It requires a different technique to detect and fix truncated strings with wide chars inside.

To use both methods at the same time would be cumbersome and tricky, but how to unify them? If there isn't any VS, could I insert a "default" one? Wow...

MartinHammarstedt commented 4 years ago

Great work on all this 👍, and I'm sorry that Unicode keeps finding new ways to make life hard for you! I'm afraid I don't have any great advice to give. I was going to suggest that you have a look at the wcwidth library (whose sole purpose is calculating widths of Unicode strings in the terminal) to see how they tackled the problem, but then I noticed that they don't seem to support Variation Selectors either. So at least you're not alone! 😅

rsalmei commented 4 years ago

Yeah, thank you @MartinHammarstedt.

I already implemented these three variant forms (spec):

# unicode variant forms, this is very tricky, and in no way comprehensive.
VARIATION_SELECTORS = tuple(chr(0xfe00 + i) for i in range(16))  # unicode variation selectors.
VARIATION_SELECTORS_SUPPLEMENT = tuple(chr(0xe0100 + i) for i in range(240))
EMOJI_MODIFIER_FITZPATRICK = tuple(chr(0x1f3fb + i) for i in range(5))  # unicode skin colors.

And I came up with a third method, instead of using 1. Zero Width chars before widies and 2. default variant forms after them when missing... I've changed the ZWJs insertion to after the widies, only insert them if there's no known variation following it, reversed my fix truncation algorithm to detect it after the chars, included all the variations above in that detection, and finally created a new iter_wide algorithm that delivers "complete chars", including their variations... 😅 So all currently known wide chars are marked, with its original variant form or a ZWJ! It seems to be working nicely!
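An iterator that delivers "complete chars" might look like the sketch below. It reuses the three constants above, but `iter_complete_chars` is a hypothetical name and this is not the actual `iter_wide`: it simply assumes a variant form always attaches to the char right before it.

```python
VARIATION_SELECTORS = tuple(chr(0xfe00 + i) for i in range(16))
VARIATION_SELECTORS_SUPPLEMENT = tuple(chr(0xe0100 + i) for i in range(240))
EMOJI_MODIFIER_FITZPATRICK = tuple(chr(0x1f3fb + i) for i in range(5))
FOLLOWERS = frozenset(VARIATION_SELECTORS + VARIATION_SELECTORS_SUPPLEMENT
                      + EMOJI_MODIFIER_FITZPATRICK)


def iter_complete_chars(text):
    """Yield each char together with the variant forms that follow it."""
    current = ''
    for char in text:
        if current and char in FOLLOWERS:
            current += char  # a variant form sticks to the previous char
        else:
            if current:
                yield current
            current = char
    if current:
        yield current


# a shamrock with VS16, a hand with a skin tone, and a plain letter
print(list(iter_complete_chars('\u2618\ufe0f\U0001f449\U0001f3fex')))
```

A modifier with nothing before it is yielded on its own, which is the simplest possible answer to the standalone case.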

I may end up with better support for Unicode than wcwidth does! 😝

rsalmei commented 4 years ago

😂 My iter_wide is broken... Just found out the Fitzpatrick skin modifiers can also be used as chars, WIDE chars...... So I must, at the same time, look at both the preceding and the following chars... argh

rsalmei commented 4 years ago

Ok, it is working: now I do support wide chars and the Fitzpatrick skin tone variations. It was hard to support, and I came up with the following combinations table:

The bad news is: excited, I tried to include a few spinners with a variety of cool emojis, like these ones:

    flora = bouncing_spinner_factory('☘️🌱🌲🌳🌴🌵🌾🌿🍀🍁🍂', 8, 2)
    balls = scrolling_spinner_factory('🏀🏈🏉🏐🏓⚾️🥎⚽️🧶🎾🎱', 3, 1)
    fruits = bouncing_spinner_factory('🍎🍍🍌🍋🍊🍉🍈🍇🍅🍓🍒🍑🍐🍏', 10, 3)
    flowers = bouncing_spinner_factory('💐🌷🌸🌹🌺🌻🌼', block=3)
    elements = bouncing_spinner_factory('🔥🌊💨️⚡️', block=1)

When the spinner engine broke wildly with: ValueError: invalid string with unicode modifiers. Humm... Then I've discovered... I was only scratching the surface of Unicode!!! 😞 The very first character "☘️" is formed by:

In [24]: [(x, hex(ord(x)), unicodedata.east_asian_width(x)) for x in '☘️']
Out[24]: [('☘', '0x2618', 'N'), ('️', '0xfe0f', 'A')]

It's two characters, so wide in any sense, but has east asian width Neutral... So I cannot trust this info.

Then I finally discovered in this article that the size a character occupies on screen has NOTHING to do with their encoding!! There's even "two women kissing" that's formed by an astounding combination!!

In [17]: [(x, hex(ord(x)), unicodedata.east_asian_width(x)) for x in '👩‍❤️‍💋‍👩']
Out[17]:
[('👩', '0x1f469', 'W'),
 ('\u200d', '0x200d', 'N'),
 ('❤', '0x2764', 'N'),
 ('️', '0xfe0f', 'A'),
 ('\u200d', '0x200d', 'N'),
 ('💋', '0x1f48b', 'W'),
 ('\u200d', '0x200d', 'N'),
 ('👩', '0x1f469', 'W')]

That's 8 chars in the string, to occupy probably two columns on screen. This article is also very informative, although so long I couldn't read it all.

First I discovered the WIDE chars, the ones that occupy two cells on screen even though they have length one in Python. => Implemented. Here came the spinner compiler.
Then the "Variation Forms" appeared, those chars that have both a text and a glyph representation. => Implemented.
Then came the "Fitzpatrick variations", the skin tone modifiers that can also be used as wide chars. => Implemented.
Now there are the Neutral chars with a variation, which should be interpreted as Wide, AND the ZWJ glue sequences, which can generate arbitrarily long sequences to be interpreted as only one char... 🤦

I'm very disappointed. I'm now on the fourth refactoring of the spinner engine to support these chars. This has brought very very cool additions, like the spinner compiler, but it seems to be infinite. The architecture I've come up with so far is based on "chars with dual cell on screen are adjusted to have length two in Python", but this discovery breaks it badly. I now need to support arbitrarily long sequences, which I cannot possibly make have length=2 in Python... argh. I do like challenges, but come on Unicode! 😖

rsalmei commented 4 years ago

Hey @MartinHammarstedt,

Another hit on a wall. I just found that I can't insert ZWJ to adjust the size of the text... Well I already couldn't when I found out about the grapheme clusters, that could take way more chars than two in sequence, but I'm talking now about the initial ones, those alone, that have unicode east asian width of Wide or Fullwidth! These chars can be used alone or in group, meaning different things...

Example:
1f9d1 -> person [🧑]
1f393 -> graduation cap [🎓]
1f9d1 200d 1f393 -> student [🧑‍🎓]

Yeah... A ZWJ arbitrarily put inside the string can alter its meaning....... It doesn't work.

rsalmei commented 4 years ago

Hey, finally some great news! After some major refactors, again, I've created the infrastructure to support grapheme clusters!! Actually support them... 😅 Now the spinners will not deal internally with strings anymore, but with tuples of "columns", that's the big change! And the original sequences are never modified in any way, not even on screen!! When a char is presumed to use two columns, it is marked with a None in the sequence, which is removed when rendered on screen, but does count in len() and makes it easier to detect truncated chars! Well, it's only the infrastructure for now, but it's more than promising: the simplest one, frame_spinner_factory, has just worked!
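A rough sketch of such a tuple of cells follows. All four function names are hypothetical, the graphemes are assumed to be already split, and placing the None right after the wide grapheme is an assumption about where the mark goes.

```python
import unicodedata


def is_wide(grapheme):
    """Crude guess: judge a grapheme by its first code point only."""
    return unicodedata.east_asian_width(grapheme[0]) in ('W', 'F')


def to_cells(graphemes):
    """One entry per column: a wide grapheme is followed by a None marker."""
    cells = []
    for grapheme in graphemes:
        cells.append(grapheme)
        if is_wide(grapheme):
            cells.append(None)
    return tuple(cells)


def fix_cells(cells):
    """Blank out a wide grapheme that a slice has cut in half."""
    cells = list(cells)
    if cells and cells[0] is None:
        cells[0] = ' '   # the grapheme itself was cut off on the left
    if cells and cells[-1] is not None and is_wide(cells[-1]):
        cells[-1] = ' '  # its marker was cut off on the right
    return tuple(cells)


def render(cells):
    """The markers never reach the screen; the text is left untouched."""
    return ''.join(cell for cell in cells if cell is not None)


cells = to_cells(['a', '\U0001f63a', 'b'])
print(len(cells), repr(render(fix_cells(cells[:2]))))
```

With this shape, `len(cells)` is the width on screen, and a truncated wide char shows up as a dangling grapheme or a leading None.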

The check() function has gained the ability to display codepoints! Now we can see why it is so difficult: note that a string with length 6 in Python can use the exact same space on screen as another string with length 16!! In orange we can see the codepoints for a char I've considered WIDE (which uses 2 columns), and in blue for the single ones (1 column). My great breakthrough here is understanding grapheme clusters, and defining that all frames of an animation must have the exact same "length on screen", not on Python and not even on graphemes!

string	python	graphemes	columns
okokok	6	6	6
🏴󠁧󠁢󠁥󠁮󠁧󠁿👉🏾🏴󠁧󠁢󠁥󠁮󠁧󠁿	16	3	6
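The three counts in the table can be reproduced with a deliberately simplified clustering rule, sketched below. This is not full Unicode text segmentation and not the library's code: it assumes only variation selectors, skin tones, tag characters and ZWJ glue extend a cluster, and that a cluster is as wide as its first code point.

```python
import unicodedata

ZWJ = '\u200d'


def is_extender(char):
    """Code points that attach to the previous one (simplified rule)."""
    code = ord(char)
    return (0xfe00 <= code <= 0xfe0f         # variation selectors
            or 0x1f3fb <= code <= 0x1f3ff    # Fitzpatrick skin tones
            or 0xe0020 <= code <= 0xe007f)   # tag characters, used in flags


def clusters(text):
    result = []
    for char in text:
        glued = result and (is_extender(char) or char == ZWJ
                            or result[-1].endswith(ZWJ))
        if glued:
            result[-1] += char
        else:
            result.append(char)
    return result


def columns(text):
    return sum(2 if unicodedata.east_asian_width(c[0]) in ('W', 'F') else 1
               for c in clusters(text))


# black flag + tag chars 'gbeng' + cancel tag: the flag of England
flag = '\U0001f3f4\U000e0067\U000e0062\U000e0065\U000e006e\U000e0067\U000e007f'
hand = '\U0001f449\U0001f3fe'  # pointing hand + skin tone
for text in ('okokok', flag + hand + flag):
    print(len(text), len(clusters(text)), columns(text))
```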

rsalmei commented 4 years ago

Scrolling is working too! 🎉

rsalmei commented 4 years ago

Ok, I'm on a roll!

rsalmei commented 4 years ago

Testing with block mode is another thing entirely, where all graphemes should scroll isolated. Here I have to remove the column marks of wide chars, iterate on all of them reapplying the marks, multiply by the block size, truncate the resulting string, and finally fix any truncated wide char at the end... Yeah, working! 🎉

MartinHammarstedt commented 4 years ago

Hey, great work! 😄 It's good to see that you haven't given up, despite all the surprises Unicode keeps throwing at you!

rsalmei commented 4 years ago

Thank you man! But actually I've decided to stop trying to detect the unicode graphemes on my own, which is kind of giving up... I realized I would be much better served by a dependency for managing that. 😅

I've seen that I was investing so much time in something that 1. was not the core of my project, 2. changes every year, 3. could be nicely provided by the excellent python dependency mechanism, and most importantly 4. was keeping me from refactoring all the other components that needed to learn how to deal with streams of graphemes once that did work... And that is the core of my project!!

rsalmei commented 4 years ago

ALL the spinners are WORKING!!! Now alongside_spinner_factory and delayed_spinner_factory 🎉🥳 I think this is one of the most advanced pieces of code I've ever written! That was a great challenge! (look for the spinner_compiler once I commit it...)

rsalmei commented 4 years ago

For history's sake, this is what it was when I started this endeavor:

It has gone a long way... Now I'll start to include this support in the bars subsystem, and then make the progress bar work again, using the new stream of graphemes architecture... 👍

rsalmei commented 4 years ago

Hey @MartinHammarstedt, how are you man?

Well, I've been studying the basic problem today, and I think I found a way to not produce garbage on screen! I've used Save Current Cursor Position and its Restore, to a very nice effect! On normal movement the screen keeps almost pristine, save for a few blank lines that appear on top. On fast movements some garbage does appear sometimes, but it is very minimal.

How cool is that?

This is the code if you want to try:

from itertools import cycle
from shutil import get_terminal_size
import time

asd = cycle('12345')
print('\033[s')  # <-- saves the cursor position
while 1:
    x = get_terminal_size().columns
    # clear till the end of screen, and restores cursor
    print('\033[J\033[u' + (next(asd) * x) + '\r', end='')
    time.sleep(1/60.)

I'm just a little worried, would it cause any flicker on some terminal? I've always tried to minimize any flickering, since I update the bar at high refresh rates. So I store the length of the last line generated, and only clear the screen when the next line is smaller than that (the longer ones override it anyway). Well, with this solution, I do clear the screen at each refresh. Is it possible it could flicker now? My terminal on macOS is clearly double buffered, so I can't see any blinking.
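The "only clear when the next line is shorter" approach described here could be sketched like this. `LineWriter` is a hypothetical class, not the library's code, and it uses the ANSI "erase to end of line" sequence `\033[K` for the clearing step.

```python
import sys


class LineWriter:
    """Rewrite one terminal line, clearing only when the new text is shorter."""

    def __init__(self, stream=sys.stdout):
        self._stream = stream
        self._last_len = 0

    def write(self, line):
        out = '\r' + line
        if len(line) < self._last_len:
            # leftovers from the longer previous line: erase the tail only.
            out += '\033[K'
        self._stream.write(out)
        self._stream.flush()
        self._last_len = len(line)
```

Longer or equal lines simply overwrite the previous one, so the erase sequence is sent only when it is actually needed.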

rsalmei commented 4 years ago

Oh, and a quick update! The alive-progress is working again! The new "cells and graphemes" architecture is a success! No more strings for frames, just tuples of cells! No more runtime generation of animations, they're compiled ahead of time! \o/ Now it's just the bar generation, stay tuned...

MartinHammarstedt commented 4 years ago

Interesting find! I tried your code sample and it works as intended on Ubuntu. No flickering and no matter how fast I resize the window I can't get it to produce garbage.

I also tried it on Windows 10 with mixed results. Using the newer Windows Terminal (which sadly doesn't come preinstalled) it worked fine with no flickering, but some garbage when making the window smaller. Not a big problem. However, when using the good old Command Prompt (cmd.exe), which I think is still the default, it didn't work at all. It doesn't support ANSI escape codes, so the "\033[J\033[u" gets printed along with the rest and the screen fills up with lines immediately. That would be a problem if you're aiming to keep the current cross-platform compatibility.

rsalmei commented 4 years ago

Hey, that's very nice! Thank you for trying it, and I'm glad it worked so nicely!

But I think I'll leave that for another time. I've tried several ways of expanding this to more simultaneous lines, to also support the exhibits like showtime, but couldn't find one that worked. Also, I know there are some unofficial changes that support multiple alive-progress'es on screen at the same time (there's an issue for that), and if I include a Save/Restore cursor, I'll surely break it for them. It's never easy...

rsalmei commented 4 years ago

Wow, the bar engine is finally working!! 😅 🎉 Now with complete support for Emojis and exotic Unicode chars! Also, support for borders, tips and errors of any length, and underflow error that can leap to border if it can't fit!

Normal fill

Transparent fill

Happy Halloween!

rsalmei commented 4 years ago

The spinner compiler is finally committed, if someone would like to see it: here. Also the new spinners and bars implementations: here and here. \o/

TheTechRobo commented 3 years ago

If you're still having issues with the escape codes on Windows CMD, try using the colorama package - it doesn't do anything on linux/mac/others but if on Windows it replaces the escape codes with WinAPI sys calls.

https://pypi.org/project/colorama/#usage

rsalmei commented 3 years ago

Cool tip @TheTechRobo, thanks.

rsalmei / alive-progress

Bar soft wraps (wide unicode chars are tricky, the birth of the Cell Architecture) #19