mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.74k stars 960 forks source link

Use `sys.stdout.flush()` in `skip` and `success` #2529

Closed AlttiRi closed 1 year ago

AlttiRi commented 2 years ago

On Windows with colored output text enabled ("output": {"mode": "color"}) the buffering is abnormally aggressive. In some cases I can see the output in the console only after the program end (when all downloads are skipped).

The simple fix is using of

sys.stdout.flush()

in skip (and in success, just in case) methods of ColorOutput class. https://github.com/mikf/gallery-dl/blob/d85e66bcacc2df5918024e9313caa6011d8a1d77/gallery_dl/output.py#L318-L323


Most likely because of this also Ctrl+C works very bad.

There are no problems without colored text enabled, btw. (When it uses "#" prefix for the skipped downloads instead of gray coloring.)

AlttiRi commented 2 years ago

The bug is still exists with Git-Bash without enabled Enable experimental support for pseudo consoles option. (when MSYS=disable_pcon is instead of MSYS=enable_pcon in /etc/git-bash.config file)

For example, download https://www.hentai-foundry.com/pictures/user/personalami/scraps (3 images), then run it again, 3 lines will be shown only after the program end, while whey should be showed one by one.

With sys.stdout.flush() in skip it works fine.

mikf commented 2 years ago

What's the output of python -c "import sys; print(f'{sys.stdout.line_buffering = }\n{sys.stdout.isatty() = }')" for you?

What you are describing here and have been in other issues sounds like Python doesn't recognize your terminal as a regular TTY and does not enable line buffering by default, among other things.

$ python -c "import sys; print(f'{sys.stdout.line_buffering = }\n{sys.stdout.isatty() = }')"
sys.stdout.line_buffering = True
sys.stdout.isatty() = True
AlttiRi commented 2 years ago

With the default Git-Bash settings (so without Enable experimental support for pseudo consoles option):

echo "MSYS=disable_pcon" > /etc/git-bash.config

python -c "import sys; print(f'{sys.stdout.line_buffering = }\n{sys.stdout.isatty() = }')" prints:

sys.stdout.line_buffering = False
sys.stdout.isatty() = False

In this case flush is required.

With echo "MSYS=enable_pcon" > /etc/git-bash.config console log works well without flush. However, this option is not default in Git-Bash yet.

Git-Bash uses MinTTY.

mikf commented 2 years ago

All writes to stdout now get explicitly flushed (https://github.com/mikf/gallery-dl/commit/eeef9ccdc1813787c5e1327ae2e376e45852c562)

This would not be necessary if Python would properly detect Git-Bash/MinTTY with its default settings and enable sys.stdout.line_buffering. flush() now gets called twice for TTYs that have line_buffering enabled.

Not the end of the world, but I would've liked to find a solution that calls flush() only once in all cases. Maybe I should just always disable ´line_buffering` and only do explicit flushes, even outside of output.py?

AlttiRi commented 2 years ago

Something like Strategy pattern

const isFlushRequired = // ...
function getStdWrite() {
  if (isFlushRequired) {
    return text => {
      process.stdout.write(text);
      process.stdout.flush();
    };
  }
  return text => {
    process.stdout.write(text);
  };
}

Or just

function stdWrite(text) {
  process.stdout.write(text);
  if (isFlushRequired) {
    process.stdout.flush();
  }
}
God-damnit-all commented 2 years ago

The ANSI color codes don't seem to work in PowerShell. Maybe this will fix it?

https://stackoverflow.com/a/72292134

If not, maybe this (lines 6-8):

https://gist.github.com/RDCH106/6562cc7136b30a5c59628501d87906f7

God-damnit-all commented 2 years ago

Turns out, all you need to fix the color output in PowerShell is to just include these two lines in __init__.py:

import os
os.system('')

Found that out from here: https://stackoverflow.com/a/64222858

God-damnit-all commented 2 years ago

@mikf In addition to the change above, could you add an output setting that would allow for * and # to still be used to indicate success/skip in color mode? I want to start using PowerShell's Start-Transcript command for logging (since I use gdl as part of a script that logs other things) and sadly Start-Transcript doesn't properly log ANSI.

My proposal is output.prepend and it's set to this by default: {"success": "* ", "skip": "# ", "force": false} and to make it appear in color mode, you set force to true, while allowing users to customize what they want for indicating success/skip so that the setting doesn't seem so out-of-place.

God-damnit-all commented 2 years ago

Apparently OSes other than Windows uses a checkmark for success, so maybe this instead: {"success": true, "skip": true, "force": false}

If success/skip is set to a string, it prepends that instead, unless color mode is active and force is set to its default of false. If success/skip is set to false, then it wouldn't prepend anything even if color mode is active. If force is set to true, then prepends would function in color mode.

mikf commented 2 years ago

@ImportTaste instead of implementing all sort of extra options, https://github.com/mikf/gallery-dl/commit/7990fe84f11271bc8e4079db6b0248dbeb79474a adds a way to more or less completely customize the standard output and download progress bar.

No explicit documentation for now, since I'm not sure if I want to leave it like this (suggestions welcome), but the commit message example is hopefully good enough to get the general idea across.

For the download progress format strings:

AlttiRi commented 2 years ago

Something wrong with console width detection ~now~: Screenshot image

Hm, the old exe files work the same. Maybe I did note it before because I use 180 columns console width by default. ~However, it worked before, it seems for me, but I can't reproduce it now with the old versions, it's strange.~ Anyway, currently it works incorrectly.

    "output": {
        "mode": "color",
        "colors": {
            "success": "1;32",
            "skip"   : "38;5;245"
        },
        "progress": true,
        "shorten": true,
        "private": true,
        "log": { 
            "level": "info",
            "format": {
                "debug"  : "\u001b[0;37m{name}: {message}\u001b[0m",
                "info"   : "\u001b[1;37m{name}: {message}\u001b[0m",
                "warning": "\u001b[1;33m{name}: {message}\u001b[0m",
                "error"  : "\u001b[1;31m{name}: {message}\u001b[0m"
            }
        }
    },
mikf commented 2 years ago

The problem is that with "shorten": true, gallery-dl assumes all characters to have a width of 1. Use "shorten": "eaw" (East Asian Width) to make it work with characters with a width of 2 columns.

This is not the default because it produces wrong results if the appropriate fonts aren't available and all/some characters are displayed as ? or , and the algorithm is at least an order of magnitude slower.

AlttiRi commented 2 years ago

Hm, it works. I thought I already used monospaced font.

For example, I usually use "" as a separator for keys. In the screenshots above it is not long.

Ah, wait, yeah, the hieroglyphs take 2 chars width. On the first line there are 10 hieroglyphs so 10 chars were moved to the second line.

Hrxn commented 2 years ago

Hm, it works. I thought I already used monospaced font.

For example, I usually use "" as a separator for keys. In the screenshots above it is not long.

Ah, wait, yeah, the hieroglyphs take 2 chars width. On the first line there are 10 hieroglyphs so 10 chars were moved to the second line.

nb. this has nothing to do with monospaced fonts. Monospace here means simply the glyphs all have the same horizontal width, this applies only to display/rendering, it's not related to the "technical" width of a character (or a string) of 1 (as in one byte), that is the encoding, not displaying.