scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License
51.16k stars 10.35k forks source link

enable ANSI color (instead of ANSI color codes) in the Windows terminal #4393 #4403

Closed akshaysharmajs closed 3 years ago

akshaysharmajs commented 4 years ago

modified scrapy/utils/display.py to stop ANSI color sequences in the Windows terminal (issue scrapy#4393)

codecov[bot] commented 4 years ago

Codecov Report

Merging #4403 into master will increase coverage by 0.34%. The diff coverage is 86.66%.

@@            Coverage Diff             @@
##           master    #4403      +/-   ##
==========================================
+ Coverage   84.90%   85.25%   +0.34%     
==========================================
  Files         162      162              
  Lines        9748     9760      +12     
  Branches     1437     1439       +2     
==========================================
+ Hits         8277     8321      +44     
+ Misses       1211     1178      -33     
- Partials      260      261       +1     
Impacted Files Coverage Δ
scrapy/utils/display.py 92.85% <86.66%> (+61.60%) :arrow_up:
scrapy/core/downloader/contextfactory.py 90.00% <0.00%> (-6.67%) :arrow_down:
scrapy/utils/trackref.py 82.85% <0.00%> (-2.86%) :arrow_down:
scrapy/utils/spider.py 77.77% <0.00%> (+11.11%) :arrow_up:
scrapy/robotstxt.py 97.53% <0.00%> (+22.22%) :arrow_up:
scrapy/utils/py36.py 100.00% <0.00%> (+80.00%) :arrow_up:
wRAR commented 4 years ago

Ideally the colorization should stay enabled on the terminals that actually support it, such as the ones from WSL or MSYS, but I suspect we need to check terminfo for that which is a lot of work for no significant gain.

OTOH this PR doesn't allow enabling the colorization explicitly on Windows, maybe the check should be earlier.

akshaysharmajs commented 4 years ago

Ideally the colorization should stay enabled on the terminals that actually support it, such as the ones from WSL or MSYS, but I suspect we need to check terminfo for that which is a lot of work for no significant gain. I completely agree with this.

I just made a read for the problem and found out that Windows 10's console recently began interpreting escapes, but only with ENABLE_VIRTUAL_TERMINAL_PROCESSING which is off by default. So i tried enabling it and i think it works just fine.

wRAR commented 4 years ago

I wonder if this new code works on pre-Windows 10 terminals? And again, what about non-cmd.exe terminals?

And I'd want some error handling there.

akshaysharmajs commented 4 years ago

I wonder if this new code works on pre-Windows 10 terminals?

I just added a check for the windows 10 release and for specific version 10.0.14393 as the ANSI color codes problem only persist after this release of windows. So, there ll' be no problem for pre-windows 10 terminals. Also, ENABLE_VIRTUAL_TERMINAL_PROCESSING flag is only available in windows 10 and up.

akshaysharmajs commented 4 years ago

what about non-cmd.exe terminals

For unix terminals running on windows its working fine according to me, @wRAR Please check it once. WINDOWS CMD.EXE 1 CYGWIN TERMINAL (RUNNING ON WINDOWS) 2

Gallaecio commented 4 years ago

Just for reference, Django’s implementation: https://github.com/django/django/blob/master/django/core/management/color.py#L12

akshaysharmajs commented 4 years ago

@wRAR Is there anything that concerns you about the changes that I have made? or what further changes can be done?

Gallaecio commented 4 years ago

Django’s implementation seems less prone to issues to me, as it does not rely on specific Windows versions. Could you check if it also works for us?

wRAR commented 4 years ago

@AKSHAYSHARMAJS my point about error handling is still valid.

akshaysharmajs commented 4 years ago

@wRAR @Gallaecio thanks for reply. I have added some error handling (used django's implementation for reference). windows versions check is still there as windows has no real support for ANSI escape sequences.

akshaysharmajs commented 4 years ago

Thanks @Gallaecio for the feedback and suggested changes. I 'll keep in mind all aesthetic for later contributions.

akshaysharmajs commented 4 years ago

@Gallaecio @wRAR any suggestions on changes so far : )

akshaysharmajs commented 4 years ago

pytest.ini is conflicting again?

Gallaecio commented 4 years ago

Yes, @elacuesta is making some great progress on code cleanup.

elacuesta commented 4 years ago

Sorry about that. Implementation looks good AFAICT, but I also don't have access to a Windows machine and we're currently not testing with it (see #4458). Still, it'd be good to test that nothing breaks in the currently tested platform. From what I can see, this is only used in the parse command, but their tests read from stdout/stderr which unless I'm mistaken would be read as plain text, without color information. Do you think you could add some small test for the _colorize or pformat functions? Something like assert _colorize('{"a": 1}') == '{\x1b[33m"\x1b[39;49;00m\x1b[33ma\x1b[39;49;00m\x1b[33m"\x1b[39;49;00m: \x1b[34m1\x1b[39;49;00m}\n'

akshaysharmajs commented 4 years ago

Okay, I will try to add some small tests for the changes.

akshaysharmajs commented 4 years ago

I don't know how can I get the default escape sequence for different OS as I tried running the changes in linux and Windows both are providing different colors in terminal. Is there a way to get default escape sequences current OS is using?

Gallaecio commented 4 years ago

You could write separate tests for Windows and non-Windows.

akshaysharmajs commented 4 years ago

okay thanks

akshaysharmajs commented 4 years ago

tests are failing as testing file is not running in a real terminal and that is why sys.stdout.isatty()condition is failing and returning only plaintext. @Gallaecio @elacuesta please help here?

Gallaecio commented 4 years ago

Maybe you could use mock to trick the code into thinking it is in a terminal.

akshaysharmajs commented 4 years ago

` class TestDisplay(TestCase):

@mock.patch('sys.stdout.isatty', autospec=True)
def test_color(self, mock_isatty):
    mock_isatty.return_value = True
    self.assertEqual(_colorize('{"a": 1}'),'{\x1b[33m"\x1b[39;49;00m\x1b[33ma\x1b[39;49;00m\x1b[33m"\x1b[39;49;00m: \x1b[34m1\x1b[39;49;00m}\n')

` above test is still failing as _colorize is returning only a plaintext. what wrong am i doing here?

elacuesta commented 4 years ago

Not sure, but I suspect you might be entering in one of the except blocks that return text. I just pushed a simple test case, I made it dependent on _color_support_info because my regular terminal has _color_support_info==256 but it seems like the tox environment has only 16 colors. I guess that's because curses is not installed in the tox env.

akshaysharmajs commented 4 years ago

Thanks @elacuesta for the changes.

Not sure, but I suspect you might be entering in one of the except blocks that return text. I just

I figured out that GetStdhandle() is giving wrong handle during test I don't know why?. I tried passing actual handle to SetConsoleMode() then test is passing fine.

akshaysharmajs commented 3 years ago

image Just for reference, maybe this Issue and this PR can help with above error.

Gallaecio commented 3 years ago

The tests use the latest version of Tox, which I believe includes that PR.

akshaysharmajs commented 3 years ago

I figured out that GetStdhandle() is giving wrong handle during test I don't know why?. I tried > passing actual handle to SetConsoleMode() then test is passing fine.

Can you please tell me why this is happening? GetStdhandle() retrieving wrong STDOUT handle.

Gallaecio commented 3 years ago

I see the Django implementation I mentioned earlier does nor rely on curses. What made you take the curses approach, did you see a similar approach somewhere else?

akshaysharmajs commented 3 years ago

yes, I saw a similar approach that uses curses for terminfo. Django implementation was not giving any colors when the platform is windows.

akshaysharmajs commented 3 years ago

I think maybe we should just remove check for terminfo and just check if current terminal supports color or not using curses.has_colors() and use normal terminal formatter for all platforms as we are using default colors and not taking user input for specific colors.

Gallaecio commented 3 years ago

Sounds simpler, I think it’s worth giving it a try.

akshaysharmajs commented 3 years ago

@Gallaecio @elacuesta Please go through the changes. I have used only normal terminal formatter for all platforms and added a separate test for windows by mocking ctypes.

akshaysharmajs commented 3 years ago

I figured out that GetStdhandle() is giving wrong handle during test I don't know why?. I tried passing actual handle to SetConsoleMode() then test is passing fine.

This test is passing fine as GetStdHandle() giving valid handle.

Gallaecio commented 3 years ago

Could you work on 100% coverage of your changes? (see https://codecov.io/gh/scrapy/scrapy/compare/c86a1035dd9b8b10acaf8f9e8bdb1b5494a287e2...91bf06c48724a844886d1514e9c49b0ff3bb5a68/diff)

akshaysharmajs commented 3 years ago

Okay, I am working on it.

Gallaecio commented 3 years ago

Please, have a look at https://github.com/AKSHAYSHARMAJS/scrapy/pull/1

It’s a refactoring proposal, where separate methods are used in the color-handling implementation for easier testing and better code readability.

Note that I removed the curses stuff, because I still don’t understand why it is needed, and fear it may result in terminals supporting colors not displaying them just because curses is not installed. But I may be wrong here.

akshaysharmajs commented 3 years ago

@Gallaecio thanks for the changes, looks perfect and precise to me. I see removing curses is completely fine and also helping in testing. I don't know if we are fine with windows versions < 10.0.14393 with no colors?

Gallaecio commented 3 years ago

I don't know if we are fine with windows versions < 10.0.14393 with no colors?

I thought those Windows versions would not support color.

Is that what curses was for? Do older Windows versions correctly parse ANSI color codes if curses is installed?

If so, maybe it makes sense to add back the curses code (but only for those old Windows versions?).

On a separate note, my refactoring broke the Flake8 checks, as well as Python 3.5 tests through the use of f-strings (not the first time I forgot Scrapy cannot use them yet :facepalm: ).

akshaysharmajs commented 3 years ago

curses was only for different formatters for different platform. but now we are only using normal formatter so there is no use of it.

Is that what curses was for? Do older Windows versions correctly parse ANSI color codes if curses is installed?

Older versions of windows also interpret ansi escape codes as terminal processing is enabled by default and in newer versions we have to enable it separately.

akshaysharmajs commented 3 years ago

On a separate note, my refactoring broke the Flake8 checks, as well as Python 3.5 tests through the use of f-strings (not the first time I forgot Scrapy cannot use them yet 🤦 ).

:smile:😄 no problem I will fix it.

Gallaecio commented 3 years ago

Older versions of windows also interpret ansi escape codes as terminal processing is enabled by default and in newer versions we have to enable it separately.

Oh, I had no idea. I see you’ve already fixed the logic accordingly, so :+1:.

akshaysharmajs commented 3 years ago

Yes I have fixed it accordingly. :smiley: Thanks @Gallaecio

akshaysharmajs commented 3 years ago

Please suggest, what else can be done here??

wRAR commented 3 years ago

Thank you!