zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Windows Build System Documentation Improvements #50017

Open vincemulhollon opened 2 years ago

vincemulhollon commented 2 years ago

Describe the bug

I have two separate Windows build system documentation bug fixes related to bug #49987.

To Reproduce

Obtain a domain-attached PC with Windows 11 Pro Version 21H2 OS Build 22000.918

Note that a corporate Active Directory GPO (I'm clearly not a Windows sysadmin) force-installs a licensed, registered copy of Spybot - Search & Destroy + AV, corporate licensed version 2.9.82.0, start center 2.9.82.139.

Follow the Windows build system installation instructions in the Zephyr Getting Started Guide.

Bug 1: Note that the Python package installation step fails because MSVC++ CLI build tools are not installed.

This can be fixed by installing MSVC++ CLI build tools, and that fix should be documented as part of the installation process.

Bug 2: Note that builds repeatedly and reliably crash at the same point with an "Exit code 0xc0000417" error, while also running very slowly.

The key to fixing the "Exit code 0xc0000417" problem was obtaining a laptop that is not attached to the Active Directory domain, on which the build works fine. I then did the usual binary search to find the difference between the non-working, corporate-controlled, domain-configured machine and the working, freshly installed, non-domain-attached Windows 11 laptop.

It would seem this exit code crash is a 100% reproducible bug under these very specific conditions:

Windows 11 Pro Version 21H2 OS Build 22000.918

Spybot - Search & Destroy + AV corporate licensed domain GPO installed version 2.9.82.0 start center 2.9.82.139

Right click on the Spybot Search and Destroy icon and checkmark ON "Live Protection". (edited to note, this is the domain configured default)

Wipe the .cache and build directories. Now, 100% of the time every Zephyr build will repeatedly crash at the same point 7 minutes and 54 seconds into the build process with the "Exit code 0xc0000417" error.

Next, right click on the Spybot Search and Destroy icon and checkmark OFF "Live Protection".

Wipe the .cache and build directories. Now, 100% of the time every Zephyr build will successfully complete in 56 seconds.

I can repeatedly reproduce both scenarios, with build times within 5% or so on each run, and trigger the crash 100% reliably by clicking "Live Protection" on and off.
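The on/off comparison above could be scripted roughly as follows. This is only a minimal sketch, not part of the original report: the helper names `wipe` and `timed_run` are hypothetical, and the directories to wipe and the build command are left as parameters because the report does not give their exact paths. The exit code is masked to 32 bits so that a Windows status such as 0xc0000417 (listed among Microsoft's NTSTATUS values as STATUS_INVALID_CRUNTIME_PARAMETER) prints in the same form as in the error message.

```python
import shutil
import subprocess
import sys
import time
from pathlib import Path


def wipe(paths):
    """Delete each directory if it exists, so every run starts clean."""
    for path in paths:
        shutil.rmtree(Path(path), ignore_errors=True)


def timed_run(command, wipe_first=()):
    """Run command once; return (elapsed seconds, exit code as 32-bit hex)."""
    wipe(wipe_first)
    start = time.perf_counter()
    result = subprocess.run(command, check=False)
    elapsed = time.perf_counter() - start
    # Mask to 32 bits so a Windows status code prints the same way whether
    # it is reported as a signed or an unsigned integer.
    return elapsed, f"0x{result.returncode & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    # Stand-in command that exits with status 3; replace it with the real
    # build command and pass the cache and build directories as wipe_first.
    elapsed, code = timed_run([sys.executable, "-c", "import sys; sys.exit(3)"])
    print(f"{elapsed:.1f} s, exit code {code}")
```

Running it once with "Live Protection" on and once with it off, with the same command and the same directories wiped, would give the two timings and exit codes in a directly comparable form.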

Expected behavior

Zephyr compiles quickly and error-free as per the Getting Started Guide instructions for a Windows environment.

Impact

A showstopper if you're trying to develop on Windows. It took about three work days to figure out both workarounds.

Logs and console output

See bug #49987

Environment (please complete the following information):

Windows 11 Pro Version 21H2 OS Build 22000.918

Spybot - Search & Destroy + AV corporate licensed domain GPO installed version 2.9.82.0 start center 2.9.82.139

Zephyr dev system installed on 01-Sept-2022 as per Zephyr Getting Started Guide

Additional context

Documentation bug fix number 1

In the getting started guide I would propose adding the following text to the "Install Python Dependencies" section for Windows.

"Zephyr requires Python's psutil package.
Installation of psutil requires "Microsoft Visual C++ 14.0 or greater". "Build Tools for Visual Studio 2022" are a free five gigabyte download available at: https://visualstudio.microsoft.com/downloads/?q=build+tools
This psutil installation requirement is documented in the upstream package at: https://github.com/giampaolo/psutil/blob/master/INSTALL.rst"

Documentation bug fix number 2

I propose adding the following text to the Getting Started Guide in the "Select and Update OS" section for Windows:

"Under some circumstances, anti-virus and anti-spamware software "Live Protection" features have resulted in "Exit code 0xc0000417" crashes during builds and a negative performance impact in excess of running eight times slower. If you experience these problems you may wish to experiment with shutting off "Live Protection" functionality."

Summary of the issue (not to be included in the web page docs, although it's very true)

On one hand, working with Windows is always difficult and expensive, as this cost three full days of developer time. On the other hand, any time you fight Windows and win, the immense difficulty makes the victory all the more sweet.

Thanks for your time!

carlescufi commented 2 years ago

@vincemulhollon thanks for the detailed report. Would you mind opening a Pull Request with the suggestions you have in this issue? I do have a couple of quick observations though:

For doc bug fix 1: is this also applicable when you install Python via choco install? Right now that is the only "officially documented" way of installing Python. If it isn't, then your doc fix may belong in the Python docs rather than here.

For doc bug fix 2: Fine with adding this; please add it to the "Beyond the Getting Started Guide" section.

vincemulhollon commented 2 years ago

I will try to replicate doc bug fix 1 on a spare laptop, but IIRC I did install Python via choco.

stephanosio commented 2 years ago

This is not a bug. Converted to Enhancement.

Also:

Installation of psutil requires "Microsoft Visual C++ 14.0 or greater".

PyPI already provides the wheels (pre-built binary packages) for the psutil package, so I am not sure why you need this.

"Under some circumstances, anti-virus and anti-spamware software "Live Protection" features have resulted in "Exit code 0xc0000417" crashes

This really sounds like an issue specific to the particular antivirus software you are using.

a negative performance impact in excess of running eight times slower

It is a well known fact that leaving "Live Protection" on, regardless of whichever antivirus software you use, is going to significantly slow down file accesses -- I am not sure if this needs to be documented by Zephyr.
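For scale, the "eight times slower" figure can be checked against the two build times given in the original report (7 minutes 54 seconds with "Live Protection" on, 56 seconds with it off). The variable names below are illustrative only. Since the slow build crashed rather than finished, the ratio is a lower bound on the slowdown.

```python
slow_seconds = 7 * 60 + 54  # time until the crash with "Live Protection" on
fast_seconds = 56           # full build time with "Live Protection" off
ratio = slow_seconds / fast_seconds
print(f"{slow_seconds} s / {fast_seconds} s = {ratio:.1f}x")
```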

vincemulhollon commented 2 years ago

I obtained a second Windows laptop. This one is domain attached and has been mostly untouched since it was re-imaged earlier this year.

Windows 11 Version 21H2 OS Build 22000.856

Note the OS build is older than the one on the domain-attached desktop; the laptop had been in storage for some time and will be upgraded eventually.

I followed the Windows instructions in the Getting Started Guide.

I am pleased to report that the psutil Python package install worked, and it used the wheel.

I checked:

https://pypi.org/project/psutil/#files

The date of the Windows .whl file is shown as the 5th. The original desktop installation failed on the 4th or 5th (it was a late night and I don't know the relative timezones), and I opened the GitHub issue a day or two later, on the 6th.

In summary for bug 1, I would blame the failure to find the wheel for the Python package on the wheel being uploaded around the time pip failed to download it.

This failure mode is unlikely enough that documenting it seems unnecessary. The odds of a package being downloaded before its wheel is uploaded seem low to me, although it clearly happened at least once.
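The manual check of the PyPI files page described above could also be done against PyPI's JSON listing for a project (https://pypi.org/pypi/psutil/json), whose "urls" entries carry a file name, a package type, and an upload time. The sketch below is only an illustration under that assumption: `windows_wheels` is a hypothetical helper, and the sample data is made up rather than taken from any real psutil release. It takes an already parsed listing, so fetching the JSON is left out.

```python
def windows_wheels(release_json):
    """Return (filename, upload time) for each Windows wheel in a listing."""
    found = []
    for entry in release_json.get("urls", []):
        is_wheel = entry.get("packagetype") == "bdist_wheel"
        # The last dash-separated part of a wheel file name is its platform
        # tag, for example win32 or win_amd64 on Windows.
        platform_tag = entry.get("filename", "").rsplit("-", 1)[-1]
        if is_wheel and platform_tag.startswith("win"):
            found.append((entry["filename"], entry.get("upload_time_iso_8601")))
    return found


# Illustrative data only: made-up file names and times, not real release data.
sample = {
    "urls": [
        {"filename": "example-1.0-cp310-cp310-win_amd64.whl",
         "packagetype": "bdist_wheel",
         "upload_time_iso_8601": "2000-01-02T03:04:05Z"},
        {"filename": "example-1.0.tar.gz",
         "packagetype": "sdist",
         "upload_time_iso_8601": "2000-01-01T00:00:00Z"},
    ]
}
print(windows_wheels(sample))
```

If the list comes back empty for a release, pip has no Windows wheel to use and would fall back to building from source, which is the point at which the Visual C++ build tools requirement would come into play.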

vincemulhollon commented 2 years ago

As for the other documentation issue, I will submit a PR on that soon.

In a generic sense, yes, everyone knows anti-virus type software can be problematic. However, I have never seen this specific software crash a build system of any type in the past, whether for Python, Node.js, Dart, platform.io, or MbedOS development, so it was a surprise when it crashed Zephyr's ninja with a mysterious error message. Extensive Google research found no earlier error reports, and this specific problem seems to be completely undocumented. Spybot S+D is not that unpopular a program, so this undocumented problem could theoretically affect "many" developers.

The problem could only be found by spending a day or two doing a binary search of the programs installed on a domain attached desktop and non-domain attached freshly re-imaged laptop, which is kind of a huge waste of time if others can avoid it.

The possible number of affected end users times the effort required to figure it out (Google was of absolutely no help) would multiply to a minor issue worthy of possibly as little as one line of documentation.

So I'll submit a very small PR soon.