phylum-dev / phylum-ci

Python package for handling CI and other integrations
GNU General Public License v3.0

[SPIKE] Investigate Windows binary options #472

Open maxrake opened 3 days ago

maxrake commented 3 days ago

Description

This is a research spike to investigate whether it is possible to create a standalone Windows binary for the phylum-ci package.

Additional Details

Acceptance Criteria

maxrake commented 3 days ago

It is possible to create a single file standalone Windows binary from the phylum-ci package that has the same functionality offered in the phylum-ci script entry point!

The binary can be viewed here, in a recent Preview workflow run. It can also be generated with a manual run of the Preview workflow, using the windows_check branch and selecting the option to "Create Windows binary."

Nuitka was used to generate the binary since it appeared to have the most features and the most active development. BeeWare's Briefcase also looks very polished, but it generates MSI installer files, which would require an extra step compared to a standalone EXE file. The py2exe tool appears a bit dated and lacks some of the universal compatibility/portability support that Nuitka provides. PyInstaller appears to be the next best choice. It may also have met the requirements, but it was not explored in detail since Nuitka worked.

Testing with the produced binary in a representative environment is still needed.

maxrake commented 2 days ago

Initial testing shows some rough edges in the way the Phylum CLI treats Windows canonicalized paths, namely in the .phylum_project file. When such a file is present and the dependency files are acquired from it, their paths include the Windows device drive prefix \\?\. This causes an unhandled exception because os.path.relpath is used to build the printable string representations emitted in debug logging:

Details

```
INFO Using Phylum group: phylum_bot
Exception in thread Thread-14 (_readerthread):
Traceback (most recent call last):
  File "C:\Users\RUNNER~1\AppData\Local\PHYLUM~1\049~1.1-4\threading.py", line 1075, in _bootstrap_inner
  File "C:\Users\RUNNER~1\AppData\Local\PHYLUM~1\049~1.1-4\threading.py", line 1012, in run
  File "C:\Users\RUNNER~1\AppData\Local\PHYLUM~1\049~1.1-4\subprocess.py", line 1599, in _readerthread
  File "encodings\cp1252.py", line 23, in decode
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1: character maps to <undefined>
INFO Project/group pairing already exists. Continuing with it. Project: phylum-ci Group: phylum_bot
DEBUG Repository URL not available to set
INFO No valid dependency files were provided as arguments. An attempt will be made to detect them.
+--------------------- Traceback (most recent call last) ---------------------+
| in :271                                                                     |
| in script_main:267                                                          |
| in main:230                                                                 |
| in __get__:993                                                              |
| in depfiles:165                                                             |
| in debug:1527                                                               |
| in _log:1684                                                                |
| in handle:1700                                                              |
| in callHandlers:1762                                                        |
| in handle:1028                                                              |
| in emit:134                                                                 |
| in format:999                                                               |
| in format:703                                                               |
| in getMessage:392                                                           |
| in __repr__:66                                                              |
| in relpath:783                                                              |
+-----------------------------------------------------------------------------+
ValueError: path is on mount '\\\\?\\D:', start on mount 'D:'
```

The full logs can be viewed here. Work is progressing to determine a good path forward. Possible options:

maxrake commented 1 day ago

The \\?\ prefix issue has been resolved with a tested solution in the Phylum CLI, by using the dunce crate for paths in the phylum project file. A separate review for that will go up shortly.
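For anyone following along, the failure mode can be reproduced on any platform with Python's `ntpath` module, which implements the Windows flavor of `os.path`. The sketch below is only an illustration: the paths are hypothetical, and `strip_verbatim_prefix` is a made-up helper that mimics the general idea of simplifying a verbatim drive path. It is not the dunce crate's algorithm, which is expected to apply more checks before dropping the prefix.

```python
import ntpath  # Windows path rules, importable on any platform

VERBATIM_PREFIX = "\\\\?\\"  # four characters: backslash, backslash, ?, backslash


def strip_verbatim_prefix(path: str) -> str:
    """Hypothetical helper: drop the verbatim prefix from a drive-letter path."""
    if path.startswith(VERBATIM_PREFIX):
        rest = path[len(VERBATIM_PREFIX):]
        # Only simplify plain drive-letter paths; leave everything else untouched.
        if rest[:1].isalpha() and rest[1:3] == ":\\":
            return rest
    return path


# Hypothetical paths, modeled on the mounts named in the traceback.
start = "D:\\a\\phylum-ci"
depfile = "\\\\?\\D:\\a\\phylum-ci\\requirements.txt"

try:
    ntpath.relpath(depfile, start)
    mismatch = None
except ValueError as err:
    mismatch = str(err)  # the two paths are reported as being on different mounts

relative = ntpath.relpath(strip_verbatim_prefix(depfile), start)
print(mismatch)
print(relative)
```

With the prefix removed, both paths share the same drive, so `relpath` can compute a relative path instead of raising.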

maxrake commented 1 day ago

There was another issue where the v7.0.0 release of Phylum CLI does not enable extensions on the Windows binary. Extensions are possible there, by enabling them as a feature when building. A separate review will go up for that. Testing with a CLI binary with extensions enabled showed a successful run. The test cases were against the phylum-ci repository:

maxrake commented 1 day ago

One remaining idiosyncrasy is the following output, mixed in with the standard output:

Exception in thread Thread-14 (_readerthread):
Traceback (most recent call last):
  File "C:\Users\RUNNER~1\AppData\Local\PHYLUM~1\049~1.1-4\threading.py", line 1075, in _bootstrap_inner
  File "C:\Users\RUNNER~1\AppData\Local\PHYLUM~1\049~1.1-4\threading.py", line 1012, in run
  File "C:\Users\RUNNER~1\AppData\Local\PHYLUM~1\049~1.1-4\subprocess.py", line 1599, in _readerthread
  File "encodings\cp1252.py", line 23, in decode
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1: character maps to <undefined>

This looks like a file being read with the default Windows cp1252 encoding when it should be something else...perhaps UTF-8. The file is likely .phylum_project.

FWIW, it doesn't appear that the "errors" hamper execution of the program...but it is not a good look in the output. Further investigation and testing will come next.

kylewillmon commented 1 day ago

> The file is likely .phylum_project.

Why would there be an 0x9d byte in .phylum_project? Are you testing with a non-ascii path or group name or something?

maxrake commented 1 day ago

> Why would there be an 0x9d byte in .phylum_project? Are you testing with a non-ascii path or group name or something?

There isn't...that was just an initial WAG. Further inspection of the codebase reveals that all file reads are done with an encoding specified. I'm still not sure where this originates. Maybe it comes from a file or bootstrap code that Nuitka generates. I'll look a little further, perhaps with some SysInternals tools that can better pinpoint the resource accessed when this exception occurs.

kylewillmon commented 1 day ago

Looking back at your traceback, I think this is coming from a call to subprocess.run() and not a file read... although I'm still not sure what program would be outputting 0x9d

maxrake commented 1 day ago

I think you've cracked it. The docs for subprocess.run say:

If encoding or errors are specified, or text is true, file objects for stdin, stdout and stderr are opened in text mode using the specified encoding and errors or the io.TextIOWrapper default. The universal_newlines argument is equivalent to text and is provided for backwards compatibility. By default, file objects are opened in binary mode.

The docs for io.TextIOWrapper say:

encoding gives the name of the encoding that the stream will be decoded or encoded with. It defaults to locale.getencoding(). encoding="locale" can be used to specify the current locale’s encoding explicitly. See Text Encoding for more information.

And the docs for locale.getencoding say:

Get the current locale encoding:

  • On Android and VxWorks, return "utf-8".
  • On Unix, return the encoding of the current LC_CTYPE locale. Return "utf-8" if nl_langinfo(CODESET) returns an empty string: for example, if the current LC_CTYPE locale is not supported.
  • On Windows, return the ANSI code page.

Sure enough, on the Windows system used for testing, locale.getencoding() returns cp1252.
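A quick way to check this on a given machine, and to see the same error message outside of `subprocess`, is sketched below. The sample bytes are an assumption chosen for illustration (the UTF-8 encoding of U+274C); the actual bytes emitted during the test run have not been identified. `locale.getencoding()` only exists on Python 3.11 and later, so the sketch falls back to `locale.getpreferredencoding(False)` on older versions.

```python
import locale


def default_text_encoding() -> str:
    """Return the locale encoding that text-mode streams default to."""
    getter = getattr(locale, "getencoding", None)  # Python 3.11+
    return getter() if getter else locale.getpreferredencoding(False)


print(default_text_encoding())  # platform and locale dependent

# Illustrative input (an assumption): U+274C encoded as UTF-8 is e2 9d 8c.
sample = "\u274c".encode("utf-8")

try:
    sample.decode("cp1252")
except UnicodeDecodeError as err:
    message = str(err)  # cp1252 has no mapping for byte 0x9d
else:
    message = None

print(message)
```

Any UTF-8 output whose second byte is 0x9d would fail the same way when decoded as cp1252.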

So...now I just need to check all the subprocess calls and specify the encoding. I'm assuming they can all be utf-8...but will check as I go.
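A minimal sketch of that change, assuming the child process emits UTF-8. The child command here is a stand-in that writes a few hypothetical UTF-8 bytes to stdout; it is not one of the real calls in phylum-ci.

```python
import subprocess
import sys

# Stand-in child process (hypothetical): writes raw UTF-8 bytes to stdout.
CHILD_CODE = "import sys; sys.stdout.buffer.write(b'\\xe2\\x9d\\x8c failed\\n')"
cmd = [sys.executable, "-c", CHILD_CODE]

# Without text/encoding arguments, the captured output stays as bytes.
raw = subprocess.run(cmd, capture_output=True, check=True).stdout

# With an explicit encoding, decoding no longer depends on the locale
# (for example, cp1252 on the Windows system used for testing).
result = subprocess.run(cmd, capture_output=True, check=True, encoding="utf-8")

print(raw)
print(repr(result.stdout))
```

Passing `encoding="utf-8"` implies text mode, so `text=True` is not needed as well. If a child process might emit bytes that are not valid UTF-8, adding `errors="replace"` would be one way to keep the reader thread from raising.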

maxrake commented 1 hour ago

> The \\?\ prefix issue has been resolved with a tested solution in the Phylum CLI, by using the dunce crate for paths in the phylum project file. A separate review for that will go up shortly.

https://github.com/phylum-dev/cli/pull/1500 was created for this.