indygreg / PyOxidizer

A modern Python application packaging and distribution tool
Mozilla Public License 2.0
5.4k stars 234 forks source link

argv[0] is None #307

Open nnovikov opened 3 years ago

nnovikov commented 3 years ago

I have problem with pyoxidizer 0.9.0 and sys.argv. I make simple test app (via pyoxidizer init-config-file pyapp)

[/tmp/pyapp] $ cat foo/bar.py

import sys

def main():
    print("argv:", sys.argv)

if __name__ == "__main__":
    main()

and run it with parameters

/tmp/pyapp/./build/x86_64-unknown-linux-gnu/debug/install/pyapp foo bar
argv: [None, 'foo', 'bar']

As you can see, first element in argv is None. May be I need some config options for pyoxidizer.bzl?

wkschwartz commented 3 years ago

I have the same problem. I'll see if I can come up with a minimal example. @nnovikov would you please post your poxidizer.bzl?

nnovikov commented 3 years ago

You can see pyoxidizer.bzl here: https://dpaste.com//3UCYJZQK5

wkschwartz commented 3 years ago

Using what @nnovikov posted, I've got a minimal working and non-working examples below. The critical thing seems to be the use of the run_mode='module:...'.

Aside: specification of sys.argv

My reading of the sys.argv documentation suggests that (1) sys.argv should never be empty and (2) sys.argv should be a list containing str instances.

Examples

The examples below are totally self contained. Place them each in separate, otherwise empty directories in a file called pyoxidizer.bzl. Run pyoxidizer build in that directory.

Reproducing example

def make():
    dist = default_python_distribution()
    python_config = dist.make_python_interpreter_config()
    python_config.run_mode = 'module:foo'
    exe = dist.to_python_executable(
        name="pyapp",
        packaging_policy=dist.make_python_packaging_policy(),
        config=python_config,
    )
    exe.add_python_resource(exe.make_python_module_source(
        name="foo",
        source="import sys;\nif __name__ == '__main__':\n\tprint('sys.argv:', sys.argv)",
        is_package=False))
    files = FileManifest()
    files.add_python_resource(".", exe)
    return files
register_target("install", make, default=True)
resolve_targets()

Result:

$ build/x86_64-apple-darwin/debug/install/pyapp foo bar                                                                                                                                               
sys.argv: [None, 'foo', 'bar']

Non-reproducing example

def make():
    dist = default_python_distribution()
    python_config = dist.make_python_interpreter_config()
    python_config.run_mode = 'eval:import sys; print(sys.argv)'
    exe = dist.to_python_executable(
        name="pyapp",
        packaging_policy=dist.make_python_packaging_policy(),
        config=python_config,
    )
    files = FileManifest()
    files.add_python_resource(".", exe)
    return files
register_target("install", make, default=True)
resolve_targets()

Result:

$ build/x86_64-apple-darwin/debug/install/pyapp foo bar                                                                                                                                               
sys.argv: ['build/x86_64-apple-darwin/debug/install/pyapp', 'foo', 'bar']

Diagnosis

I think this is actually coming from Python code, though PyOxidizer should certainly do something to avoid this. When run_module is set, pymain_run_module runs with set_argv0=1. pymain_run_module then calls runpy._run_module_as_main with arguments based on set_argv. Then runpy._run_module_as_main sets sys.argv[0] from mod_spec.origin, which is the same thing as mod.__file__, which doesn't exist in PyOxidizer embedded modules.

Expected Behavior

It would be much less surprising when using run_module="..." or run_mode="module:..." for sys.argv[0] to equal sys.executable.

indygreg commented 3 years ago

Thank you for the detailed reports and investigation, @nnovikov and @wkschwartz!

I haven't yet looked at this in detail, but as of PyOxidizer 0.9, the run-time code should only be calling into the Python C APIs for running a {code, module, file}. So any behavior here is likely the behavior of CPython.

What could be in play is that unless you set the Python interpreter config profile to python, you are in isolated config mode, which doesn't behave like a python executable. That could be influencing behavior.

Something else at play is argv[0] is handled specially in some code paths in CPython. Sometimes sys.argv[0] refers to the current binary (normally a python variation). Sometimes it is the script that is executing. To be honest, I don't know the full rules here. PyOxidizer is just calling out into CPython's code, so I was hoping to not have to care about the details. But we obviously want correct and reasonable behavior here. At the very least we should document differences where they exist. So I intend to get to the bottom of this.

nnovikov commented 3 years ago

I think pyoxidizer should copy own argv[0] to CPython argv[0] if possible. Also create a parameter to change this behavior.

wkschwartz commented 3 years ago

Response to @nnovikov

pyoxidizer should copy own argv[0] to CPython argv[0]

Is there any reason "pyoxidizer['s] own argv[0]" would differ from sys.executable (other than the gurantee that sys.executable is an absolute path if it's not "" or None)? If no, then I think your suggestion and mine ("Expected Behavior" in my comment above) are in alignment.

Response to @indygreg

I haven't yet looked at this in detail, but as of PyOxidizer 0.9, the run-time code should only be calling into the Python C APIs for running a {code, module, file}. So any behavior here is likely the behavior of CPython.

I'm guessing that line 620 in the preview below is the key line? https://github.com/indygreg/PyOxidizer/blob/fea0562164e2e48d10622c831a5175dfc729674a/pyembed/src/interpreter.rs#L618-L620

If so, then I think that confirms the "Diagnosis" section in https://github.com/indygreg/PyOxidizer/issues/307#issuecomment-721402129

What could be in play is that unless you set the Python interpreter config profile to python, you are in isolated config mode, which doesn't behave like a python executable. That could be influencing behavior.

That's not the case. The config values don't affect how pymain_run_module gets called except via config->run_module. Anyway, to test it, apply the following patch to my reproducing example in https://github.com/indygreg/PyOxidizer/issues/307#issuecomment-721402129 and run pyoxidizer run. You'll get sys.argv: [None] as before.

@@ -1,6 +1,7 @@
     python_config = dist.make_python_interpreter_config()
+    python_config.config_profile = "python"
     python_config.run_mode = 'module:foo'

Something else at play is argv[0] is handled specially in some code paths in CPython.

In the case of run_module, that special handling is here.

Sometimes sys.argv[0] refers to the current binary (normally a python variation). Sometimes it is the script that is executing. To be honest, I don't know the full rules here.

The rules are documented in Interface Options. Here's a short version. In a normal Python executable, there are 5 ways of starting the interpreter, each with its own argv[0] treatment. In the list below, the I first name the options passed to the Python command followed by what argv[0] looks like

  1. no option or -i (REPL): argv[0] == "".
  2. -c command: argv[0] == "-c"
  3. - (execute script from stdin): argv[0] == "-"
  4. -m pkg.mod: before pkg.mod is located, argv[0] == "-m"; once located, argv[0] == pkg.mod.__spec__.origin == pkg.mod.__file__
  5. script name (can be a .py file or a [possibly zipped] directory containing a __main__.py): argv[0] is the name of the script as passed on the command line

PyOxidizer is just calling out into CPython's code, so I was hoping to not have to care about the details. But we obviously want correct and reasonable behavior here. At the very least we should document differences where they exist. So I intend to get to the bottom of this.

Here is where I think you get control over this situation: spec = importlib.util.find_spec(mod_name), where mod_name is python_config.run_module or python_config.run_module + ".__main__".

Proposal

When pyoxidizer.bzl has python_config.run_module = "pkg.mod" set or python_config.run_mode = "module:pkg.mod" and pkg.mod comes from OxidizedFinder, rather than doing the usual setting of pkg.mod.__spec__.origin to None, instead set pkg.mod.__spec__.origin = sys.executable. This exception to the usual rules should be made explicit in the documentation.

Alternative

If you don't like that plan, you can monkey patch runpy._run_module_as_main.

wkschwartz commented 3 years ago

Come to think of it, maybe this ticket represents a bug in the Python library specification. We already know that a module's __spec__.origin is allowed to be None. argv[0] is supposed to be the "full path to the module file" when using the -m option. But if the module doesn't have a file, then argv[0] cannot be the full path of the module file.

Does this warrant a ticket on bpo?

UPDATE: Maybe relevant:

wkschwartz commented 3 years ago

I created bpo-42315 for this. I still think a PyOxidizer-specific workaround is warranted in the meantime. I might be able to help if I get some pointers about where the relevant code/tests/docs would go.

n8henrie commented 3 years ago

Just wanted to chime in -- this seems to be the cause of a problem that is preventing a small project of mine from building with 0.10.3, which interesting was building fine on 0.6 and 0.7.

The error I'm seeing, to hopefully lead future searchers this way, is when running pyoxidizer run:

Traceback (most recent call last):
  File "runpy", line 194, in _run_module_as_main
  File "runpy", line 87, in _run_code
  File "terminalvelocity.__main__", line 3, in <module>
  File "terminalvelocity.cli", line 49, in run
  File "argparse", line 1660, in __init__
  File "posixpath", line 142, in basename
TypeError: expected str, bytes or os.PathLike object, not NoneType
error: cargo run failed

It looks like argparse relies on sys.argv if prog is unset. This gets passed to basename, which relies on os.fspath, which freaks out when it gets None. Instead of leaving prog empty, if I pass in a value for prog explicitly (e.g. prog=__package__), it runs without issue.

I think this will be a common issue for CLI apps, which often use argparse, and will often rely on the default behavior to populate the progname.

wkschwartz commented 3 years ago

The problem with prog=__package__ is that ArgumentParser computes the usage help message from prog. You risk the following user interaction (error messages are slightly wrong but hopefully you get the idea):

$ terminalvelocity.exe badargs
Usage: terminalvelocity.cli [-h] goodargs
Unknown argument for terminalvelocity.cli: "badargs"
$ terminalvelocity.cli goodargs
Unknown command: "terminalvelocity.cli"

I would recommend instead, as I did in https://github.com/indygreg/PyOxidizer/issues/307#issuecomment-721841352, to use prog=sys.executable.

n8henrie commented 3 years ago

Huh. I guess that's fair if cli is a package, which it may be in many cases. In mine, __package__ works as intended, either when run from python -m or from the console_script it installs (tv):

EDIT: On second look, you're correct -- using __package__ it's pointing users toward terminalvelocity -h, which won't work (needs to be either tv -h or python -m terminalvelocity -h. But it doesn't seem that the sys.executable workaround is much better as /Absolute/path/to/python -h won't do much better.

$ python -m terminalvelocity -h
usage: terminalvelocity [-h] [-c CONFIG] [-e EDITOR] [-x EXTENSION] [--extensions EXTENSIONS]
                        [--exclude EXCLUDE] [-d] [-l LOG_FILE] [-p]
                        [notes_dir]

A fast note-taking app for the UNIX terminal
$ tv -h
usage: terminalvelocity [-h] [-c CONFIG] [-e EDITOR] [-x EXTENSION] [--extensions EXTENSIONS]
                        [--exclude EXCLUDE] [-d] [-l LOG_FILE] [-p]
                        [notes_dir]

A fast note-taking app for the UNIX terminal

Using the sys.executable workaround makes a mess of things in both cases:

$ python -m terminalvelocity -h
usage: /Users/me/git/terminalvelocity/.venv/bin/python [-h] [-c CONFIG] [-e EDITOR] [-x EXTENSION]
                                                             [--extensions EXTENSIONS] [--exclude EXCLUDE] [-d]
                                                             [-l LOG_FILE] [-p]
                                                             [notes_dir]

A fast note-taking app for the UNIX terminal
$ tv -h
usage: /Users/me/git/terminalvelocity/.venv/bin/python [-h] [-c CONFIG] [-e EDITOR] [-x EXTENSION]
                                                             [--extensions EXTENSIONS] [--exclude EXCLUDE] [-d]
                                                             [-l LOG_FILE] [-p]
                                                             [notes_dir]

A fast note-taking app for the UNIX terminal
wkschwartz commented 3 years ago

Sorry, I was referring specifically to using prog=sys.executable from inside PyOxidizer-compiled executables. I agree that setting prog=sys.executable doesn't make much sense from inside a vanilla CPython.

hamzamohdzubair commented 3 years ago

I am using click to create a CLI Application, and i have the exact same issue, is there a temporary workaround for this yet?

hamzamohdzubair commented 3 years ago

@indygreg I am thinking of an alternative way of solving this issue:

  1. Build CLI args processing in native rust itself
  2. Send processed CLI arguments to python modules

Step 1 is simple. Can you help with step 2, please. Have gone through the entire pyoxidizer documentation and unable to find a start. Hope you can help.

@wkschwartz @n8henrie. Any pointers will be great. Thanks

spicy-sauce commented 2 years ago

I am using click to create a CLI Application, and i have the exact same issue, is there a temporary workaround for this yet?

Same here. Is there any update on this? Thanks.

wkschwartz commented 2 years ago

I think this comes down ultimately to an ambiguity in Python's specification: sys.argv is supposed to contain strs; when Python is launched with -m (as it is in some Pyoxidizer configurations), sys.argv[0] is supposed to me __main__.__file__; and __file__ should be missing when meaningless (as it is in some Pyoxidizer configurations). Please post your report and use case at python/cpython#86481