indygreg / PyOxidizer

A modern Python application packaging and distribution tool
Mozilla Public License 2.0

OxidizedFinder not loading file-based resource data from folders that aren't Python packages #436

Open TheFriendlyCoder opened 3 years ago

TheFriendlyCoder commented 3 years ago

I'm relatively new to PyOxidizer. I've spent several days trying to get one of my command-line tools to run properly in the frozen environment, and I've hit what looks like a bug in PyOxidizer: the OxidizedFinder does not seem to load data from resource files contained within Python packages. I've reduced the reproduction to the following Starlark file:

def make_exe():
    dist = default_python_distribution()

    policy = dist.make_python_packaging_policy()
    policy.resources_location_fallback = None
    policy.resources_location = "filesystem-relative:lib"

    python_config = dist.make_python_interpreter_config()

    python_config.run_command = "import pkgutil; pkgutil.get_data('certifi', 'cacert.pem')"
    exe = dist.to_python_executable(
        name="pyapp",
        packaging_policy=policy,
        config=python_config,
    )
    for resource in exe.pip_install(["certifi"]):
        exe.add_python_resource(resource)
    return exe

def make_embedded_resources(exe):
    return exe.to_embedded_resources()

def make_install(exe):
    files = FileManifest()
    files.add_python_resource(".", exe)
    return files

def make_msi(exe):
    return exe.to_wix_msi_builder(
        "myapp",
        "My Application",
        "1.0",
        "Alice Jones"
    )

def register_code_signers():
    if not VARS.get("ENABLE_CODE_SIGNING"):
        return

register_code_signers()

register_target("exe", make_exe)
register_target("resources", make_embedded_resources, depends=["exe"], default_build_script=True)
register_target("install", make_install, depends=["exe"], default=True)
register_target("msi_installer", make_msi, depends=["exe"])

resolve_targets()

If you run "pyoxidizer run" against this build script, it fails with the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "pkgutil", line 639, in get_data
FileNotFoundError: [Errno 2] resource not known: '/Users/kevinp/Documents/src/sandbox/oxydizer1/build/x86_64-apple-darwin/debug/install/lib/certifi/cacert.pem'

From what I can tell, this error is raised by PyOxidizer in one of two places, here or here.

Note that this is a contrived example I created to illustrate the problem. The "certifi" library itself loads this resource using more robust methods that do work with PyOxidizer, but I've found quite a few other libraries that suffer from the same underlying problem. For example, if you change the following lines in the Starlark file, simply importing the jsonschema library produces a similar error. jsonschema (for better or worse) calls pkgutil at module level, and that call fails immediately, so the library cannot even be imported:

    python_config.run_command = "from jsonschema import validate"
    exe = dist.to_python_executable(
        name="pyapp",
        packaging_policy=policy,
        config=python_config,
    )
    for resource in exe.pip_install(["jsonschema"]):
        exe.add_python_resource(resource)

From what I can tell, any library that tries to load resource data through the OxidizedFinder, directly or indirectly, fails outright with this error, even though I've confirmed that the resource exists and that the path is correct.
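For comparison, here is a minimal sketch of the kind of "more robust" route mentioned above for certifi: going through the standard library's importlib.resources instead of building a filesystem path for loader.get_data(). It uses the files() API available since Python 3.9. read_package_bytes is a hypothetical helper, and whether OxidizedFinder serves a particular file through this path under a filesystem-relative policy is an assumption, not something verified in this report.

```python
from importlib import resources


def read_package_bytes(package, resource):
    """Read a data file shipped inside `package` via importlib.resources.

    Hypothetical helper. `resource` may contain '/' to reach into
    sub-directories that are not packages themselves, e.g.
    "schemas/draft7.json".
    """
    # files() returns a Traversable for the package; join one segment at a
    # time because older Traversable.joinpath() accepts a single child.
    ref = resources.files(package)
    for part in resource.split("/"):
        ref = ref.joinpath(part)
    return ref.read_bytes()
```

The loader is asked for a resource reader here rather than being handed an absolute path, which is why this style can behave differently from pkgutil.get_data under a custom finder.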

Analysis

The definition of the get_data function in the standard library's pkgutil module can be found here. Running each step of that implementation in a REPL under PyOxidizer reveals some interesting things.
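For reference, the steps pkgutil.get_data performs can be condensed into the sketch below. It follows the CPython 3.9 implementation closely, but swaps the private importlib._bootstrap._load call for the public importlib.import_module, so treat it as an approximation rather than a copy. get_data_sketch is a hypothetical name.

```python
import importlib
import importlib.util
import os
import sys


def get_data_sketch(package, resource):
    """Condensed approximation of pkgutil.get_data (CPython 3.9)."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None
    loader = spec.loader
    if loader is None or not hasattr(loader, "get_data"):
        return None
    # pkgutil calls importlib._bootstrap._load(spec) here; import_module is
    # the public way to get the (possibly not yet imported) package module.
    mod = sys.modules.get(package) or importlib.import_module(package)
    if mod is None or not hasattr(mod, "__file__"):
        return None
    # Build an OS-style filename: the package directory joined with the
    # '/'-separated resource name.
    parts = resource.split("/")
    parts.insert(0, os.path.dirname(mod.__file__))
    resource_name = os.path.join(*parts)
    # Under OxidizedFinder this is the call that raises "resource not known".
    return loader.get_data(resource_name)
```

Each statement corresponds to one line of the REPL session that follows.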

First, I commented out the python_config.run_command line in the Starlark file and started the interactive interpreter with pyoxidizer run. In that environment I performed the following operations:

>>> package = "certifi"
>>> resource = "cacert.pem"
>>> import importlib
>>> import importlib.util
>>> spec = importlib.util.find_spec(package)
>>> spec
ModuleSpec(name='certifi', loader=<OxidizedFinder object at 0x10571cab0>, origin='/Users/kevinp/Documents/src/sandbox/oxydizer1/build/x86_64-apple-darwin/debug/install/lib/certifi/__init__.py', submodule_search_locations=['/Users/kevinp/Documents/src/sandbox/oxydizer1/build/x86_64-apple-darwin/debug/install/lib/certifi'])
>>> loader = spec.loader
>>> loader
<OxidizedFinder object at 0x10571cab0>
>>> mod = importlib._bootstrap._load(spec)
>>> mod
<module 'certifi' from '/Users/kevinp/Documents/src/sandbox/oxydizer1/build/x86_64-apple-darwin/debug/install/lib/certifi/__init__.py'>
>>> mod.__file__
'/Users/kevinp/Documents/src/sandbox/oxydizer1/build/x86_64-apple-darwin/debug/install/lib/certifi/__init__.py'
>>> import os
>>> resource_name = os.path.join(os.path.dirname(mod.__file__), resource)
>>> resource_name
'/Users/kevinp/Documents/src/sandbox/oxydizer1/build/x86_64-apple-darwin/debug/install/lib/certifi/cacert.pem'
>>> os.path.exists(resource_name)
True
>>> loader.get_data(resource_name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] resource not known: '/Users/kevinp/Documents/src/sandbox/oxydizer1/build/x86_64-apple-darwin/debug/install/lib/certifi/cacert.pem'

As this output shows, the loader and module are created successfully, and the path to the data file is correct and exists on disk, yet the loader reports that the resource is unknown.
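Since the file really is on disk, one application-side stopgap is to catch the FileNotFoundError and read the path directly. The sketch below wraps pkgutil.get_data that way. get_data_with_fallback is a hypothetical helper, not a pkgutil or PyOxidizer API, and it only helps code you control; it does nothing for third-party libraries such as jsonschema that call pkgutil.get_data themselves.

```python
import importlib.util
import os
import pkgutil


def get_data_with_fallback(package, resource):
    """pkgutil.get_data, falling back to a plain file read when the loader
    rejects a path that exists on the filesystem (hypothetical helper)."""
    try:
        return pkgutil.get_data(package, resource)
    except FileNotFoundError:
        spec = importlib.util.find_spec(package)
        if spec is None or spec.origin is None:
            raise
        # Same path pkgutil builds: for a regular package spec.origin is the
        # __init__.py file, i.e. the same file as mod.__file__.
        path = os.path.join(os.path.dirname(spec.origin), *resource.split("/"))
        if not os.path.isfile(path):
            raise  # genuinely missing: re-raise the original error
        with open(path, "rb") as fh:
            return fh.read()
```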

After some digging I found this issue, created a couple of years ago, in which the OxidizedFinder.get_data method was implemented. Based on this very detailed comment, resource files appear to have to satisfy additional requirements before PyOxidizer will load them. Most notably, the files have to be under the same path as the application executable. So I thought I might be running into a path problem. I'm somewhat familiar with Rust, so I read over the implementation to see how those checks are done, and tried to reproduce my understanding with equivalent Python operations to see if I could find any further problems. Here is what I found:

>>> loader.origin
'/Users/kevinp/Documents/src/sandbox/oxydizer1/build/x86_64-apple-darwin/debug/install'
>>> resource_name.startswith(loader.origin)
True
>>> import io
>>> fh = io.FileIO(resource_name, "r")
>>> fh.read()
... dump of valid PEM file contents - too long to paste here...

Based on these findings, the condition in this conditional block should be satisfied, which means this line should not be the source of the error. That suggests the bug lies later in this function, where the resource file is somehow excluded from the loading logic, causing this error to be raised.
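One caveat about the startswith check above: a plain string-prefix test also accepts sibling paths such as .../debug/install2/... when the origin is .../debug/install, so on its own it can report a match that a real containment check would reject. A component-wise comparison avoids that. This is only a Python approximation of the "under the origin directory" condition as described in the linked comment, not a port of PyOxidizer's Rust code; is_under_directory is a hypothetical helper.

```python
import os


def is_under_directory(path, directory):
    """Return True if `path` lies inside (or is) `directory`, comparing
    whole path components rather than raw string prefixes.

    Symlinks are not resolved; both arguments are only made absolute.
    """
    path = os.path.abspath(path)
    directory = os.path.abspath(directory)
    try:
        # commonpath() works on components, so "/a/install2" is not treated
        # as being inside "/a/install" the way str.startswith() would.
        return os.path.commonpath([path, directory]) == directory
    except ValueError:
        # e.g. paths on different drives on Windows
        return False
```

For the paths in this report the stricter check gives the same answer, so the conclusion above still holds.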

Unfortunately, this is the extent of my expertise. I don't know enough about Rust to debug this issue further. Any assistance you can provide would be greatly appreciated.

Environment
* Host OS: Mac OS X
* Python version: 3.9.6
* PyOxidizer version: 0.17.0
* Tested against the latest versions of certifi and jsonschema

TheFriendlyCoder commented 3 years ago

I also ran the pyoxidizer find-resources tool against my test libraries, with the following results:

pyoxidizer find-resources ~/Downloads/jsonschema-3.2.0-py2.py3-none-any.whl
resolving Python distribution Url { url: "https://github.com/indygreg/python-build-standalone/releases/download/20210724/cpython-3.9.6-x86_64-apple-darwin-pgo-20210724T1424.tar.zst", sha256: "9e11a09bf1c2e4c1e2c4d0f403e7199186a6f993ecc135e31aa57294b5634cc7" }
downloading https://github.com/indygreg/python-build-standalone/releases/download/20210724/cpython-3.9.6-x86_64-apple-darwin-pgo-20210724T1424.tar.zst
Python distribution available at /var/folders/b0/g8z5zpvd7ml84ty003f7vpj40000gn/T/python-distributioniFJRbp/cpython-3.9.6-x86_64-apple-darwin-pgo-20210724T1424.tar.zst
reading data from Python distribution...
parsing /Users/kevinp/Downloads/jsonschema-3.2.0-py2.py3-none-any.whl as a wheel archive
File { path: jsonschema/__init__.py, is_executable: false }
PythonModuleSource { name: jsonschema, is_package: true, is_stdlib: false, is_test: false }
File { path: jsonschema/__main__.py, is_executable: false }
PythonModuleSource { name: jsonschema.__main__, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/_format.py, is_executable: false }
PythonModuleSource { name: jsonschema._format, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/_legacy_validators.py, is_executable: false }
PythonModuleSource { name: jsonschema._legacy_validators, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/_reflect.py, is_executable: false }
PythonModuleSource { name: jsonschema._reflect, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/_types.py, is_executable: false }
PythonModuleSource { name: jsonschema._types, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/_utils.py, is_executable: false }
PythonModuleSource { name: jsonschema._utils, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/_validators.py, is_executable: false }
PythonModuleSource { name: jsonschema._validators, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/benchmarks/__init__.py, is_executable: false }
PythonModuleSource { name: jsonschema.benchmarks, is_package: true, is_stdlib: false, is_test: false }
File { path: jsonschema/benchmarks/issue232.py, is_executable: false }
PythonModuleSource { name: jsonschema.benchmarks.issue232, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/benchmarks/json_schema_test_suite.py, is_executable: false }
PythonModuleSource { name: jsonschema.benchmarks.json_schema_test_suite, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/cli.py, is_executable: false }
PythonModuleSource { name: jsonschema.cli, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/compat.py, is_executable: false }
PythonModuleSource { name: jsonschema.compat, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/exceptions.py, is_executable: false }
PythonModuleSource { name: jsonschema.exceptions, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/schemas/draft3.json, is_executable: false }
File { path: jsonschema/schemas/draft4.json, is_executable: false }
File { path: jsonschema/schemas/draft6.json, is_executable: false }
File { path: jsonschema/schemas/draft7.json, is_executable: false }
File { path: jsonschema/tests/__init__.py, is_executable: false }
PythonModuleSource { name: jsonschema.tests, is_package: true, is_stdlib: false, is_test: false }
File { path: jsonschema/tests/_helpers.py, is_executable: false }
PythonModuleSource { name: jsonschema.tests._helpers, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/tests/_suite.py, is_executable: false }
PythonModuleSource { name: jsonschema.tests._suite, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/tests/test_cli.py, is_executable: false }
PythonModuleSource { name: jsonschema.tests.test_cli, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/tests/test_exceptions.py, is_executable: false }
PythonModuleSource { name: jsonschema.tests.test_exceptions, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/tests/test_format.py, is_executable: false }
PythonModuleSource { name: jsonschema.tests.test_format, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/tests/test_jsonschema_test_suite.py, is_executable: false }
PythonModuleSource { name: jsonschema.tests.test_jsonschema_test_suite, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/tests/test_types.py, is_executable: false }
PythonModuleSource { name: jsonschema.tests.test_types, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/tests/test_validators.py, is_executable: false }
PythonModuleSource { name: jsonschema.tests.test_validators, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema/validators.py, is_executable: false }
PythonModuleSource { name: jsonschema.validators, is_package: false, is_stdlib: false, is_test: false }
File { path: jsonschema-3.2.0.dist-info/COPYING, is_executable: false }
PythonPackageDistributionResource { package: jsonschema, version: 3.2.0, name: COPYING }
File { path: jsonschema-3.2.0.dist-info/METADATA, is_executable: false }
PythonPackageDistributionResource { package: jsonschema, version: 3.2.0, name: METADATA }
File { path: jsonschema-3.2.0.dist-info/RECORD, is_executable: false }
PythonPackageDistributionResource { package: jsonschema, version: 3.2.0, name: RECORD }
File { path: jsonschema-3.2.0.dist-info/WHEEL, is_executable: false }
PythonPackageDistributionResource { package: jsonschema, version: 3.2.0, name: WHEEL }
File { path: jsonschema-3.2.0.dist-info/entry_points.txt, is_executable: false }
PythonPackageDistributionResource { package: jsonschema, version: 3.2.0, name: entry_points.txt }
File { path: jsonschema-3.2.0.dist-info/top_level.txt, is_executable: false }
PythonPackageDistributionResource { package: jsonschema, version: 3.2.0, name: top_level.txt }
PythonPackageResource { package: jsonschema, name: schemas/draft3.json, is_stdlib: false, is_test: false }
PythonPackageResource { package: jsonschema, name: schemas/draft4.json, is_stdlib: false, is_test: false }
PythonPackageResource { package: jsonschema, name: schemas/draft6.json, is_stdlib: false, is_test: false }
PythonPackageResource { package: jsonschema, name: schemas/draft7.json, is_stdlib: false, is_test: false }
pyoxidizer find-resources ~/Downloads/certifi-2021.5.30-py2.py3-none-any.whl
resolving Python distribution Url { url: "https://github.com/indygreg/python-build-standalone/releases/download/20210724/cpython-3.9.6-x86_64-apple-darwin-pgo-20210724T1424.tar.zst", sha256: "9e11a09bf1c2e4c1e2c4d0f403e7199186a6f993ecc135e31aa57294b5634cc7" }
downloading https://github.com/indygreg/python-build-standalone/releases/download/20210724/cpython-3.9.6-x86_64-apple-darwin-pgo-20210724T1424.tar.zst
Python distribution available at /var/folders/b0/g8z5zpvd7ml84ty003f7vpj40000gn/T/python-distributionPcyPPQ/cpython-3.9.6-x86_64-apple-darwin-pgo-20210724T1424.tar.zst
reading data from Python distribution...
parsing /Users/kevinp/Downloads/certifi-2021.5.30-py2.py3-none-any.whl as a wheel archive
File { path: certifi/__init__.py, is_executable: false }
PythonModuleSource { name: certifi, is_package: true, is_stdlib: false, is_test: false }
File { path: certifi/__main__.py, is_executable: false }
PythonModuleSource { name: certifi.__main__, is_package: false, is_stdlib: false, is_test: false }
File { path: certifi/cacert.pem, is_executable: false }
File { path: certifi/core.py, is_executable: false }
PythonModuleSource { name: certifi.core, is_package: false, is_stdlib: false, is_test: false }
File { path: certifi-2021.5.30.dist-info/LICENSE, is_executable: false }
PythonPackageDistributionResource { package: certifi, version: 2021.5.30, name: LICENSE }
File { path: certifi-2021.5.30.dist-info/METADATA, is_executable: false }
PythonPackageDistributionResource { package: certifi, version: 2021.5.30, name: METADATA }
File { path: certifi-2021.5.30.dist-info/RECORD, is_executable: false }
PythonPackageDistributionResource { package: certifi, version: 2021.5.30, name: RECORD }
File { path: certifi-2021.5.30.dist-info/WHEEL, is_executable: false }
PythonPackageDistributionResource { package: certifi, version: 2021.5.30, name: WHEEL }
File { path: certifi-2021.5.30.dist-info/top_level.txt, is_executable: false }
PythonPackageDistributionResource { package: certifi, version: 2021.5.30, name: top_level.txt }
PythonPackageResource { package: certifi, name: cacert.pem, is_stdlib: false, is_test: false }

This shows that PyOxidizer does see and correctly identify the package resources I'm trying to load in my test scenario, which is interesting. I would have assumed the find-resources command uses the same (or similar) code as the actual loader, so that the two produce consistent results, but maybe not.

TheFriendlyCoder commented 3 years ago

One more interesting finding: if I run pyoxidizer find-resources against the site-packages folder for jsonschema, after installing the same wheel file I scanned earlier, I get different results:

resolving Python distribution Url { url: "https://github.com/indygreg/python-build-standalone/releases/download/20210724/cpython-3.9.6-x86_64-apple-darwin-pgo-20210724T1424.tar.zst", sha256: "9e11a09bf1c2e4c1e2c4d0f403e7199186a6f993ecc135e31aa57294b5634cc7" }
downloading https://github.com/indygreg/python-build-standalone/releases/download/20210724/cpython-3.9.6-x86_64-apple-darwin-pgo-20210724T1424.tar.zst
Python distribution available at /var/folders/b0/g8z5zpvd7ml84ty003f7vpj40000gn/T/python-distributionNjgwmy/cpython-3.9.6-x86_64-apple-darwin-pgo-20210724T1424.tar.zst
reading data from Python distribution...
scanning directory ./venv/lib/python3.9/site-packages/jsonschema
File { path: __init__.py, is_executable: false }
PythonModuleSource { name: , is_package: true, is_stdlib: false, is_test: false }
File { path: __main__.py, is_executable: false }
PythonModuleSource { name: __main__, is_package: false, is_stdlib: false, is_test: false }
File { path: __pycache__/__init__.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: , is_package: true, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/__main__.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: __main__, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/_format.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: _format, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/_legacy_validators.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: _legacy_validators, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/_reflect.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: _reflect, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/_types.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: _types, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/_utils.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: _utils, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/_validators.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: _validators, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/cli.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: cli, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/compat.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: compat, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/exceptions.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: exceptions, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/validators.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: validators, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: _format.py, is_executable: false }
PythonModuleSource { name: _format, is_package: false, is_stdlib: false, is_test: false }
File { path: _legacy_validators.py, is_executable: false }
PythonModuleSource { name: _legacy_validators, is_package: false, is_stdlib: false, is_test: false }
File { path: _reflect.py, is_executable: false }
PythonModuleSource { name: _reflect, is_package: false, is_stdlib: false, is_test: false }
File { path: _types.py, is_executable: false }
PythonModuleSource { name: _types, is_package: false, is_stdlib: false, is_test: false }
File { path: _utils.py, is_executable: false }
PythonModuleSource { name: _utils, is_package: false, is_stdlib: false, is_test: false }
File { path: _validators.py, is_executable: false }
PythonModuleSource { name: _validators, is_package: false, is_stdlib: false, is_test: false }
File { path: benchmarks/__init__.py, is_executable: false }
PythonModuleSource { name: benchmarks, is_package: true, is_stdlib: false, is_test: false }
File { path: benchmarks/__pycache__/__init__.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: benchmarks, is_package: true, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: benchmarks/__pycache__/issue232.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: benchmarks.issue232, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: benchmarks/__pycache__/json_schema_test_suite.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: benchmarks.json_schema_test_suite, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: benchmarks/issue232.py, is_executable: false }
PythonModuleSource { name: benchmarks.issue232, is_package: false, is_stdlib: false, is_test: false }
File { path: benchmarks/json_schema_test_suite.py, is_executable: false }
PythonModuleSource { name: benchmarks.json_schema_test_suite, is_package: false, is_stdlib: false, is_test: false }
File { path: cli.py, is_executable: false }
PythonModuleSource { name: cli, is_package: false, is_stdlib: false, is_test: false }
File { path: compat.py, is_executable: false }
PythonModuleSource { name: compat, is_package: false, is_stdlib: false, is_test: false }
File { path: exceptions.py, is_executable: false }
PythonModuleSource { name: exceptions, is_package: false, is_stdlib: false, is_test: false }
File { path: schemas/draft3.json, is_executable: false }
File { path: schemas/draft4.json, is_executable: false }
File { path: schemas/draft6.json, is_executable: false }
File { path: schemas/draft7.json, is_executable: false }
File { path: tests/__init__.py, is_executable: false }
PythonModuleSource { name: tests, is_package: true, is_stdlib: false, is_test: false }
File { path: tests/__pycache__/__init__.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests, is_package: true, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/_helpers.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests._helpers, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/_suite.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests._suite, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/test_cli.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests.test_cli, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/test_exceptions.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests.test_exceptions, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/test_format.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests.test_format, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/test_jsonschema_test_suite.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests.test_jsonschema_test_suite, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/test_types.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests.test_types, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/test_validators.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests.test_validators, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/_helpers.py, is_executable: false }
PythonModuleSource { name: tests._helpers, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/_suite.py, is_executable: false }
PythonModuleSource { name: tests._suite, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/test_cli.py, is_executable: false }
PythonModuleSource { name: tests.test_cli, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/test_exceptions.py, is_executable: false }
PythonModuleSource { name: tests.test_exceptions, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/test_format.py, is_executable: false }
PythonModuleSource { name: tests.test_format, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/test_jsonschema_test_suite.py, is_executable: false }
PythonModuleSource { name: tests.test_jsonschema_test_suite, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/test_types.py, is_executable: false }
PythonModuleSource { name: tests.test_types, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/test_validators.py, is_executable: false }
PythonModuleSource { name: tests.test_validators, is_package: false, is_stdlib: false, is_test: false }
File { path: validators.py, is_executable: false }
PythonModuleSource { name: validators, is_package: false, is_stdlib: false, is_test: false }

Most notably, there are no PythonPackageResource entries for the JSON files, as there were in the wheel scan. For some reason the tool detects the resources in the wheel file but not in the site-packages folder.
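The wheel scan attributes schemas/draft3.json to the jsonschema package with the resource name schemas/draft3.json, which is consistent with "owner = nearest enclosing directory that has an __init__.py, with the rest of the path kept as the name". The sketch below applies that rule to a directory tree to show what the site-packages scan might be expected to report. classify_data_files is a hypothetical helper written to mirror the wheel output above; it is not PyOxidizer's actual scanner.

```python
from pathlib import Path


def classify_data_files(root):
    """Return sorted (package, resource_name) pairs for non-Python files.

    Hypothetical helper: each data file is attributed to the nearest ancestor
    directory (below `root`) that contains an __init__.py. Folders between
    that package and the file stay in the resource name. Files with no
    enclosing package are skipped.
    """
    root = Path(root)
    results = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if (not path.is_file() or path.suffix in (".py", ".pyc")
                or "__pycache__" in rel.parts):
            continue
        # rel.parents runs from the innermost directory outwards and ends
        # with Path("."), which stands for the scan root itself.
        for parent in rel.parents:
            if parent == Path("."):
                break
            if (root / parent / "__init__.py").is_file():
                package = ".".join(parent.parts)
                resource = rel.relative_to(parent).as_posix()
                results.append((package, resource))
                break
    return sorted(results)
```

On a tree shaped like site-packages/jsonschema, this yields ("jsonschema", "schemas/draft3.json") and so on while schemas/ has no __init__.py; once an empty schemas/__init__.py is added, the owner becomes jsonschema.schemas and the names shrink to draft3.json and so on.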

TheFriendlyCoder commented 3 years ago

Even more interesting findings. I took a closer look at the "schemas" folder of the jsonschema package as it appears in site-packages and noticed that it contains only the JSON data files. So I wondered: what if PyOxidizer is looking for a Python "package" containing those data files? It won't be able to find them unless there is also a file named __init__.py in the same folder. So I created an empty file with that name in the schemas folder, and now I see PythonPackageResource entries in the find-resources output:

resolving Python distribution Url { url: "https://github.com/indygreg/python-build-standalone/releases/download/20210724/cpython-3.9.6-x86_64-apple-darwin-pgo-20210724T1424.tar.zst", sha256: "9e11a09bf1c2e4c1e2c4d0f403e7199186a6f993ecc135e31aa57294b5634cc7" }
downloading https://github.com/indygreg/python-build-standalone/releases/download/20210724/cpython-3.9.6-x86_64-apple-darwin-pgo-20210724T1424.tar.zst
Python distribution available at /var/folders/b0/g8z5zpvd7ml84ty003f7vpj40000gn/T/python-distributionpLD7uB/cpython-3.9.6-x86_64-apple-darwin-pgo-20210724T1424.tar.zst
reading data from Python distribution...
scanning directory ./venv/lib/python3.9/site-packages/jsonschema
File { path: __init__.py, is_executable: false }
PythonModuleSource { name: , is_package: true, is_stdlib: false, is_test: false }
File { path: __main__.py, is_executable: false }
PythonModuleSource { name: __main__, is_package: false, is_stdlib: false, is_test: false }
File { path: __pycache__/__init__.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: , is_package: true, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/__main__.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: __main__, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/_format.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: _format, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/_legacy_validators.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: _legacy_validators, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/_reflect.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: _reflect, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/_types.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: _types, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/_utils.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: _utils, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/_validators.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: _validators, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/cli.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: cli, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/compat.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: compat, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/exceptions.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: exceptions, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: __pycache__/validators.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: validators, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: _format.py, is_executable: false }
PythonModuleSource { name: _format, is_package: false, is_stdlib: false, is_test: false }
File { path: _legacy_validators.py, is_executable: false }
PythonModuleSource { name: _legacy_validators, is_package: false, is_stdlib: false, is_test: false }
File { path: _reflect.py, is_executable: false }
PythonModuleSource { name: _reflect, is_package: false, is_stdlib: false, is_test: false }
File { path: _types.py, is_executable: false }
PythonModuleSource { name: _types, is_package: false, is_stdlib: false, is_test: false }
File { path: _utils.py, is_executable: false }
PythonModuleSource { name: _utils, is_package: false, is_stdlib: false, is_test: false }
File { path: _validators.py, is_executable: false }
PythonModuleSource { name: _validators, is_package: false, is_stdlib: false, is_test: false }
File { path: benchmarks/__init__.py, is_executable: false }
PythonModuleSource { name: benchmarks, is_package: true, is_stdlib: false, is_test: false }
File { path: benchmarks/__pycache__/__init__.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: benchmarks, is_package: true, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: benchmarks/__pycache__/issue232.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: benchmarks.issue232, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: benchmarks/__pycache__/json_schema_test_suite.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: benchmarks.json_schema_test_suite, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: benchmarks/issue232.py, is_executable: false }
PythonModuleSource { name: benchmarks.issue232, is_package: false, is_stdlib: false, is_test: false }
File { path: benchmarks/json_schema_test_suite.py, is_executable: false }
PythonModuleSource { name: benchmarks.json_schema_test_suite, is_package: false, is_stdlib: false, is_test: false }
File { path: cli.py, is_executable: false }
PythonModuleSource { name: cli, is_package: false, is_stdlib: false, is_test: false }
File { path: compat.py, is_executable: false }
PythonModuleSource { name: compat, is_package: false, is_stdlib: false, is_test: false }
File { path: exceptions.py, is_executable: false }
PythonModuleSource { name: exceptions, is_package: false, is_stdlib: false, is_test: false }
File { path: schemas/__init__.py, is_executable: false }
PythonModuleSource { name: schemas, is_package: true, is_stdlib: false, is_test: false }
File { path: schemas/draft3.json, is_executable: false }
File { path: schemas/draft4.json, is_executable: false }
File { path: schemas/draft6.json, is_executable: false }
File { path: schemas/draft7.json, is_executable: false }
File { path: tests/__init__.py, is_executable: false }
PythonModuleSource { name: tests, is_package: true, is_stdlib: false, is_test: false }
File { path: tests/__pycache__/__init__.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests, is_package: true, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/_helpers.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests._helpers, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/_suite.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests._suite, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/test_cli.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests.test_cli, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/test_exceptions.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests.test_exceptions, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/test_format.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests.test_format, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/test_jsonschema_test_suite.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests.test_jsonschema_test_suite, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/test_types.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests.test_types, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/__pycache__/test_validators.cpython-39.pyc, is_executable: false }
PythonModuleBytecode { name: tests.test_validators, is_package: false, is_stdlib: false, is_test: false, bytecode_level: 0 }
File { path: tests/_helpers.py, is_executable: false }
PythonModuleSource { name: tests._helpers, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/_suite.py, is_executable: false }
PythonModuleSource { name: tests._suite, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/test_cli.py, is_executable: false }
PythonModuleSource { name: tests.test_cli, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/test_exceptions.py, is_executable: false }
PythonModuleSource { name: tests.test_exceptions, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/test_format.py, is_executable: false }
PythonModuleSource { name: tests.test_format, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/test_jsonschema_test_suite.py, is_executable: false }
PythonModuleSource { name: tests.test_jsonschema_test_suite, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/test_types.py, is_executable: false }
PythonModuleSource { name: tests.test_types, is_package: false, is_stdlib: false, is_test: false }
File { path: tests/test_validators.py, is_executable: false }
PythonModuleSource { name: tests.test_validators, is_package: false, is_stdlib: false, is_test: false }
File { path: validators.py, is_executable: false }
PythonModuleSource { name: validators, is_package: false, is_stdlib: false, is_test: false }
PythonPackageResource { package: schemas, name: draft3.json, is_stdlib: false, is_test: false }
PythonPackageResource { package: schemas, name: draft4.json, is_stdlib: false, is_test: false }
PythonPackageResource { package: schemas, name: draft6.json, is_stdlib: false, is_test: false }
PythonPackageResource { package: schemas, name: draft7.json, is_stdlib: false, is_test: false }

So perhaps this is the root cause of the bug. If PyOxidizer only looks for data files in folders that are ALSO valid Python packages, it will fail to load files from data-only folders. If I'm correct, I'd recommend adjusting this logic so that such files are included in the search regardless. This is more in line with the way setuptools and distutils work (i.e., if you include support files as package_data, the package management tool generates a folder with no __init__ file in it).
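
For comparison, CPython's standard path-based loader does serve files from such data-only folders: pkgutil.get_data takes a '/'-separated path relative to the package directory, so a sub-folder without __init__.py is fine. A minimal sketch, assuming a plain (non-frozen) interpreter; the package name demo_pkg_436 is hypothetical and the layout mimics what setuptools produces for package_data:

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def read_package_data_from_plain_folder():
    """Build a setuptools-style layout and read a file from a data-only folder.

    Layout (hypothetical names):
        demo_pkg_436/__init__.py
        demo_pkg_436/data/resource.txt   # note: no data/__init__.py
    """
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg_436"
        (pkg / "data").mkdir(parents=True)
        (pkg / "__init__.py").touch()
        (pkg / "data" / "resource.txt").write_bytes(b"resource file")

        # Make the temporary directory importable via the standard PathFinder.
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # The resource path is relative to the package directory.
            return pkgutil.get_data("demo_pkg_436", "data/resource.txt")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg_436", None)


print(read_package_data_from_plain_folder())  # b'resource file'
```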

indygreg commented 3 years ago

Thanks for spending the time to document your findings!

I'll need to spend some time digesting what you reported. But my quick read is that there is likely an edge case or bug (or two or three) somewhere in resource handling. The pkgutil integration is relatively new and isn't as well tested: I wouldn't be surprised if there were some bugs there.

Ultimately we'll want to distill this failure down to a Python unit test. If you clone the repo, there are some Python unit tests for the module importer in the pyembed/ directory. https://pyoxidizer.readthedocs.io/en/stable/pyoxidizer_rust_cargo_source_checkouts.html#pyembed-crate documents how you can run the tests.

I'm not asking you to spend the time to devise that test. But if you want to keep helping with debugging, this would be a good direction to go in.

Thanks again for the report!

TheFriendlyCoder commented 3 years ago

Thanks for the quick follow-up. I took a look at the unit test folder you mentioned and found a test that I think is very similar to the use case I'm hitting here. Source can be seen here. I think with some small modifications this test should reproduce the problem. Below is a slightly modified (and untested) unit test that I think exposes the problem:

    # Method for the existing resource-scanning test case; assumes that test
    # module's imports and its self.td temporary-directory fixture.
    def test_package_resource_data_only_folder(self):
        init_py = self.td / "package" / "__init__.py"
        init_py.parent.mkdir()

        with init_py.open("wb"):
            pass

        # Create a sub-folder containing the resource data file to be loaded.
        # This sub-folder should NOT contain an __init__.py file because it is
        # NOT a sub-package; it is a data-only folder. This mirrors the
        # behavior of setuptools when deploying package_data folders.
        data_dir = self.td / "package" / "data"
        data_dir.mkdir()

        resource = data_dir / "resource.txt"
        with resource.open("wb") as fh:
            fh.write(b"resource file")

        resources = find_resources_in_path(self.td)
        self.assertEqual(len(resources), 2)

        r = resources[0]
        self.assertIsInstance(r, PythonModuleSource)
        self.assertEqual(r.module, "package")
        self.assertTrue(r.is_package)

        r = resources[1]
        self.assertIsInstance(r, PythonPackageResource)
        self.assertEqual(r.package, "package")
        # The resource name is relative to the package directory, so it
        # includes the data-only sub-folder.
        self.assertEqual(r.name, "data/resource.txt")
        self.assertEqual(r.data, b"resource file")

I don't have enough time just now to figure out how to get the test suite building and running locally. If I can find some time later I'll see what I can do. In the meantime, if someone on your team has a few minutes to paste this test into their local codebase and try it on my behalf, that'd be great. If it produces the output I expect, this should help you isolate the root cause of the problem I'm hitting.
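
The classification the test expects can be sketched independently of PyOxidizer: a data file belongs to the nearest ancestor directory that contains __init__.py, and any data-only folders in between become part of the resource name. This is a hypothetical illustration (classify_package_data is not a PyOxidizer function, and it ignores details such as namespace packages and distribution metadata), not how PyOxidizer's scanner is actually implemented:

```python
from pathlib import Path


def classify_package_data(root):
    """Map each non-Python file under root to (package, relative_name).

    Hypothetical sketch: walk upward from the file until a directory
    containing __init__.py is found; folders without __init__.py along
    the way are treated as data-only and kept in the resource name.
    """
    root = Path(root)
    results = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix in (".py", ".pyc"):
            continue
        pkg_dir = path.parent
        while pkg_dir != root and not (pkg_dir / "__init__.py").exists():
            pkg_dir = pkg_dir.parent
        if pkg_dir == root:
            continue  # Not inside any regular package.
        package = ".".join(pkg_dir.relative_to(root).parts)
        relative_name = path.relative_to(pkg_dir).as_posix()
        results.append((package, relative_name))
    return results
```

For the layout in the test above, this yields [("package", "data/resource.txt")], which is the pair pkgutil.get_data("package", "data/resource.txt") would look up.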

dae commented 2 years ago

Edit: this is broken, see the next post.

Not a solution to the original problem, but just on the topic of packaging jsonschema:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "jsonschema", line 29, in <module>
  File "jsonschema.validators", line 349, in <module>
  File "jsonschema._utils", line 55, in load_schema
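AttributeError: 'NoneType' object has no attribute 'decode'

This config got past that error on import:

def handle_resource(policy, resource):
    # Place only the top-level jsonschema module source on the filesystem;
    # every other resource stays in memory.
    if type(resource) == "PythonModuleSource":
        if resource.name == "jsonschema":
            resource.add_location = "filesystem-relative:lib"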
def make_exe():
    dist = default_python_distribution()
    policy = dist.make_python_packaging_policy()
    policy.register_resource_callback(handle_resource)
    policy.resources_location = "in-memory"
    policy.resources_location_fallback = "filesystem-relative:lib"
    python_config = dist.make_python_interpreter_config()
    python_config.run_command = "import jsonschema"
    exe = dist.to_python_executable(
        name = "pyapp",
        packaging_policy = policy,
        config = python_config,
    )
    for resource in exe.pip_install(["jsonschema"]):
        exe.add_python_resource(resource)
    return exe

def make_embedded_resources(exe):
    return exe.to_embedded_resources()

def make_install(exe):
    files = FileManifest()
    files.add_python_resource(".", exe)
    return files

register_target("exe", make_exe)
register_target("resources", make_embedded_resources, depends = ["exe"], default_build_script = True)
register_target("install", make_install, depends = ["exe"], default = True)

resolve_targets()
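
The AttributeError in the traceback comes from load_schema calling .decode() on None. pkgutil.get_data returns None rather than raising when the package cannot be found or its loader has no get_data method, which matches the later observation that get_data was returning None. A minimal sketch of a guarded loader that fails with a clearer message; load_json_resource is a hypothetical helper, not part of jsonschema or PyOxidizer:

```python
import json
import pkgutil


def load_json_resource(package, resource):
    """Hypothetical helper: load a JSON resource, failing loudly on None.

    pkgutil.get_data returns None (instead of raising) when the package
    cannot be located or its loader does not implement get_data.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise LookupError(
            f"loader for {package!r} could not provide {resource!r}"
        )
    return json.loads(data.decode("utf-8"))
```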

dae commented 2 years ago

OK, that config allowed jsonschema to be imported, but the pkgutil.get_data() call was returning None, and imports of jsonschema submodules were broken. I ended up forcing the whole package to be included as unclassified files instead:

def handle_resource(policy, resource):
    # Exclude jsonschema's classified module sources and package resources...
    if type(resource) == "PythonModuleSource":
        if resource.name.startswith("jsonschema"):
            resource.add_include = False
    elif type(resource) == "PythonPackageResource":
        if resource.package.startswith("jsonschema"):
            resource.add_include = False
    elif type(resource) == "File":
        # ...and install its raw files under lib/ next to the executable
        # instead, so the standard path-based importer can load them.
        if resource.path.startswith("jsonschema"):
            resource.add_include = True
            resource.add_location = "filesystem-relative:lib"

def make_exe():
    dist = default_python_distribution()
    policy = dist.make_python_packaging_policy()
    policy.register_resource_callback(handle_resource)
    policy.resources_location = "in-memory"
    policy.resources_location_fallback = "filesystem-relative:lib"

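    # Emit raw File resources alongside the classified ones so the callback
    # can route jsonschema's files; File resources are excluded by default.
    policy.allow_files = True
    policy.file_scanner_emit_files = True
    policy.include_file_resources = False

    python_config = dist.make_python_interpreter_config()
    # Put lib/ (relative to the executable) on sys.path for the files above.
    python_config.module_search_paths = ["$ORIGIN/lib"]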
    python_config.run_command = "import jsonschema; import pkgutil; print(pkgutil.get_data('jsonschema', 'schemas/draft3.json'))"
    exe = dist.to_python_executable(
        name = "pyapp",
        packaging_policy = policy,
        config = python_config,
    )
    for resource in exe.pip_install(["jsonschema"]):
        exe.add_python_resource(resource)
    return exe

def make_embedded_resources(exe):
    return exe.to_embedded_resources()

def make_install(exe):
    files = FileManifest()
    files.add_python_resource(".", exe)
    return files

register_target("exe", make_exe)
register_target("resources", make_embedded_resources, depends = ["exe"], default_build_script = True)
register_target("install", make_install, depends = ["exe"], default = True)

resolve_targets()
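
Since a workaround like this depends on which importer ends up serving a package, it can help to check that at runtime. A small diagnostic sketch using only importlib.util.find_spec; describe_loader is a hypothetical helper name. Inside a PyOxidizer binary the reported loader would presumably be OxidizedFinder for indexed modules and a path-based loader for files found via module_search_paths (an assumption, not verified here):

```python
import importlib.util


def describe_loader(name):
    """Report which loader would import `name` and whether it has get_data.

    Returns None if the module cannot be found at all.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    loader = spec.loader
    return {
        "loader": type(loader).__name__ if loader is not None else None,
        "origin": spec.origin,
        # pkgutil.get_data returns None when this is False.
        "supports_get_data": hasattr(loader, "get_data"),
    }


print(describe_loader("json"))
```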

wkschwartz commented 2 years ago

I know I'm a little late here, but I just noticed this. It's unclear to me why OxidizedFinder should be able to load data not in Python packages. In particular, pkgutil.get_data's documentation says

Get a resource from a package.... If the package cannot be located or loaded, or it uses a loader which does not support get_data, then None is returned. In particular, the loader for namespace packages does not support get_data.

If you want to load data from a filesystem directory that's not a package, I'm not sure why pkgutil.get_data is the appropriate API to use in the first place.
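
The namespace-package case the documentation mentions is easy to observe with the standard importer. A sketch, assuming a plain CPython interpreter; the package name demo_ns_436 is hypothetical. The directory has no __init__.py, so Python treats it as a namespace package, whose loader does not implement get_data:

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def get_data_from_namespace_package():
    """Show pkgutil.get_data returning None for a namespace package."""
    with tempfile.TemporaryDirectory() as tmp:
        ns = Path(tmp) / "demo_ns_436"  # no __init__.py: namespace package
        ns.mkdir()
        (ns / "resource.txt").write_bytes(b"resource file")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return pkgutil.get_data("demo_ns_436", "resource.txt")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_ns_436", None)


print(get_data_from_namespace_package())  # None
```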