python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License

Build fails when project exists in a Docker-created volume mount directory #2895

Closed EvanShenkman-Sonos closed 3 years ago

EvanShenkman-Sonos commented 4 years ago
My pyproject.toml file
```toml
[tool.poetry]
name = "my_package"
version = "0.1.0"
description = "My protobuf generated package"
authors = ["First Last "]
license = "Proprietary"
packages = [
    { include = "my_namespace" }
]

[tool.poetry.dependencies]
python = "^3.8"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
```
My protoc generated directory structure
```bash
➜ tree output/
output/
├── pyproject.toml
└── my_namespace
    └── my_package
        └── v1
            ├── bar_pb2.py
            ├── baz_pb2.py
            └── foo_pb2.py
```

Issue

⚠️ Disclaimer: this may very well be an issue with docker, not poetry ⚠️

I'm trying to use poetry to build and publish a package that consists entirely of source code generated by protocol buffer definitions. I use a docker image that contains buf and protoc to generate the python source. The output directory is created by the host. The namespace directory, package directory, and source files are created by the container. The entire project directory, which includes the output directory, is volume mounted to the container.

➜ mkdir output
➜ docker run --rm \
    --entrypoint buf \
    -v (pwd):/apis \
    -w /apis \
    my-protoc-image \
    protoc --python_out="output" $(find . -name '*.proto')

After the docker run command, I have the source files described above. I then copy the pyproject.toml file from above into the output directory. I am then unable to successfully run poetry build inside of the output directory.

➜ cd output/
➜ poetry build
Skipping virtualenv creation, as specified in config file.
Building my_package (0.1.0)
 - Building sdist
 - Built my_package-0.1.0.tar.gz

[ValueError]
/tmp/tmphmvi0_jg/my_package-0.1.0/my_namespace does not contain any element

I have made the following observations on both MacOS and Linux Mint...

  1. The sdist archive does not contain the my_namespace directory or subdirectories.

    ➜ cd dist
    ➜ tar -xvf my_package-0.1.0.tar.gz
    my_package-0.1.0/pyproject.toml
    my_package-0.1.0/setup.py
    my_package-0.1.0/PKG-INFO
  2. The wheel build fails entirely with an interesting message that points towards a potential docker-poetry interoperability issue...

    • MacOS

      [ValueError]
      /var/folders/0r/3fcft8hd7kx12d_ydz88rs0r30dsjv/T/tmpl2io5rd5/my_package-0.1.0/my_namespace does not contain any element
    • Linux Mint

      [ValueError]
      /tmp/tmplxr0fas4/my_package-0.1.0/my_namespace does not contain any element
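The sdist check from observation 1 can also be scripted instead of being done by eye with tar. The following is a minimal sketch, not part of poetry: the function name `sdist_members_under` is hypothetical, and it assumes the usual sdist layout in which every member sits under a single top-level `<name>-<version>/` directory.

```python
import tarfile


def sdist_members_under(archive_path: str, package_dir: str) -> list[str]:
    """Return archive members that live under `package_dir` inside the sdist.

    Hypothetical helper; assumes every member is prefixed with one
    top-level "<name>-<version>/" directory.
    """
    found = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for name in archive.getnames():
            # Drop the leading "<name>-<version>/" path component.
            _, _, relative = name.partition("/")
            if relative == package_dir or relative.startswith(package_dir + "/"):
                found.append(relative)
    return sorted(found)
```

For the almost-empty archive listed above, `sdist_members_under("my_package-0.1.0.tar.gz", "my_namespace")` would return an empty list, since the archive only holds pyproject.toml, setup.py, and PKG-INFO.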

If I recursively copy the entire output directory the problem goes away.

➜ cp -r output/ output_2
➜ cd output_2/
➜ poetry build
Skipping virtualenv creation, as specified in config file.
Building my_package (0.1.0)
 - Building sdist
 - Built my_package-0.1.0.tar.gz

 - Building wheel
 - Built my_package-0.1.0-py3-none-any.whl
➜ cd dist/
➜ tar -xvf my_package-0.1.0.tar.gz 
my_package-0.1.0/pyproject.toml
my_package-0.1.0/my_namespace/my_package/__init__.py
my_package-0.1.0/my_namespace/my_package/v1/__init__.py
my_package-0.1.0/my_namespace/my_package/v1/bar_pb2.py
my_package-0.1.0/my_namespace/my_package/v1/baz_pb2.py
my_package-0.1.0/my_namespace/my_package/v1/foo_pb2.py
my_package-0.1.0/setup.py
my_package-0.1.0/PKG-INFO

I am no Docker expert, but it seems like poetry somehow resolves the directories in question to the host's temporary volume mount paths that should only be visible to the Docker engine. I was surprised to find that the problem exists in both MacOS and Linux.

How does poetry go about resolving paths during builds? How is it that poetry decided to look in /var/folders or /tmp instead of the working directory while building?



Thanks in advance and for the work y'all have done to make poetry such a fantastic tool 🍻

abn commented 4 years ago

@EvanShenkman-Sonos appreciate the kind words and also the detailed issue descriptions.

The first thing that comes to mind is that this could be a couple of issues.

  1. File permissions: you might want to use something like docker run --user "$(id -u):$(id -g)" ... when running this, so that the files created end up being accessible by your user.
  2. Docker storage backend: This should not really be a problem typically, however if your storage backend is non-standard, you might end up confusing pathlib's Path.resolve().

The temp path usage is expected: when doing a complete build (sdist + wheel), poetry creates these in isolation, and this happens in a temp directory. You can try poetry build -f wheel. Another thing to check is that your include and packages configurations are correct. It could be that, when the isolated build happens, these generated files are not copied over prior to the build step.

Related: You could also check this project for an example of a poetry-managed project that generates python code from protobuf - https://github.com/python-gnxi/python-gnmi-proto

EvanShenkman-Sonos commented 4 years ago

@abn - thanks for the quick reply!

  1. Yup, after figuring out it wasn't an OS-issue, permissions was the next thing I tried. Even after recursively changing ownership and group-ownership, poetry build behaves the same way. Inside of the output directory, I am able to step-into a python interpreter and import and use the generated python source with or without changing the permissions. I don't think permissions are the problem.

  2. I don't think I've changed Docker's storage backend. Below is the output of docker info. The Storage Driver field looks like the default.

Output of docker info
```bash
> docker info
Client:
 Debug Mode: false

Server:
 Containers: 104
  Running: 0
  Paused: 0
  Stopped: 104
 Images: 319
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-54-generic
 Operating System: Linux Mint 19.2
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.31GiB
 Name: yavin
 ID: 4NPL:KC5T:LJQK:7ZDU:4TK5:FPC7:XVMK:N5N6:V33E:FM57:Q6K2:REAZ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
```

Running poetry build -f wheel completes successfully; however, the resulting wheel suffers from the same problem as the sdist. If I try to install the wheel, I'm unable to import the namespace or the package, and I don't see the directories in site-packages.
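A wheel is a zip archive, so its contents can be checked without installing it. The sketch below is only an illustration: the helper name `wheel_top_level_entries` is hypothetical and the function is not part of poetry or pip.

```python
import zipfile


def wheel_top_level_entries(wheel_path: str) -> list[str]:
    """Return the sorted top-level names stored in a wheel (a zip archive).

    Hypothetical helper for a quick sanity check of a built wheel.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        # Keep only the first path component of every archive member.
        return sorted({name.split("/", 1)[0] for name in wheel.namelist()})
```

If the package directory (here my_namespace) is missing from the returned list, the wheel cannot provide the import after installation, which would match the symptom described above.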

Since the copy of the output directory works, I don't think my include or package configurations are incorrect. With the copy I'm not changing the directory structure or any of the file contents and poetry build works as expected.

Tomorrow I will try to follow the same procedure I described initially with the only difference being that I'll run protoc on the host, not within a container. I'll report back- thanks again!

EvanShenkman-Sonos commented 4 years ago

@abn -- The more I dig into this, the more I'm convinced it has nothing to do with Docker and it's most likely a user-error on my part with how I'm configuring poetry.

I ran the same protoc commands on my host, rather than within a container, creating the same output directory, its subdirectories, and python source. I'm running into the same issues: poetry build -f sdist and poetry build -f wheel both result in empty packages. So it's definitely not a permissions issue, nor is it a Docker storage backend issue.

After some more hacking, I was able to get poetry build working, but I'm unsure about why what I've done works. Maybe you can shed some light.


Here is my updated project structure and pyproject.toml file...

Project structure
```bash
➜ tree .
.
├── README.md
├── pyproject.toml
└── my_namespace
    └── my_package
        └── v1
            ├── bar.proto
            ├── baz.proto
            └── foo.proto
```
pyproject.toml
```toml
[tool.poetry]
name = "my_package"
version = "0.1.0"
description = "Protobuf generated APIs"
authors = ["First Last "]
license = "Proprietary"
packages = [
    { include = "my_namespace", from = "output" }
]

[tool.poetry.dependencies]
python = "^3.8"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
```

After looking through python-gnmi-proto repository, I decided to install the betterproto library and use it to generate the python source.

➜ mkdir output
➜ protoc --proto_path=. --python_betterproto_out=output (find . -name '*.proto')
Writing __init__.py
Writing my_namespace/__init__.py
Writing my_namespace/my_package/__init__.py
Writing my_namespace/my_package/v1.py
➜ tree output
output
├── __init__.py
└── my_namespace
    ├── __init__.py
    └── my_package
        ├── __init__.py
        └── v1.py

In this case, running poetry build still doesn't produce the packages correctly...

➜ poetry build -f sdist
Skipping virtualenv creation, as specified in config file.
Building my_package (0.1.0)
 - Building sdist
 - Built my_package-0.1.0.tar.gz
➜ cd dist
➜ tar -xvf my_package-0.1.0.tar.gz
x my_package-0.1.0/pyproject.toml
x my_package-0.1.0/setup.py
x my_package-0.1.0/PKG-INFO

However, if I modify my packages configuration (dropping the from part) to...

packages = [
    { include = "my_namespace" }
]

And generate the python source to be side-by-side with the proto tree...

➜ protoc --proto_path=. --python_betterproto_out=. (find . -name '*.proto')
Writing __init__.py
Writing my_namespace/__init__.py
Writing my_namespace/my_package/__init__.py
Writing my_namespace/my_package/v1.py
➜ tree my_namespace
my_namespace
├── __init__.py
└── my_package
    ├── __init__.py
    ├── v1
    │   ├── bar.proto
    │   ├── baz.proto
    │   └── foo.proto
    └── v1.py

Poetry can successfully build the package...

➜ poetry build -f sdist
Skipping virtualenv creation, as specified in config file.
Building my_package (0.1.0)
 - Building sdist
 - Built my_package-0.1.0.tar.gz
➜ cd dist
➜ tar -xvf my_package-0.1.0.tar.gz
x my_package-0.1.0/pyproject.toml
x my_package-0.1.0/my_namespace/__init__.py
x my_package-0.1.0/my_namespace/my_package/__init__.py
x my_package-0.1.0/my_namespace/my_package/v1/bar.proto
x my_package-0.1.0/my_namespace/my_package/v1/baz.proto
x my_package-0.1.0/my_namespace/my_package/v1/foo.proto
x my_package-0.1.0/my_namespace/my_package/v1.py
x my_package-0.1.0/setup.py
x my_package-0.1.0/PKG-INFO

This isn't ideal since the distribution contains the proto files.

I'm either not structuring my directories in a way that plays nicely with poetry, or I don't fully understand the nuances of the packages configuration option. Any insight would be awesome! Thanks!

EvanShenkman-Sonos commented 4 years ago

I ended up using include and exclude glob patterns to remove the proto files from the distribution archive. I think this is an acceptable solution.

I would still be interested in learning about the root-cause of the package configuration issue I described above.

brettdh commented 3 years ago

@EvanShenkman-Sonos Did you ever figure out the reasons this happens? For my part, I noticed a couple weird things:

  1. I'm trying to build my package as an sdist as part of a larger CMake project, so I'm copying the python project into a CMake build directory. When I run poetry build -f sdist -n in the build dir, I get the almost-empty sdist you described above. However, when I run it in the source dir, I get the correct sdist with all my source files included.
  2. In the build dir, if I extract the almost-empty sdist tarball, copy the setup.py into the right spot, and run python setup.py sdist in my virtualenv, all my source files get included.

brettdh commented 3 years ago

D'oh - turns out the problem in my case was due to the aforementioned cmake interaction. The poetry build command is running in a tree that's .gitignored, and poetry excludes any VCS-ignored files from inclusion in any build. A simple workaround of building in the source tree and moving it out to the build tree should suffice.

EvanShenkman-Sonos commented 3 years ago

@brettdh - I got burned by the same exact thing: poetry was ignoring my source files because of my .gitignore. Moving stuff around should work. Like I mentioned in my last comment, I was able to override that unwanted behavior by explicitly using an include glob in my pyproject.toml file. Hopefully this helps others if they run into a similar issue!


Link to the relevant section of the poetry docs: https://python-poetry.org/docs/pyproject/#include-and-exclude

Root-cause:

If a VCS is being used for a package, the exclude field will be seeded with the VCS’ ignore settings (.gitignore for git for example).
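To see whether a given source file falls under that seeded exclude list, one option is to ask git directly. This is a minimal sketch under stated assumptions: it shells out to `git check-ignore`, it assumes git is installed and that `repo_dir` is inside a git work tree, and the function name `is_vcs_ignored` is hypothetical. It is not necessarily how poetry itself computes the list.

```python
import subprocess


def is_vcs_ignored(repo_dir: str, relative_path: str) -> bool:
    """Return True if git would ignore `relative_path` inside `repo_dir`.

    Hypothetical helper. Files that are already tracked are not reported
    as ignored by `git check-ignore` unless --no-index is passed.
    """
    result = subprocess.run(
        ["git", "check-ignore", "--quiet", relative_path],
        cwd=repo_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Exit status: 0 = ignored, 1 = not ignored, anything else = error.
    if result.returncode not in (0, 1):
        raise RuntimeError(
            f"git check-ignore failed with status {result.returncode}"
        )
    return result.returncode == 0
```

Running this against the generated sources (for example a file under an ignored output or build directory) should report True in the situation described in this thread, which is the case where an explicit include pattern or a build from a non-ignored tree is needed.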


I'm going to close this issue.

github-actions[bot] commented 6 months ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.