mliezun / caddy-snake

Caddy plugin to serve Python apps
MIT License
87 stars 3 forks

Is it possible to include the Python interpreter in the caddy static binary? #7

Open nickchomey opened 6 months ago

nickchomey commented 6 months ago

Frankenphp allows for including php in the single caddy go binary https://frankenphp.dev/docs/static/

Is something like this possible with Python, so that an application could be deployed on a system without Python installed, and without using Docker?

mliezun commented 6 months ago

Hi @nickchomey!

It's possible to statically link Python with this Caddy plugin; I tested it by building with this Dockerfile:

FROM alpine:latest

WORKDIR /root

RUN apk update &&\
    apk add python3-dev go &&\
    go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest &&\
    CGO_ENABLED=1 CGO_LDFLAGS="-L/usr/lib/python3.11/config-3.11-x86_64-linux-musl -static-pie" \
    XCADDY_GO_BUILD_FLAGS="-buildmode=pie -tags 'cgo netgo osusergo static_build' -ldflags \"-linkmode=external\"" \
    /root/go/bin/xcaddy build --with github.com/mliezun/caddy-snake

The resulting binary would be found in /root/caddy.

But the problem is that if you copy that binary onto a different system, you'll get an error like this:

$ ./caddy-static
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = 'caddysnake'
  isolated = 0
  environment = 1
  user site = 1
  safe_path = 0
  import site = 1
  is in build tree = 0
  stdlib dir = '/usr/lib/python3.11'
  sys._base_executable = ''
  sys.base_prefix = '/usr'
  sys.base_exec_prefix = '/usr'
  sys.platlibdir = 'lib'
  sys.executable = ''
  sys.prefix = '/usr'
  sys.exec_prefix = '/usr'
  sys.path = [
    '/usr/lib/python311.zip',
    '/usr/lib/python3.11',
    '/usr/lib/python3.11/lib-dynload',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00007eff36099740 (most recent call first):
  <no Python frame>

This happens because, even though the full Python interpreter is inside the final binary, the Python stdlib is shipped separately as files on disk (here under /usr/lib/python3.11), and the binary still looks for it at the paths it was built with.

There's a possible solution where we could work around this issue by shipping the caddy+plugin binary together with a .zip file that contains the stdlib, but it's not trivial to implement.

May I ask, why are you interested in this solution? Maybe I can offer some other alternatives.

nickchomey commented 6 months ago

Thanks for giving it a try!

I simply like the idea of deploying an application as a single binary - as is the case with Go. Docker is great, but adds complexity that a lot of people would prefer not to have.

I have to figure that frankenphp added such functionality for similar reasons.

Hopefully some solution will present itself at some point!

mliezun commented 6 months ago

Thank you for filing the issue!

I also like the idea of being able to ship as a single binary. Plus, it's useful if you have access to a server to write files but not to install system packages.

I'd like to add this feature; I just don't know what the best way to do it is at the moment.

nickchomey commented 6 months ago

This is all far beyond my pay grade, but I see that frankenphp has various references to stdlib.h. Perhaps that is what you need? Edit: Probably not - you seem to be talking about the Python standard library, not a C library.

Also, searching around the web, I can see lots of stuff about static binaries of python-only applications. So, perhaps some inspiration can be found there?

nickchomey commented 6 months ago

For whatever it might be worth, here's what ChatGPT 3.5 says about it: https://chat.openai.com/share/fa77d1cf-93af-4f92-a389-d469ceac4d0d

It is similar to what I see in the Frankenphp docs for static builds that include the entire php app. https://frankenphp.dev/docs/embed/

nickchomey commented 6 months ago

I got this from Claude 3 Opus: The provided Dockerfile builds a Docker image that contains the Go binary with the bundled Python files. To obtain the standalone Go binary that you can execute directly, you can modify the Dockerfile to copy the binary from the Docker image to your host system.

Here's an updated Dockerfile that builds the Go binary and copies it to your current directory:

FROM alpine:latest AS build

WORKDIR /root

RUN apk update && \
    apk add python3-dev go git && \
    go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest

# Copy the Python standard library
RUN mkdir -p /python-stdlib && \
    cp -r /usr/lib/python3.11 /python-stdlib/

# Build the Go binary with the bundled Python files
RUN CGO_ENABLED=1 \
    CGO_LDFLAGS="-L/usr/lib/python3.11/config-3.11-x86_64-linux-musl -static-pie" \
    XCADDY_GO_BUILD_FLAGS="-buildmode=pie -tags 'cgo netgo osusergo static_build' -ldflags \"-linkmode=external -extldflags '-static' -s -w\"" \
    /root/go/bin/xcaddy build \
    --with github.com/mliezun/caddy-snake \
    --with github.com/abiosoft/caddy-exec/cmd/caddy-exec

# Use a separate stage to copy the binary
FROM alpine:latest

COPY --from=build /root/caddy /usr/local/bin/caddy

# Copy the binary to the host system
CMD ["cp", "/usr/local/bin/caddy", "/output/caddy"]

To build the Go binary using this Dockerfile, run the following command in your terminal:

docker build -t caddy-builder .

Then, create an output directory in your current directory (if it doesn't exist already):

mkdir -p output

Finally, run the Docker container to copy the binary to your host system:

docker run --rm -v $(pwd)/output:/output caddy-builder

After running these commands, you should have a standalone caddy binary in the output directory of your current directory. You can execute this binary directly on your host system without the need for Docker.

Please note that the binary will still require the bundled Python files to be present in the /python-stdlib/python3.11 directory relative to the binary's location. You can either copy the Python files alongside the binary or modify the binary to look for the Python files in a different location by updating the PYTHONHOME environment variable.
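One detail in the quoted answer is worth pinning down: PYTHONHOME names a prefix, and CPython then looks for the stdlib under <PYTHONHOME>/lib/python3.11, so a directory shipped next to the binary needs that lib/python3.11 structure inside it. A minimal Go sketch, assuming a hypothetical directory named python beside the executable, that sets the variable before the embedded interpreter starts:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// pythonHomeNextTo returns a PYTHONHOME prefix for a directory called
// dirName that sits beside the executable at exePath. CPython will then
// look for the stdlib in <prefix>/lib/pythonX.Y, not in <prefix> itself.
func pythonHomeNextTo(exePath, dirName string) string {
	return filepath.Join(filepath.Dir(exePath), dirName)
}

func main() {
	exe, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot locate executable:", err)
		os.Exit(1)
	}
	// "python" is a hypothetical name: caddy + python/lib/python3.11/...
	home := pythonHomeNextTo(exe, "python")
	// Must run before the interpreter is initialized. In cgo-enabled
	// builds, os.Setenv also updates the C-level environment CPython reads.
	if err := os.Setenv("PYTHONHOME", home); err != nil {
		fmt.Fprintln(os.Stderr, "setenv:", err)
		os.Exit(1)
	}
	fmt.Println("PYTHONHOME =", os.Getenv("PYTHONHOME"))
}
```

The path configuration dump earlier in the thread shows isolated = 0 and environment = 1, so that build does read environment variables; a config created in isolated mode would ignore PYTHONHOME.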

nickchomey commented 6 months ago

And Github Copilot

Creating a standalone Go binary that includes the Python interpreter and the Python standard library is a complex task and not a typical use case. However, you can use a Dockerfile to build such a binary. Here's how you can modify your Dockerfile:

# Start from a base image with Go installed (Alpine 3.19 ships Python 3.11)
FROM golang:1.22-alpine3.19 AS builder

# Install the Python headers/library and a C toolchain for cgo
RUN apk update && apk add python3-dev gcc musl-dev

# Install xcaddy
RUN go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest

# Set the Current Working Directory inside the container
WORKDIR /app

# Copy go mod and sum files
COPY go.mod go.sum ./

# Download all dependencies. Dependencies will be cached if the go.mod and go.sum files are not changed
RUN go mod download

# Copy the source from the current directory to the Working Directory inside the container
COPY . .

# Build the Go app (golang images install `go install` binaries into /go/bin, which is on PATH)
RUN CGO_ENABLED=1 CGO_LDFLAGS="-L/usr/lib/python3.11/config-3.11-x86_64-linux-musl -static-pie" \
    XCADDY_GO_BUILD_FLAGS="-buildmode=pie -tags 'cgo netgo osusergo static_build' -ldflags '-linkmode=external'" \
    xcaddy build --with github.com/mliezun/caddy-snake

# Start a new stage from scratch
FROM alpine:latest  

RUN apk --no-cache add ca-certificates

WORKDIR /root/

# Copy the Python standard library from the builder stage
COPY --from=builder /usr/lib/python3.11 /usr/lib/python3.11

# Copy the Pre-built binary file from the previous stage
COPY --from=builder /app/caddy .

After building the Docker image with this Dockerfile, you can extract the Go binary and the Python standard library from the Docker container:

# Build the Docker image
docker build -t myapp .

# Create a Docker container from the image
docker create --name myapp-container myapp

# Copy the Go binary from the Docker container to the host
docker cp myapp-container:/root/caddy .

# Copy the Python standard library from the Docker container to the host
docker cp myapp-container:/usr/lib/python3.11 .

This will give you a Go binary (caddy) and the Python standard library (python3.11) on your host machine. You can then distribute these files together, and set the PYTHONPATH environment variable to point to the python3.11 directory (or PYTHONHOME to a prefix directory that contains lib/python3.11) when running the Go binary.

Please note that this is a complex task that involves a deep understanding of both Go and Python. It's also not a typical use case for a static binary, and it may not be the best solution for your needs. If possible, I would recommend considering other options, such as packaging your application as a Docker container, which can include both the Go binary and the Python standard library.

nickchomey commented 6 months ago

They all seem similar enough, so it seems like this should definitely be possible. I'm happy to iterate on any of these if you find that one makes more sense/works better than the others! You could probably also ask the Frankenphp people for some guidance - they've already solved this problem so surely would have some insights (and they're very friendly)

Importantly, Frankenphp seems to rely on Go Embed, which was released in Go 1.16. The other examples don't use this. There are details in the Frankenphp Embed doc, as well as toward the end of the slides from this presentation.

Other documentation on Go Embed:

And, looking closer at the Frankenphp docs, the Dockerfile seems to ultimately just run this build script, which can be run without Docker:

https://github.com/dunglas/frankenphp/blob/e7e0dbfa3dcea98f2d19fd9c275324094a2610e9/build-static.sh

mliezun commented 6 months ago

I tested something equivalent to what those AIs are suggesting: copying the contents of the Python stdlib (/usr/lib/python3.11) into a .tar.gz file and then decompressing it on the target system.

It works half-way: you can actually execute some Python code, but if you load any module that is provided as a .so (all files under /usr/lib/python3.11/lib-dynload) instead of a .py, you get an error. For example, the md5 function becomes unusable, and you get an error like:

ERROR:root:code for hash md5 was not found.
Traceback (most recent call last):
  File "/usr/lib/python3.11/hashlib.py", line 307, in <module>
    globals()[__func_name] = __get_hash(__func_name)
                             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/hashlib.py", line 123, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type md5
Traceback (most recent call last):
  File "/root/simple_app.py", line 2, in <module>
    from hashlib import md5
ImportError: cannot import name 'md5' from 'hashlib' (/usr/lib/python3.11/hashlib.py)

I have two ideas on how to circumvent the problem:

mliezun commented 6 months ago

There's an archived project that implements the stdlib in pure-python: https://github.com/beeware/ouroboros

Micropython seems to implement most of the stdlib: https://github.com/micropython/micropython-lib/tree/master/python-stdlib

Some other initiative like that would also be useful.

mliezun commented 6 months ago

It might be worth looking into: https://github.com/indygreg/python-build-standalone

nickchomey commented 6 months ago

Given that parts of the standard library are implemented in C rather than Python, it's probably not a good idea to use the projects that reimplement it in pure Python... Surely that would make an already-slow language even slower.

So, it would be ideal to find a way to embed the normal stdlib and other .so extensions. The other options you mentioned seem promising for that - particularly the last one ( https://github.com/indygreg/Python-build-standalone).

In fact, looking closer, Frankenphp takes the same approach in the build-static.sh script I linked to previously. It leverages https://github.com/crazywhalecc/static-php-cli to build a standalone static PHP binary with all its extensions (which would normally be loaded as .so files).

Here's another good article about how this approach is being used, not just by frankenphp but various other php frameworks and servers. https://www.bosunegberinde.com/articles/building-php-binary

Also, again, it seems that using Go Embed (used by frankenphp) rather than the Docker approaches suggested by the LLMs would be a better approach as it would allow for building all of this without the overhead and complexity of Docker.

nickchomey commented 6 months ago

Maybe we're overthinking it.

It seems like PyInstaller already does what we're looking for, is very mature (v1 was in 2005!), well maintained, and has lots of tooling. So, why reinvent the wheel?

Here's how it works. https://pyinstaller.org/en/stable/operating-mode.html#analysis-finding-the-files-your-program-needs

Apparently it analyzes the entire Python application, finds its dependencies, and copies them all into a single directory. To distribute it, you just zip the directory and send it to someone to unzip and execute.

That seems less ideal than just a single static executable binary, but might be perfect for our purposes where we don't really want/need a Python binary within a Go binary - instead, we could just include the unzipped directory in the Go binary using Go Embed.

That's what frankenphp essentially does: it uses static-php-cli to build the static dependencies, then copies the application code alongside them and includes it all in the Go binary.

It seems almost too simple...

(edit: I now see that it also has a single executable file mode as well, but they say it's both slower and harder to debug. It probably doesn't matter much to Go Embed though - perhaps we could have a flag that allows for building it with either method)

nickchomey commented 6 months ago

It turns out there are lots of alternatives to PyInstaller. Here's a good comparison written up by the dev of one of the more promising ones, PyOxidizer (the same dev as python-build-standalone, which you linked to in your most recent message; PyOxidizer uses it under the hood): https://pyoxidizer.readthedocs.io/en/stable/pyoxidizer_comparisons.html and another, more detailed technical article by him: https://gregoryszorc.com/blog/2018/12/18/distributing-standalone-python-applications/

It seems like PyInstaller, PyOxidizer and Nuitka are the best options.

Nuitka might be the most performant, as it compiles everything to machine code. But it is slow to compile - very antithetical to Go development. He also made a good point that maybe we shouldn't trust nuitka's conversion and compilation... PyInstaller with the single directory method might be ideal for development and debugging.

Perhaps both could be supported through flags - a dev/debug mode that uses PyInstaller and then nuitka or PyOxidizer for production deployment?

Or perhaps better to just keep it simple, do dev/debugging on the Python app with standard tooling, and just use PyOxidizer with caddy snake?

nickchomey commented 6 months ago

Never mind: PyOxidizer was put on indefinite hold a couple of weeks ago, though hopefully a new maintainer will appear. But python-build-standalone will continue to be maintained! https://gregoryszorc.com/blog/2024/03/17/my-shifting-open-source-priorities/

On a brighter note, though, this news led to the discovery of an even more promising option: https://github.com/astral-sh/rye

It is built on python-build-standalone and was started a year ago by a solo dev; here's some backstory: https://lucumr.pocoo.org/2024/2/4/rye-a-vision/

Then Astral, a well-backed Python tooling company, took it over less than 2 months ago, concurrent with the release of a seemingly excellent package manager: https://astral.sh/blog/uv

It seems like it would be prudent for caddy snake to embrace the high quality and extreme momentum of this new tooling!

Having said all of that... It appears that rye is somewhat of a "framework", in the sense that you need to use it (which now uses uv under the hood) to set up your dependencies etc... That's probably not universally appealing at this point - existing projects are stuck with poetry, pdm, pip, pyenv, and so many other tools.

So, maybe we're back to relying on python-build-standalone to supply any application with a static Python binary? Its dev reaffirmed that it will be maintained going forward, on the basis that it has become too important to abandon, with rye and others building upon it.

If caddy snake uses that, then perhaps someday it'll be easy to move to and/or support direct Rye (or other tool) usage...? After all, it seems like all we really need is to Go Embed a self-contained directory or binary, however it is built...

Still, if you're interested in getting Frankenphp-like attention, I think there's a lot of possibilities for that if you make it compatible with these leading tools.

An extra bonus for you is that much of this tooling (rye, uv) is written in Rust, which your blog says you've been keen to learn!

mliezun commented 6 months ago

That all seems cool!

I tested with PyInstaller: you can generate a folder with the needed files with the command pyinstaller -d noarchive main.py, where main.py is the Python file you will use as your entrypoint for caddy.

Then I changed the init code here https://github.com/mliezun/caddy-snake/blob/main/caddysnake.c#L382, adding the following lines:

  config.module_search_paths_set = 1;
  config.site_import = 0;
  wchar_t *paths[] = {
      L"/path/to/folder/base_library.zip",
      L"/path/to/folder/lib-dynload",
      L"/path/to/folder/"};
  int num_paths = sizeof(paths) / sizeof(paths[0]);
  // Set the module search paths
  status = PyConfig_SetWideStringList(&config, &config.module_search_paths,
                                      num_paths, paths);
  if (PyStatus_Exception(status)) {
    goto exception;
  }

That changes sys.path so that modules are loaded from the folder generated by PyInstaller. Luckily, it worked fine this time and I was able to run caddy with the plugin.

Going one step further would be embedding that folder with Go's embed package (embed.FS), extracting it into a temporary folder at startup, and then passing the temp folder's path to the C code so it loads the correct libraries.

I'm not sure if I want to integrate this into the plugin in its current state. I think it would make more sense when it's more mature and the interface for creating static binaries is better defined.

nickchomey commented 6 months ago

Cool stuff! I look forward to trying it out!

Just for the sake of thoroughness, here's a couple more tools that I found that are worth consideration.

https://github.com/pypa/hatch - similar to rye, and built on top of python-build-standalone

https://github.com/ofek/pyapp - by the same dev, creates a binary of a Python project.

mliezun commented 6 months ago

Cool!

I think I would prefer the plugin to work similarly to python-build-standalone, where the libraries are linked together into the final binary.

But if you want to produce a single binary now, the above instructions should work. Also, make sure your project imports the queue and threading packages; otherwise PyInstaller will not bundle them, and caddy-snake needs them internally.

nickchomey commented 6 months ago

I like the idea of Rye (using python-build-standalone) and uv quite a lot, so it would be great to have Caddy be able to load a rye application directly. Anyway, I'll probably try the various approaches when I get back to focusing on the Python aspect of my project. Thanks for being open to exploring all of this and laying the groundwork!

mliezun commented 6 months ago

Of course! Thank you for your interest and collaboration sorting this out 😉

nickchomey commented 6 months ago

For whatever it is worth, here's an issue I just stumbled upon where lots of people have been discussing how to implement Rye with Docker. There might be something useful for here - even though a Docker-less Go Embed version is what we're after.

https://github.com/astral-sh/rye/discussions/239#discussioncomment-8966441

mliezun commented 6 months ago

I think Rye works at a different stage, we're looking for something that will let us produce a single binary, and Rye is trying to bundle a bunch of different tools for project management in one place. It's still an awesome tool and we should definitely support projects that use it. But it's not related to static binaries as far as I can tell.

nickchomey commented 6 months ago

It's entirely possible that I'm still not appreciating what needs to happen, but it's not clear to me that we need a single binary of the Python app. I believe the end goal is really just to have a single Go binary with Caddy + Caddy Snake + the Python application (and also, in my case, Frankenphp + some NATS stuff etc...).

Using Go Embed, we can include whatever files/directories we want in the final Go binary. So, as long as those files/directories contain a self-contained Python application, it should work.

You already showed that it worked with a PyInstaller directory, rather than just an executable. Rye (and many of the other tools I linked to) would presumably result in the same thing, as it uses Python Build Standalone to have a static python stdlib etc... rather than whatever PyInstaller includes in its directory.

Though if you want just a single binary, then PyApp (from the dev of Hatch) seems like a good option.

But, now that I think about it some more, it seems to me that Caddy Snake doesn't really need to have an opinion on which build/publish tool is used... Rather, it shouldn't be all that difficult to support any build tool by just providing a parameter for the path to the binary/self-contained python app directory. Perhaps a flag might be needed to distinguish between executing a binary in the path or executing a main.py file in the path, which would then use all the supporting files in the directory.

Does that make sense? Am I missing something fundamental?

mliezun commented 6 months ago

We're on the same page about being able to produce a single binary with caddy and any plugins you want inside.

But AFAIK Rye is just copying the python standalone binary and using some tools around it. It doesn't compile or do anything else.

What we need to do is to set up a build process similar to what python-build-standalone does and integrate it with Go and Caddy.

nickchomey commented 6 months ago

My impression is that Rye doesn't need any build step, just as there's no build step for a normal Python app. The difference is that normal Python runs off of the Python interpreter/stdlib installed system wide, and rye includes it within the project directory via Python Standalone Build.

In that sense, it is just like PyInstaller - you have a self contained, self executable directory for a Python app. Likewise for the various other similar tools I mentioned above.

As such, can't either of them simply be Go Embedded in Caddy Snake?

If that's the case, perhaps caddy snake can then just remain altogether agnostic to which tool (PyInstaller, rye, pyapp, etc...) is used to make the standalone/executable Python app. Some produce a self contained directory, others create a single executable binary. Whatever the case, caddy snake's wsgi (and hopefully asgi someday!) server will simply point to the specified binary or main.py.

(Though, it seems to me that embedding a directory with a main.py would be preferable to a Python binary as it could probably be more easily debugged etc... But it should be easy to support both via a flag)

Is there anything about that that isn't clear or seems fundamentally incorrect? Maybe I still just misunderstand how Caddy Snake works?!

mliezun commented 6 months ago

> As such, can't either of them simply be Go Embedded in Caddy Snake?

With Go embed you can embed any file you want inside the final binary. Whether that's something we want or not in this case is up for discussion.

> Is there anything about that that isn't clear or seems fundamentally incorrect? Maybe I still just misunderstand how Caddy Snake works?!

We're not bundling the Python executable/binary inside Caddy. What we're doing is linking the resulting binary so that it loads libpython.so at runtime and can call Python code/functions directly from Caddy, without needing to start a new process. Everything happens within the same OS-level process.

My take is that I don't know at this point what the "correct" way of generating single binaries should be. And I think there's still a lot of work to do in this plugin before we get to that point.

nickchomey commented 6 months ago

Ok, I'll happily defer to your expertise.

But I'll just throw another interesting option into the works...

Cloudflare just enabled running Python in their (typically JavaScript) workers.

https://blog.cloudflare.com/python-workers

It uses Pyodide, a WebAssembly port of the Python interpreter, stdlib, etc. It seems to allow running ASGI apps (e.g. FastAPI) directly in their workers.

Might be worth exploring for some inspiration?

mliezun commented 6 months ago

That's awesome! Definitely worth a look.

We could add a worker mode in which we use pyodide to run apps. The user could then select to run in "normal" or "worker" mode using a config in Caddyfile.
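The mode switch mainly needs a config value that defaults to standard Python. A minimal Go sketch of parsing such a value, assuming a hypothetical `mode` subdirective; neither the directive nor a worker mode exists in caddy-snake as of this thread, and RunMode and parseRunMode are invented names.

```go
package main

import (
	"fmt"
	"strings"
)

// RunMode selects how a Python app would be executed.
type RunMode int

const (
	ModeNormal RunMode = iota // standard CPython linked into Caddy (the default)
	ModeWorker                // hypothetical Pyodide/WebAssembly-based mode
)

func (m RunMode) String() string {
	if m == ModeWorker {
		return "worker"
	}
	return "normal"
}

// parseRunMode turns the argument of a hypothetical Caddyfile subdirective
// such as `mode worker` into a RunMode. An empty value keeps the default,
// so standard Python stays the out-of-the-box behavior.
func parseRunMode(s string) (RunMode, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "", "normal":
		return ModeNormal, nil
	case "worker":
		return ModeWorker, nil
	default:
		return ModeNormal, fmt.Errorf("unknown mode %q: expected \"normal\" or \"worker\"", s)
	}
}

func main() {
	for _, in := range []string{"", "worker", "Normal", "wasm"} {
		m, err := parseRunMode(in)
		fmt.Printf("%q -> %v (err: %v)\n", in, m, err)
	}
}
```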

I think we still should let everyone use standard Python for maximum compatibility.

nickchomey commented 6 months ago

What about this Go package, which uses Python Build Standalone, go:embed, etc...?

https://github.com/kluctl/go-embed-python

If not appropriate here, I may pursue it myself because, as it turns out, I mainly just want to package the python app in a Go binary as I don't think that I need any http support at all - I'm just using Python for a backend service and I'm already using NATS to do pubsub stuff. Therefore, I can just trigger "requests" through either the nats.go client or nats.py client.

mliezun commented 6 months ago

That just downloads the full Python binary. It doesn't apply to our case.

nickchomey commented 6 months ago

Sorry. I sense that I'm more of a distraction than a help at this point. I hope I was at least somewhat helpful through all of this though. I'll watch the repo for future progress!

mliezun commented 6 months ago

No worries! It's good to hear some ideas 👌

codablock commented 6 months ago

Hello :) I'm the maintainer of the mentioned go-embed-python project and I'm searching for mentions of it from time to time and found this issue. Just wanted to clarify that go-embed-python is downloading the Python binary at the time of package release and actually committing it to the release tag. This means, the binaries + Python libraries are fully embedded if you add go-embed-python as a dependency.

nickchomey commented 6 months ago

@codablock thanks for the clarification! If you have a chance to review this issue and the project readme, do you think your package would be a good option to achieve the goal here of creating a single go binary that contains the Caddy web server which can internally route to a fully embedded Python application (such that you wouldn't need a separate server like gunicorn, uvicorn, etc...)?