pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Cross-architecture support #12898

Open lorenzwalthert opened 9 months ago

lorenzwalthert commented 9 months ago

Description

Docker images built for linux/amd64 can now be run on Apple Silicon, thanks to a Rosetta integration that was recently promoted to General Availability. I think this works out of the box for pure Python packages. With Polars, however, building for the target architecture linux/amd64 and running on an Apple Silicon host (linux/arm64/v8) seems to fail.

My Dockerfile:

FROM python:3.11.4

RUN pip install polars

My script to build the container and import polars:

$ docker build --platform=linux/amd64 -t polars-amd64 .
$ docker run -it polars-amd64 /bin/bash
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
$ python -c "import polars"
Illegal instruction

The last command fails with Illegal instruction. Packages like pandas import properly if you amend the Dockerfile to install them instead.
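
To check for this failure from a script rather than by eye, one option is to run the import in a child interpreter and inspect how it exited. The following is only a sketch: `run_snippet` is a hypothetical helper name, and it relies on the POSIX convention that `subprocess` reports a child killed by a signal with a negative return code.

```python
import signal
import subprocess
import sys


def run_snippet(code: str) -> str:
    """Run `code` in a fresh interpreter and classify how it exited.

    `run_snippet` is a hypothetical helper, not part of Polars or Docker.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "ok"
    # On POSIX, a negative return code means the child was killed by a signal.
    if proc.returncode == -signal.SIGILL:
        return "illegal-instruction"
    # Anything else, e.g. an ImportError because the package is missing.
    return "error"


if __name__ == "__main__":
    # Inside the amd64 container from the report, this would be expected
    # to print "illegal-instruction" for the default polars wheel.
    print(run_snippet("import polars"))
```

Running the import in a subprocess keeps the calling process alive even when the child dies from the illegal instruction, so the same check could be dropped into a CI step or a container health check.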

It would simplify our development and build processes if we could run our amd64 builds on Apple Silicon. I have no idea what that would involve or if it is even possible for polars.

FWIW, my employer, Ponte Energy Partners, is a Silver company sponsor of @ritchie46.

akdor1154 commented 9 months ago

(not a contributor, just some guy) Try polars-lts-cpu?

lorenzwalthert commented 9 months ago

Thanks for the suggestion. Reading the README to the end would have been a good thing for me to do:

Do you want Polars to run on an old CPU (e.g. dating from before 2011), or on an x86-64 build of Python on Apple Silicon under Rosetta? Install pip install polars-lts-cpu. This version of Polars is compiled without AVX target features.
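
As a rough illustration of the choice the README describes, the sketch below reads the x86 feature flags from `/proc/cpuinfo`-style text and suggests a package name. Several things here are assumptions: the helper names are hypothetical, the README only says the lts build is "compiled without AVX target features" so the exact set of flags checked is a guess, and under emulation the kernel may not report x86 flags at all, in which case the sketch falls back to `polars-lts-cpu`. It is only meaningful for an x86-64 Python; a native arm64 interpreter would simply use the regular wheel.

```python
from pathlib import Path

# Assumption: checking these two flags is a stand-in for "AVX target
# features"; the actual requirements of the default wheel may differ.
ASSUMED_REQUIRED_FLAGS = frozenset({"avx", "avx2"})


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the feature flags from /proc/cpuinfo-style text (x86 layout)."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the line whose key is exactly "flags" lists CPU features.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def suggest_package(flags: set[str]) -> str:
    """Suggest a package name (hypothetical helper, not a Polars API)."""
    if ASSUMED_REQUIRED_FLAGS <= flags:
        return "polars"
    return "polars-lts-cpu"


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print(suggest_package(parse_cpu_flags(text)))
```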

I will try that. If the performance drop is insignificant, that might be a solution.

akdor1154 commented 9 months ago

I will try that. If the performance drop is insignificant, that might be a solution.

If the performance drop is significant the first thing I'd be doing is getting Rosetta out of the picture and letting local devs run native arm containers :p