e10v opened 11 months ago
Actually, a different module name is useful not only for packages but for other projects too. Suppose I have a project repo with polars as a dependency. I can run it on my modern laptop as well as on a server with old Xeon CPUs (which works only with polars-lts-cpu), so I would have the same issue.
Makes total sense. I have a similar problem where I deploy to an older production server with a Xeon processor. My local machine is more modern and runs polars without any problem, but when deploying to production I get an exception saying the processor is missing features.
I ran into a new problem related to this. As soon as you add a package that lists polars among its dependencies to your own dependencies, this causes problems.
I had the dagster-duckdb-polars package in my project dependencies. dagster-duckdb-polars has a direct dependency on polars[pyarrow].
To get around the problem of polars vs polars-lts-cpu in production, I had defined [project.optional-dependencies] in my pyproject.toml, installing polars only in dev and polars-lts-cpu only in production.
But as soon as I included dagster-duckdb-polars, since it depends on polars[pyarrow], it would also install polars in production no matter what optional dependencies I told it to install, making my project crash in production. I'm not sure what the best way to address that would be.
Also running into this issue. Listing polars as a dependency in a package seems to be an oversight, given how the x86 binaries for newer and older CPUs are given completely different package names. Potentially the answer from the polars team is that you shouldn't make Python packages which depend on polars, only use polars as an end user. Or you should compile from source when deploying your package onto an old Xeon server.
The odd thing is that other libraries (e.g. numpy) seem to have solved this problem through runtime dispatching. There is likely a reason, which I don't understand, why this solution was not pursued.
Another solution which is out of scope from polars (more of a python software foundation request) is adding cpu feature flags to PEP 508. I personally do not like this idea, as I find the solution of making multiple packages for different minimum cpu extensions to add a lot of complexity for a package which depends on polars.
My preferred solution to having a package support a range of cpu features would be to make different wheel files with different 'platform tags' for different levels of cpu extensions. PEP 491 and documentation on platform compatibility tags give a guide to the current state of how this works, and what would need to change to support this. Essentially the 'platform tags' would need to allow for more specificity than just cpu architecture + OS, also including flags for required cpu architecture extensions. The benefit of this solution over runtime dispatching is smaller binaries for end-users.
One could see this problem (older CPUs which do not support the minimum CPU extensions) as one slowly going away, as the number of x86 CPUs without the minimum x86 extensions is diminishing by the day. However, I would guess this will remain a continuous problem, as computing architectures keep adding extensions to achieve greater performance and software libraries aim to conditionally use such extensions based on their presence.
Potentially the answer from the polars team is that you shouldn't make Python packages which depend on polars, only use polars as an end user.
I agree that this is the current state of the "installation story" for Polars, but it's not made clear (and as I've seen noted elsewhere, to do so would mean that installing such a package would result in non-functional software, i.e. not "batteries included").
This is rather a headache for portability 😞
Just commenting to let you know we are aware of this issue. However, there is not a clear, easy solution here. Therefore, this has low priority for us right now.
Package authors with a Polars dependency can choose to compile multiple versions of their package just like Polars does. We realize this is not great, but it's probably your best bet if you want to support old CPUs.
I think this is ideally fixed upstream as a PEP. I've encountered this before on other packages.
What do you think of proposing a default optional dependency, so that the following holds true:

pip install my-package
-> pip install 'my-package[cpu]'

pip install 'my-package[lts-cpu]'
-> pip install 'my-package[lts-cpu]' (and not [cpu])
[project]
name = "my-package"
description = "A package that depends on polars, but wants to support both the regular and lts-cpu versions"
default-optional-dependency = "cpu"  # proposed new field
dependencies = []  # none

[project.optional-dependencies]
cpu = ["polars"]
lts-cpu = ["polars-lts-cpu"]
Hey 👋🏼
We had the same discussion on pdm. I'll just share a possible way of handling this (copy-pasted from my post in that discussion).
IMHO, the proper solution for polars would be:
- a polars package with lts-cpu and u64-idx extras depending on polars-lts-cpu and polars-u64-idx
- keep polars-lts-cpu and polars-u64-idx, but ensure they do not overwrite the polars package, as is the case today
- have the polars package detect the available supports and dynamically import the proper support at runtime

It can be something basic like:
try:
    import polars_lts_cpu.feature as feature
except ImportError:
    import polars.feature as feature

def public_function():
    return feature.do_something()
[!NOTE] I believe it's even possible to use importlib.metadata and entrypoints to avoid the try/except and be more dynamic about support loading at runtime.
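A rough sketch of what that note suggests: probe installed distributions via importlib.metadata instead of chaining try/except imports. The distribution names and the priority order here are assumptions, not an existing polars API.

```python
from importlib import import_module, metadata

# Most specialized first; in practice only one is expected to be installed.
_CANDIDATES = ("polars_u64_idx", "polars_lts_cpu", "polars")

def load_backend(candidates=_CANDIDATES):
    """Return the first installed backend module, probing distribution metadata."""
    for module_name in candidates:
        dist_name = module_name.replace("_", "-")
        try:
            metadata.version(dist_name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            continue
        return import_module(module_name)
    raise ImportError("no polars backend is installed")
```

The advantage over try/except on imports is that the decision is driven by what is installed, so adding a new backend only means extending the candidate list.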
This way, a package depending on polars just needs to add polars as a dependency. End-user installations can specify polars, polars[lts-cpu] or polars[u64-idx]; it will always be resolved as polars, but will install the expected extra CPU support and properly resolve it at runtime.
Another (untested) option might be to keep CPU-specific versions of polars as "implementation" packages and make a single polars package that is only distributed as a source distribution on PyPI (not as a wheel, since wheels pre-run setup.py and do not include it in the package). Its setup.py can dynamically detect CPU features and make itself depend on the correct internal implementation. The main polars package can then simply re-export the implementation package that was installed.
If for some reason a user requires a specific implementation, they can always depend on it directly and use the corresponding imports rather than the generic polars shim.
Handling polars-u64-idx can also be done via e.g. polars[u64-idx], so that setup.py becomes the central decision point that can raise if the required combination does not exist.
Another (untested) option might be to keep CPU-specific versions of polars as "implementation" packages and make a single polars package that is only distributed as a source distribution […]
Industry standard is to provide binaries for any major Python package. The user expectation is that "pip install polars" just works on Mac, Linux, and Windows, for x86 and ARM.
Users can already do what you are describing (build binaries themselves specific to their os/cpu instruction set)
The only part that would be shipped as sdist is the shim. The implementation packages would obviously come prebuilt in this scenario, there is no reason to let the user deal with that.
EDIT:
Users can already do what you are describing (build binaries themselves specific to their os/cpu instruction set)
This is not what I'm describing. The proposition includes the following packages:
- polars: top-level shim; people should normally only depend on that
- polars-lts-cpu: same as current
- polars-recent-cpu: same as what currently ships as polars
- polars-lts-cpu-u64-idx: you get the idea

The polars top-level shim would not be shipped as a wheel, to preserve the execution of setup.py on the user machine. That setup.py would detect what instructions the machine supports, pick one of the appropriate implementation packages and depend on it.
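The decision step in such a shim's setup.py could look roughly like this. The distribution names and the AVX2 check are assumptions for illustration, not polars' actual build logic:

```python
import platform

def select_implementation() -> str:
    """Pick a hypothetical implementation distribution based on CPU features."""
    modern = False
    if platform.system() == "Linux":
        try:
            with open("/proc/cpuinfo") as f:
                modern = "avx2" in f.read()  # crude stand-in for a real CPUID probe
        except OSError:
            pass
    return "polars-recent-cpu" if modern else "polars-lts-cpu"

# In the shim's setup.py, the result would feed the dependency list:
# setup(name="polars", install_requires=[select_implementation()])
```

Since only the shim ships as an sdist, this code runs on the target machine at install time, which is exactly what wheels cannot do.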
In that scenario, I think it's a better idea if each of the "implementation packages" uses a different import name (e.g. import polars_lts_cpu) and lets the shim package re-export it. The shim can have a priority order from most specialized to least specialized implementations, and/or an env var to let the end user pick one of the installed implementations. That would allow users who really need direct access to an implementation to get it, and would allow multiple implementations to be installed at once rather than overwriting files and leaving pip confused. As it stands, what you get from import polars after having run both pip install polars and pip install polars-lts-cpu depends on the state of the venv before running those commands, which is clearly not good.
Narwhals is interesting and I plan on giving it a go, but is narwhals fixing any of the issues discussed here? AFAICT it simply depends on the polars package.
is narwhals fixing any of the issues discussed here?
Narwhals fixes only the problem of package dependencies, but a user still has to install Polars. The problem of installing project dependencies both on a modern laptop and on a server with old CPUs remains.
AFAICT it simply depends on the polars package.
Polars is an optional dependency. A user can install Narwhals and polars-lts-cpu.
Description
I'm planning to develop a new package which has polars as a dependency. There is also the polars-lts-cpu package for old processors. The issue is that they cannot be installed at the same time because of the conflicting module name. While a single module name is convenient for project repositories, it's not at all convenient for developing packages which depend on polars.
I've searched and found a similar issue on Stack Overflow. It seems the only option is to add both polars and polars-lts-cpu as optional dependencies. That means users will have a broken package if they install it without optional dependencies.
Could you reconsider your solution for old processors? The module name polars_lts_cpu would be a better choice from my point of view. Any user could just import it under that name, and the dependency problem would be solved.
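The code snippet from the original issue appears to have been lost in extraction. Presumably it showed importing the renamed module; a hedged reconstruction of that idea (the fallback helper is made up for illustration):

```python
import importlib

def import_polars():
    # With a distinct module name, user code could prefer the old-CPU build
    # when it is installed and fall back to the regular polars module.
    for name in ("polars_lts_cpu", "polars"):
        try:
            return importlib.import_module(name)
        except ImportError:
            pass
    raise ImportError("polars is not installed")

# pl = import_polars()
```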