prefix-dev / pixi

Package management made easy
https://pixi.sh
BSD 3-Clause "New" or "Revised" License

`mlflow` and `streamlit` take up a lot more memory when added as feature dependencies as opposed to `pypi-dependencies` #2214

Open AvishekMondalQC opened 2 weeks ago

AvishekMondalQC commented 2 weeks ago

Reproducible example

[project]
name="test-project"
channels=["conda-forge"]
platforms=["linux-64"]
conda-pypi-map={}

[system-requirements]
libc = "2.26"

[feature.dev.dependencies]
python = "3.11"
setuptools = "70.0.0"
pip = "24.0.*"
streamlit = "1.35.0"
mlflow = "2.13.1"

[environments]
default = ["dev"]

With the above `pixi.toml` file, when I run `pixi install` and monitor memory usage with htop, I see that RAM usage goes beyond 5 GB. But when I run `pixi install` with the following `pixi.toml` file and monitor RAM the same way, it does not exceed 2 GB.

[project]
name="test-project"
channels=["conda-forge"]
platforms=["linux-64"]
conda-pypi-map={}

[system-requirements]
libc = "2.26"

[feature.dev.dependencies]
python = "3.11"
setuptools = "70.0.0"
pip = "24.0.*"
# streamlit = "1.35.0"
# mlflow = "2.13.1"

[feature.dev.pypi-dependencies]
streamlit = "==1.35.0"
mlflow = "==2.13.1"

[environments]
default = ["dev"]

Issue description

I do not understand why there is such a discrepancy in memory usage when installing the same packages from conda versus getting them from PyPI. How would I go about tracking down the source of the discrepancy?

Additionally, is it possible to specify the maximum amount of memory or the number of workers that the `pixi install` process uses?

Expected behavior

I would expect the memory usage of `pixi install` to be roughly the same for both `pixi.toml` files.

ruben-arts commented 2 weeks ago

The issue is related to our solvers. When those packages are declared as pypi-dependencies, they only have to be solved against a known Python version, which comes from the conda solve. Adding them to the conda dependencies instead means the conda solver has to explore many more candidates within a single solve, so you end up with one big solve rather than two small serial solves.
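
To make the split concrete, here is a condensed, annotated restatement of the second manifest above (nothing new, just the same configuration with the two solves called out):

[feature.dev.dependencies]
# Resolved by the conda solver: a small solve that also pins the Python interpreter.
python = "3.11"
setuptools = "70.0.0"
pip = "24.0.*"

[feature.dev.pypi-dependencies]
# Resolved afterwards against the already-pinned Python 3.11,
# so the two heavy packages never enter the conda solve.
streamlit = "==1.35.0"
mlflow = "==2.13.1"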

One big solve will never use as little memory as two smaller solves, but the memory concern is valid.

I'll ask @baszalmstra if he has any ideas on how to implement a memory limit, as I'm unsure how that would work.