2.0 regression: large overhead of `libsolv`'s `solver_unifyrules` when multichannels are used

mamba-org / mamba

The Fast Cross-Platform Package Manager

https://mamba.readthedocs.io

BSD 3-Clause "New" or "Revised" License


Issue #3393

Status: Closed (closed by Hind-M 2 weeks ago)

Hind-M commented 3 months ago

From @ndevenish in the QS lobby on Gitter: "Are there any known issues with the current micromamba regarding resource usage, possibly related to CentOS/RHEL? This week, two separate people came to me with issues: a) micromamba in a container build dying because it filled their entire temp disk (while installing very few packages); b) micromamba apparently being OOM-killed after taking >60% of their memory. Both tasks worked before.

The out-of-disk-space instance was running `micromamba create -y -c conda-forge gnuplot python numpy pymca "workflows>=1.7" xraylib zocalo`, and it took at least 4 GB of scratch disk space (the smallest of the possible locations podman was using for container work on their system).

The other instance didn't get past resolving (an admittedly rather large set of requirements) but was using >9 GB of RAM on a 16 GB machine the last time I checked before it died."

Hind-M commented 3 months ago

Used micromamba version: 2.0.0rc0

jjerphan commented 3 months ago

Using conda-forge/label/micromamba_dev/linux-64/micromamba-2.0.0{rc0-1,rc1-2}, I cannot reproduce the errors you report.

On my machine, installing those packages takes around 1.5 GiB of disk space in $CONDA_PREFIX, while using less than 1 GiB of RAM.

@ndevenish: Could you compare your instances' resource usage with micromamba<2.0.0rc0 and micromamba>=2.0.0rc0?

ndevenish commented 1 month ago

When this ticket was opened, it had been a while since I had last seen this happen to people.

Now that 2.0.0 is out, I am seeing this happen on CI.

ndevenish commented 1 month ago

On this environment file

ndevenish commented 1 month ago

This is exactly 700 GB btw

ndevenish commented 1 month ago

RHEL 8, 16 GB memory machine:

```shell
curl -JLO https://raw.githubusercontent.com/dials/dials/refs/heads/main/.conda-envs/linux.txt
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
psrecord --plot out.png "bin/micromamba create -yp ENV/ -c conda-forge --file linux.txt"
```

```
% bin/micromamba --version
2.0.0
```

ndevenish commented 1 month ago

Possibly because it seems to be stuck in a package-cache-fetching loop? (screen recording: https://github.com/user-attachments/assets/cf71deec-db90-4735-93b1-b8e6365f2fe7)

jjerphan commented 1 month ago

`repodata.json` is reparsed for every package (since `conda-forge::` is specified for every one of them), which causes the heavy resource usage.

This is a regression of micromamba 2.0.0.
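To gauge how many reparses a given environment file could trigger under the behaviour described above, it is enough to count how many specs carry an explicit channel prefix. This is a minimal sketch assuming one spec per line in the `channel::spec` form; `count_channel_prefixes` is a hypothetical helper, not part of micromamba.

```shell
# Hypothetical helper: print "<channel> <count>" for every "channel::" prefix
# found at the start of a line in a spec file (one spec per line assumed).
# Comment lines and specs without a prefix are ignored.
count_channel_prefixes() {
  grep -oE '^[^:#[:space:]]+::' "$1" \
    | sed 's/::$//' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}

# Usage, assuming linux.txt was downloaded as in the reproduction above:
# count_channel_prefixes linux.txt
```

An output line such as `conda-forge N` means a single channel was requested N times; per the comment above, that can mean roughly one parse of the same subdirectory per spec where one parse in total would do.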

jjerphan commented 1 month ago

Bisecting points to e874e7ea71ceefa1f52bdfd8deb6bf5bb3129316 (from https://github.com/mamba-org/mamba/pull/2986) as the culprit.

ndevenish commented 1 month ago

Ah, excellent detective work. Removing the `conda-forge::` prefix sounds like a workaround we can use until a more general fix lands. As I recall, we started adding it to prevent packages from being pulled in from other channels, but I think the way we generate installations now avoids that entirely, so it shouldn't be needed any more.
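The workaround can be applied without editing the environment file by hand. A minimal sketch, assuming every affected spec line starts with a literal `conda-forge::` prefix; `strip_conda_forge_prefix` and `linux-noprefix.txt` are hypothetical names. The channel is then given once on the command line with `-c conda-forge`.

```shell
# Hypothetical helper: drop a leading "conda-forge::" (optionally preceded by
# whitespace) from each line of a spec file, writing the result to stdout.
# Lines without the prefix (comments, plain specs) pass through unchanged.
strip_conda_forge_prefix() {
  sed -E 's/^[[:space:]]*conda-forge:://' "$1"
}

# Usage, mirroring the reproduction command from this thread:
# strip_conda_forge_prefix linux.txt > linux-noprefix.txt
# bin/micromamba create -yp ENV/ -c conda-forge --file linux-noprefix.txt
```

Without the prefix, specs are no longer pinned to `conda-forge`, so any other channel that ends up in the configuration could supply packages; that trade-off is acceptable only when, as described here, the environment is created with `-c conda-forge` alone.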

jjerphan commented 1 month ago

Yes, each subdirectory must be parsed only once.
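Parsing each subdirectory once amounts to memoizing the load on a (channel, subdir) key. The toy sketch below only illustrates that idea in shell: `parse_subdir_once`, `PARSED_KEYS` and `PARSE_COUNT` are hypothetical names, the counter stands in for the expensive `repodata.json` parse, and none of it reflects how the fix in mamba's C++ code is actually structured.

```shell
# Space-delimited set of "channel/subdir" keys already loaded.
PARSED_KEYS=" "
# Number of times the (simulated) expensive parse actually ran.
PARSE_COUNT=0

# Hypothetical: load a channel subdirectory unless it was loaded before.
parse_subdir_once() {
  local key="$1/$2"
  case "$PARSED_KEYS" in
    *" $key "*) return 0 ;;  # already parsed: reuse the cached result
  esac
  PARSE_COUNT=$((PARSE_COUNT + 1))  # stand-in for parsing repodata.json
  PARSED_KEYS="$PARSED_KEYS$key "
}

# Three specs sharing the same channel prefix trigger a single parse.
for spec in conda-forge::python conda-forge::numpy conda-forge::zocalo; do
  parse_subdir_once "${spec%%::*}" linux-64
done
echo "parses: $PARSE_COUNT"  # prints "parses: 1"
```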

jjerphan commented 3 weeks ago

`jjerphan:mamba:fix/parsing-subdir` is a WIP branch to resolve this issue; it is currently blocked by https://github.com/jbeder/yaml-cpp/issues/1322.

jjerphan commented 3 weeks ago

Actually, channel duplication is not the only cause: even after fixing it, most of the runtime is spent in a costly quicksort in `libsolv`'s `solver_unifyrules`.

Profiling with samply:

```shell
samply record $HOME/dev/mamba/build/micromamba/micromamba create -yp /tmp/5ENV/ -c conda-forge --file /tmp/linux.txt
```

With the `conda-forge::` prefix:

[Profile screenshot: with `conda-forge::`]

Without the `conda-forge::` prefix:

[Profile screenshot: without `conda-forge::`]

I suspect this is due to the comparison function used for package solvables when the resolution runs.

jjerphan commented 2 weeks ago

Bisecting indicates that the regression was first introduced by e874e7ea71ceefa1f52bdfd8deb6bf5bb3129316, the merge commit of #2986.