seqeralabs / wave

On-demand containers provisioning service
https://seqera.io/wave/
GNU Affero General Public License v3.0

conda-forge::python install not working #626

Open kchaung-lilly opened 2 months ago

kchaung-lilly commented 2 months ago

From the GUI, there is no version drop-down menu for conda-forge::python. Additionally, the only version offered, conda-forge::python=3.13.0rc1, causes the build to fail.

example build id: 0352622f5438358c_1


pditommaso commented 2 months ago

Thanks for reporting. tagging @ewels for visibility

ewels commented 1 month ago

Quoting discussion with @vladsavelyev :


This is a known issue with the Anaconda search API. Some packages are missing there, and python is one of them: https://api.anaconda.org/search?name=python&organisation=conda-forge

The Anaconda website search bar, however, works, https://anaconda.org/search?q=python, so we put in a workaround to additionally scrape it. Unfortunately, it returns only the most recent version for a package, which is what you're seeing here.

I reported this problem to anaconda, their response:

Thank you so much for contacting us, we are currently in the process of enhancing our API search feature to ensure it meets the highest standards of performance and accuracy.

We appreciate your patience as we work on improvements, and we look forward to providing you with a more robust experience soon.

In the meantime the results may not be accurate, please let us know if you need anything else.


I'd like to add that the Wave CLI should work fine. We may look into adding a free-text input to the web UI to allow users to request arbitrary package names without needing to invoke the Anaconda search APIs. I'm not sure that there's much more that we can do here for now, I'm afraid.
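For anyone poking at the search API quoted above, here is a minimal Python sketch that builds the same query URL and checks whether an exact package name appears in the results. It assumes the endpoint returns a JSON list of objects that each carry a `name` field; the helpers `build_search_url`, `has_exact_match` and `search` are hypothetical and not part of any Wave or Anaconda client.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the URL quoted in the thread.
SEARCH_ENDPOINT = "https://api.anaconda.org/search"


def build_search_url(name: str, organisation: str) -> str:
    """Build the search query URL with the same parameters as the thread."""
    query = urllib.parse.urlencode({"name": name, "organisation": organisation})
    return f"{SEARCH_ENDPOINT}?{query}"


def has_exact_match(results: list, name: str) -> bool:
    """True if any result has exactly the requested name.

    Assumes each result is a JSON object with a "name" field.
    """
    return any(isinstance(r, dict) and r.get("name") == name for r in results)


def search(name: str, organisation: str, timeout: float = 30.0) -> list:
    """Fetch and decode the search results (needs network access)."""
    with urllib.request.urlopen(build_search_url(name, organisation), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `has_exact_match(search("python", "conda-forge"), "python")` would return `False` for as long as the API keeps omitting python, even though related names such as `python-dateutil` may be listed.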

stevekm commented 1 month ago

Doesn't the Wave CLI require a Seqera Platform user account token? That is one of the main reasons I have avoided using it for custom container builds. Also, I am not sure how you are meant to search for available packages with it, so I don't know what versions of packages exist (or even the packages themselves).

pditommaso commented 1 month ago

Wave does work without a Seqera Platform user account token; however, to push to your own repository, the Platform token is needed to authorise the push of the image.

ewels commented 1 month ago

As Paolo says, a token is not needed to use the Wave CLI to get an image on Seqera Containers. But you also can't push custom container builds to Seqera Containers. You can get temporary custom container builds from Wave, though. OK, this is getting complicated. Here's a list:

No-auth requests also have a lower rate limit. Does that all make sense?

Also, I am not sure how you are meant to search for available packages with it, so I don't know what versions of packages exist (or even the packages themselves).

Yeah, if you're using the CLI then that's up to you. You have to find them yourself first, as you would normally with vanilla Conda. You can browse the package lists for Bioconda and conda-forge, and search basically all conda channels, here.

ewels commented 1 month ago

@mahesh-panchal found what I guess is a similar story with Perl:

https://wave.seqera.io/view/builds/25c1e5978bf53020_3

channels:
- conda-forge
- bioconda
dependencies:
- bioconda::samtools=1.21
- conda-forge::perl=5.32.1.1
The following package could not be installed
└─ perl 5.32.1.1*  does not exist (perhaps a typo or a missing channel).
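The trailing `*` in that error comes from conda's match-spec rules: a single `=` is a fuzzy (prefix) match, so `perl=5.32.1.1` is read as `5.32.1.1*`, while `==` asks for an exact version. The sketch below is a simplified parser and matcher that illustrates this; it ignores build strings and conda's full version ordering, and `parse_spec` and `matches` are hypothetical names. The `5.32.1` value in the usage note is only illustrative, not a claim about what conda-forge actually carries.

```python
from typing import Optional, Tuple


def parse_spec(spec: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split 'channel::name=version' into (channel, name, version pattern).

    A single '=' becomes a prefix pattern ('version*'); '==' stays exact.
    Build strings ('name=version=build') are not handled.
    """
    channel: Optional[str] = None
    if "::" in spec:
        channel, spec = spec.split("::", 1)
    if "==" in spec:
        name, version = spec.split("==", 1)
        return channel, name, version
    if "=" in spec:
        name, version = spec.split("=", 1)
        return channel, name, version + "*"
    return channel, spec, None


def matches(pattern: Optional[str], version: str) -> bool:
    """Simplified check: exact match, or prefix match on '.'-separated components."""
    if pattern is None:
        return True
    if not pattern.endswith("*"):
        return version == pattern
    prefix = pattern[:-1].split(".")
    return version.split(".")[: len(prefix)] == prefix
```

For instance, `parse_spec("conda-forge::perl=5.32.1.1")` gives `("conda-forge", "perl", "5.32.1.1*")`, and `matches("5.32.1.1*", "5.32.1")` is `False`: a four-component pin can never match a three-component version string.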
ewels commented 1 month ago

Hi all,

I did a bit of investigative work to see how `conda search` does it, as that does find the correct Python versions and so clearly can't be using the Anaconda search API.

I just got a license for Proxyman so that I could poke at the SSL requests coming from the command in the terminal. I was expecting to see some undocumented magic for what it's querying, but it's actually exceptionally simple. Running the command simply downloads the full index of every package in each channel.

My local conda config looks like this:

$ conda config --show channels
channels:
  - conda-forge
  - bioconda
  - defaults

Then running conda search python hits the following endpoints:

No headers, no query, no POST body, nothing. Just a simple GET request to each. (OK, it does send an If-None-Match header, so it probably keeps a local cache somewhere, but I imagine these files change every few minutes or hours as new packages land, which breaks the cache.)
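The `If-None-Match` header is a standard HTTP conditional request: the client sends the `ETag` it saw last time, and the server answers `304 Not Modified` with no body if nothing has changed. Here is a minimal sketch of that pattern using only the Python standard library. The `ETagCache` class is hypothetical and is not how conda implements its cache; the injectable `opener` exists only so the logic can be exercised without a network.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional, Tuple


class ETagCache:
    """Hypothetical in-memory cache of (ETag, body) pairs keyed by URL."""

    def __init__(self, opener: Callable = urllib.request.urlopen) -> None:
        self._opener = opener
        self._store: Dict[str, Tuple[Optional[str], bytes]] = {}

    def get(self, url: str) -> bytes:
        cached = self._store.get(url)
        headers = {}
        if cached and cached[0]:
            # Send the last ETag so the server can answer 304 if nothing changed.
            headers["If-None-Match"] = cached[0]
        request = urllib.request.Request(url, headers=headers)
        try:
            with self._opener(request) as resp:
                body = resp.read()
                self._store[url] = (resp.headers.get("ETag"), body)
                return body
        except urllib.error.HTTPError as err:
            # urllib reports 304 Not Modified as an HTTPError; reuse the cached body.
            if err.code == 304 and cached is not None:
                return cached[1]
            raise
```

If the server rotates the ETag every time new packages are published, as suspected above, the 304 branch rarely fires and every call falls through to a full download.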

These 4 requests add up to a 222 MB download, which probably explains part of why it's so slow to run. Each is a huuuuge JSON file of presumably every package in the conda channel, which I guess is then searched locally.
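Once a repodata.json file is on disk, the "search locally" step is just a scan over its records. Here is a minimal sketch, assuming the standard repodata layout in which records sit under `packages` (`.tar.bz2` files) and `packages.conda` (`.conda` files), keyed by filename and each carrying `name` and `version`. `load_repodata` and `find_versions` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List


def load_repodata(path: str) -> Dict[str, Any]:
    """Read a repodata.json file that has already been downloaded."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def find_versions(repodata: Dict[str, Any], name: str) -> List[str]:
    """Return the distinct versions of `name` listed in one repodata document.

    Versions are sorted as plain strings, not by conda's version ordering.
    """
    versions = set()
    # .tar.bz2 records live under "packages", .conda records under "packages.conda".
    for section in ("packages", "packages.conda"):
        for record in repodata.get(section, {}).values():
            if record.get("name") == name:
                versions.add(record["version"])
    return sorted(versions)
```

A real search would repeat this for every channel and subdir (for example `noarch` plus the platform subdir) and merge the results, which is why the total download grows so quickly.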

We'll discuss this and compare it with the back-end logic we have for the Seqera Containers search, to see if we can learn anything from this technique.

mahesh-panchal commented 1 month ago

See how rattler is doing it. It's behind Pixi, Mamba and py-rattler.

ewels commented 1 month ago
pixi search python

```
❯ pixi search python
Using channels: conda-forge

python h206b6c5_100_cp313
-------------------------

Name                python
Version             3.13.0
Build               h206b6c5_100_cp313
Size                12873785
License             Python-2.0
Subdir              osx-arm64
File Name           python-3.13.0-h206b6c5_100_cp313.conda
URL                 https://conda.anaconda.org/conda-forge/osx-arm64/python-3.13.0-h206b6c5_100_cp313.conda
MD5                 b09a725400f670179c355b975e2854cc
SHA256              a126d434dbe34ce188a46364966aeeb5a4c9c5a8547a3fec8aa095031e206c9a

Dependencies:
 - __osx >=11.0
 - bzip2 >=1.0.8,<2.0a0
 - libexpat >=2.6.3,<3.0a0
 - libffi >=3.4,<4.0a0
 - libmpdec >=4.0.0,<5.0a0
 - libsqlite >=3.46.1,<4.0a0
 - libzlib >=1.3.1,<2.0a0
 - ncurses >=6.5,<7.0a0
 - openssl >=3.3.2,<4.0a0
 - python_abi 3.13.* *_cp313
 - readline >=8.2,<9.0a0
 - tk >=8.6.13,<8.7.0a0
 - tzdata
 - xz >=5.2.6,<6.0a0
```


Much less network traffic: it made a bunch of HEAD calls and downloaded two files, both compressed .zst files. By the look of it, though, they are effectively the same repodata.json files.

pditommaso commented 1 month ago

They use a sharded index https://prefix.dev/blog/sharded_repodata.

vladsavelyev commented 1 month ago

Interesting. I wonder if we could switch to using the sharded index for conda-forge, which the pixi team built at https://fast.prefix.dev/conda-forge.
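With a sharded index, a lookup fetches a small index that maps each package name to a content hash, then downloads only that package's shard. The sketch below shows just the URL resolution step. The field names (`info.shards_base_url`, `shards`) and the `<hex digest>.msgpack.zst` shard naming follow our reading of the sharded repodata proposal linked above and should be checked against the spec; `shard_url` is a hypothetical helper, and decoding the msgpack + zstd payloads (which needs third-party libraries) is left out, so `index` is assumed to be already decoded.

```python
import urllib.parse
from typing import Any, Dict, Optional


def shard_url(index_url: str, index: Dict[str, Any], name: str) -> Optional[str]:
    """Return the URL of the shard holding all records for `name`, or None.

    Assumed layout: index["info"]["shards_base_url"] (possibly relative to the
    index location) and index["shards"][name] -> raw sha256 digest bytes.
    """
    digest = index.get("shards", {}).get(name)
    if digest is None:
        return None
    base = index.get("info", {}).get("shards_base_url", "./")
    # Resolve a relative base against the directory the index was fetched from.
    base_abs = urllib.parse.urljoin(index_url, base)
    if not base_abs.endswith("/"):
        base_abs += "/"
    return f"{base_abs}{digest.hex()}.msgpack.zst"
```

Because shards are addressed by hash, a client can cache them indefinitely and only re-fetch the small index, which fits the much lighter traffic observed with pixi.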

ewels commented 1 month ago

The problem is that until Anaconda supports sharded indexes for all conda channels, we'd have to implement and maintain two code paths, one with and one without sharding, depending on the channel. That sounds messy :/
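To make the "two code paths" concern concrete, here is a rough sketch of the dispatch layer such a design would need. Everything here is hypothetical: the allow-list `SHARDED_CHANNELS`, the `lookup_versions` function, and the two lookup callables, which stand in for a sharded-index reader and a full repodata.json scan.

```python
from typing import Callable, List, Set

Lookup = Callable[[str, str], List[str]]

# Hypothetical allow-list of channels assumed to publish a sharded index.
SHARDED_CHANNELS: Set[str] = {"conda-forge"}


def lookup_versions(channel: str, name: str, sharded: Lookup, full: Lookup) -> List[str]:
    """Use the sharded index where available, else the full repodata.json path."""
    if channel in SHARDED_CHANNELS:
        try:
            return sharded(channel, name)
        except (OSError, KeyError, ValueError):
            # Sharded index missing or malformed: fall back to the full download.
            pass
    return full(channel, name)
```

The dispatch itself is small; the maintenance cost is that both `sharded` and `full` implementations have to be kept working and tested in parallel.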

@vladsavelyev, how difficult would it be to run the pixi search CLI on the server back end, rather than reimplementing it ourselves?
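A rough sketch of shelling out to pixi from a back end, assuming the `pixi` binary is installed and that `pixi search` accepts a `--channel` option. The `parse_fields` helper scrapes human-readable output like the one shown earlier in the thread, so it assumes aligned `Key  Value` columns separated by two or more spaces. That format is not a stable interface and could change between pixi releases. All function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Dict, List


def pixi_search_command(name: str, channel: str) -> List[str]:
    """Build the command line; assumes `pixi search` accepts --channel."""
    return ["pixi", "search", "--channel", channel, name]


def parse_fields(output: str) -> Dict[str, str]:
    """Collect 'Key   Value' lines (two or more spaces between key and value).

    Headings, separators and '- dependency' lines are skipped.
    """
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        text = line.strip()
        if text.startswith("-"):
            continue
        parts = re.split(r"\s{2,}", text, maxsplit=1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1]
    return fields


def run_pixi_search(name: str, channel: str = "conda-forge") -> Dict[str, str]:
    """Run pixi search and parse its output; raises if pixi is missing or fails."""
    if shutil.which("pixi") is None:
        raise FileNotFoundError("pixi is not on PATH")
    result = subprocess.run(
        pixi_search_command(name, channel),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_fields(result.stdout)
```

Note that the output pasted earlier describes a single record (python 3.13.0), so this approach on its own may not yield the full version list that the web UI drop-down needs.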

mahesh-panchal commented 4 weeks ago

Another one where the version isn't parsed correctly for rstudio: https://wave.seqera.io/view/builds/bd-123dcc2296d1a143_1

Edit: Not sure what's going on there. rstudio is part of the r channel (defaults), yet it comes up when you search in Seqera Containers.