Open kchaung-lilly opened 2 months ago
Thanks for reporting. Tagging @ewels for visibility.
Quoting discussion with @vladsavelyev :
This is a known issue with the Anaconda search API. Some packages are missing there, Python being one of them: https://api.anaconda.org/search?name=python&organisation=conda-forge
The Anaconda website search bar, however, does work (https://anaconda.org/search?q=python), so we put in a workaround to additionally scrape it. Unfortunately, it returns only the most recent version of each package, which is what you're seeing here.
I reported this problem to Anaconda. Their response:
Thank you so much for contacting us, we are currently in the process of enhancing our API search feature to ensure it meets the highest standards of performance and accuracy.
We appreciate your patience as we work on improvements, and we look forward to providing you with a more robust experience soon.
In the meantime the results may not be accurate, please let us know if you need anything else.
I'd like to add that the Wave CLI should work fine. We may look into adding a free-text input to the web UI to allow users to request arbitrary package names without needing to invoke the Anaconda search APIs. I'm not sure that there's much more that we can do here for now I'm afraid.
Doesn't the Wave CLI require a Seqera Platform user account token? That is one of the main reasons I have avoided using it for custom container builds. Also because I am not sure how you are meant to search for available packages with it, so I don't know what versions of packages exist (or even the packages themselves).
Wave does work without a Seqera Platform user account token. *However*, to push to your own repository, the Platform token is needed to authorise the push of the image.
As Paolo says, a token is not needed to use the Wave CLI to get an image on Seqera Containers. But you also can't push custom container builds to Seqera Containers. You can get temporary custom container builds from Wave though. Ok this is getting complicated. Here's a list:
No-auth requests also have a lower rate limit. Does that all make sense?
Also because I am not sure how you are meant to search for available packages with it, so I don't know what versions of packages exist (or even the packages themselves)
Yeah if you're using the CLI then that's up to you. You have to find them yourself first, as you would normally with vanilla Conda. You can see package lists for Bioconda, conda-forge and search basically all conda channels here.
@mahesh-panchal found what I guess is a similar story with Perl:
https://wave.seqera.io/view/builds/25c1e5978bf53020_3
channels:
- conda-forge
- bioconda
dependencies:
- bioconda::samtools=1.21
- conda-forge::perl=5.32.1.1
The following package could not be installed
└─ perl 5.32.1.1* does not exist (perhaps a typo or a missing channel).
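For anyone reading the error above: as far as I understand conda's match spec syntax, a single `=` is a fuzzy (prefix) match, which is why the solver reports `perl 5.32.1.1*` rather than the literal pin. Here is a minimal sketch of how a `channel::name=version` entry breaks down. The helper name `parse_conda_spec` is hypothetical, and it deliberately ignores build strings and operators like `>=`.

```python
def parse_conda_spec(spec: str) -> dict:
    """Split a simple 'channel::name=version' spec into its parts.

    Simplified illustration only: real conda specs also allow build
    strings and comparison operators, which are not handled here.
    """
    # Everything before the last '::' is the channel (may be absent).
    channel, _, rest = spec.rpartition("::")
    # The first '=' separates the package name from the version pin.
    name, _, version = rest.partition("=")
    # Tolerate the exact-match form 'name==version' as well.
    version = version.lstrip("=")
    return {
        "channel": channel or None,
        "name": name,
        "version": version or None,
    }


for spec in ["bioconda::samtools=1.21", "conda-forge::perl=5.32.1.1"]:
    print(parse_conda_spec(spec))
```

So the build was asking conda-forge for any perl version starting with `5.32.1.1`, and the solver found none.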
Hi all,
A bit of investigative work to see how `conda search` does it - as that does find the correct Python versions, so clearly can't be using the Anaconda API.
I just got a license for Proxyman so that I could poke at the SSL requests coming from the command in the terminal. I was expecting to see some undocumented magic for what it's querying, but it's actually exceptionally simple. Running the command simply downloads all packages.
My local conda config looks like this:
$ conda config --show channels
channels:
- conda-forge
- bioconda
- defaults
Then running `conda search python` hits the following endpoints:
No headers, no query, no `POST` body, nothing. Just a simple `GET` request to each (ok it has an `If-None-Match` header so maybe it has a local cache somewhere, but I imagine that these will be breaking the cache every few minutes / hours with new packages).
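To illustrate the `If-None-Match` part, here is a rough sketch of a conditional `GET` using only the Python standard library. This is not what conda does internally, just the general HTTP pattern. The function names are hypothetical and any URL passed in would be a placeholder.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def build_conditional_request(url: str, etag: Optional[str] = None) -> urllib.request.Request:
    """Build a plain GET, adding If-None-Match when a cached ETag is known."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_if_changed(url: str, etag: Optional[str] = None) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, new_etag), or (None, etag) if the server answers 304."""
    request = build_conditional_request(url, etag)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        # urllib surfaces 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None, etag
        raise
```

If the file has not changed since the cached ETag was issued, the server replies with an empty 304 and the large download is skipped.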
These 4 requests sum to a 222 MB download, which is probably part of why it's so slow to run. Each is a huuuuge JSON file of presumably every package in the conda channel, which I guess is then searched locally.
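A minimal sketch of what that local search could look like once a `repodata.json` has been loaded. It assumes the usual layout as I understand it: top-level `packages` and `packages.conda` maps keyed by filename, with each record carrying `name` and `version`. The sample filenames and the `find_versions` helper are made up for illustration.

```python
def find_versions(repodata: dict, name: str) -> list:
    """Collect the distinct versions of one package from a parsed repodata.json."""
    versions = set()
    # Older .tar.bz2 artifacts and newer .conda artifacts live in separate maps.
    for section in ("packages", "packages.conda"):
        for record in repodata.get(section, {}).values():
            if record.get("name") == name:
                versions.add(record["version"])
    # Plain string sort, not a proper version ordering.
    return sorted(versions)


# Tiny invented sample standing in for a real (very large) repodata.json.
sample = {
    "packages": {
        "python-3.12.7-h1_0.tar.bz2": {"name": "python", "version": "3.12.7"},
        "perl-5.32.1-h2_0.tar.bz2": {"name": "perl", "version": "5.32.1"},
    },
    "packages.conda": {
        "python-3.13.0-h3_0.conda": {"name": "python", "version": "3.13.0"},
    },
}

print(find_versions(sample, "python"))  # ['3.12.7', '3.13.0']
```

The search itself is trivial. The cost is all in downloading and parsing the full index.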
Will discuss and compare to the logic that we have back-end for the Seqera Containers search to see if we can learn anything from this technique.
See how rattler is doing it. It's behind Pixi, Mamba, py-Rattler.
pixi search python
Much less network traffic - did a bunch of `HEAD` calls and downloaded two files, both compressed `.zst` files. But effectively the same `repodata.json` files by the look of it.
They use a sharded index: https://prefix.dev/blog/sharded_repodata
Interesting, wonder if we could switch to using the sharded index for conda-forge which the pixi team built at https://fast.prefix.dev/conda-forge.
The problem is that, until Anaconda supports sharded indexes for all conda channels, we'd have to implement and maintain a dual method to work with and without sharding, depending on channels. Which sounds messy :/
How difficult would it be to run the `pixi search` CLI on the server back end, @vladsavelyev? Rather than reimplementing it ourselves?
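For the sake of discussion, a rough sketch of shelling out to `pixi search` from Python. It assumes `pixi` is installed on the server and that `--channel` is accepted by `pixi search` (worth checking against `pixi search --help` for the deployed version). The function names are hypothetical, and the human-readable output would still need parsing.

```python
import subprocess
from typing import List, Optional


def build_pixi_search_command(package: str, channel: Optional[str] = None) -> List[str]:
    """Assemble the argument list for `pixi search`.

    Passing a list (not a shell string) keeps user-supplied package
    names from being interpreted by a shell.
    """
    command = ["pixi", "search"]
    if channel:
        # Assumed flag; verify against the installed pixi version.
        command += ["--channel", channel]
    command.append(package)
    return command


def run_pixi_search(package: str, channel: Optional[str] = None, timeout: int = 60) -> str:
    """Run `pixi search` and return its raw stdout as text."""
    result = subprocess.run(
        build_pixi_search_command(package, channel),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

The timeout matters here since each search may trigger index downloads on a cold cache.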
Another one where the version isn't parsed correctly for rstudio: https://wave.seqera.io/view/builds/bd-123dcc2296d1a143_1
Edit: Not sure what's going on there, but rstudio is part of the `r` channel (`defaults`), but if you search in Seqera Containers it comes up.
From the GUI, there is no version drop-down menu for conda-forge::python. Additionally, the one version that is offered, conda-forge::python=3.13.0rc1, causes the build to fail.
example build id: 0352622f5438358c_1