jacobtomlinson / gha-anaconda-package-version

Get the latest version of an Anaconda package
MIT License

Mismatch between api.anaconda.org and conda-forge CDN #1

Open jrbourbeau opened 4 years ago

jrbourbeau commented 4 years ago

It's my understanding that there's a lag between when https://api.anaconda.org/package/conda-forge/<package>/files and the conda-forge CDN are updated. The api.anaconda.org file listing currently used here can show a package version before it is available for download from the CDN (which is updated by a conda-forge bot every 20 minutes, I believe). This can lead to PackagesNotFoundError failures like the one in https://travis-ci.com/dask/dask-docker/builds/139837123#L189-L191.

Perhaps someone like @jakirkham can clarify if my understanding here is correct

jakirkham commented 4 years ago

Yeah that's right. It's also somewhat a function of where the nearest CDN server is to you and how long it takes for the network to propagate out to it.

That said, 20 minutes is a good approximate sync time, particularly for other things running in the cloud.

As @jrbourbeau is suggesting, we could bypass the CDN if low latency is important (though there is a greater risk of download errors). Alternatively, we could include some kind of lag in the build process here.
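
For illustration, the "lag" idea could be sketched roughly as follows. This is only a sketch under assumptions: it assumes each record returned by the api.anaconda.org files endpoint carries `version` and `upload_time` fields, with `upload_time` in an ISO-like format that `datetime.fromisoformat` can parse. The function name `latest_settled_version` and the 20-minute default are hypothetical choices, not something this Action currently does.

```python
from datetime import datetime, timedelta, timezone


def latest_settled_version(files, now, lag=timedelta(minutes=20)):
    """Return the version of the newest upload that is at least `lag` old.

    `files` is a list of dicts shaped like the records assumed to come from
    the api.anaconda.org files endpoint (keys: "version", "upload_time").
    Returns None if no upload is old enough.
    """
    cutoff = now - lag
    settled = []
    for record in files:
        uploaded = datetime.fromisoformat(record["upload_time"])
        # Assumption: timestamps without an offset are treated as UTC.
        if uploaded.tzinfo is None:
            uploaded = uploaded.replace(tzinfo=timezone.utc)
        if uploaded <= cutoff:
            settled.append((uploaded, record["version"]))
    if not settled:
        return None
    # "Newest" here means most recently uploaded, not highest version number.
    return max(settled)[1]
```

A caller would pass `datetime.now(timezone.utc)` as `now`. Picking by upload time keeps the sketch free of version-parsing logic, at the cost of possibly returning a backported older release if that was uploaded last.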

jacobtomlinson commented 4 years ago

Is this a conda-forge specific error or is this likely to happen with non-conda-forge packages too?

If this is conda-forge specific I'll write a new Action and handle it differently.

jrbourbeau commented 4 years ago

> Is this a conda-forge specific error or is this likely to happen with non-conda-forge packages too?

Hrm, I'm not sure if this issue applies more broadly beyond conda-forge. I was informed offline that @soapy1 may be the appropriate person to answer that question.

soapy1 commented 4 years ago

The CDN is applied to the largest Anaconda channels at present. Notably, we have the default channels, r, conda-forge, bioconda, and pytorch on it. Note that not all channels are cloned at the same cadence, because of how each channel usually operates. For example, conda-forge is cloned every 20 minutes, while r is cloned daily. Maybe we can be more transparent about which channels are cloned and at what cadence?

jakirkham commented 4 years ago

Yeah it would be useful to have this info somewhere. Though I don't know offhand a good place to collect it. Anaconda.org would be one, but maybe it is not straightforward to show this info there. Can you think of other places where this would be useful to show?

jacobtomlinson commented 4 years ago

Thanks for all the info folks!

Can I just check that I understand things right? When packages are published on channels like conda-forge, r, bioconda, etc., they show up in the list of available packages in the API, but they are not actually available to install until some time later (20 minutes for conda-forge, 24 hours for r?).

Is there a safer way for me to detect which packages are actually available at a given time?
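
One possible direction, sketched here under assumptions rather than confirmed by anyone in this thread, is to look at the `repodata.json` index that conda itself reads instead of the API listing. The sketch assumes the index is served at `https://conda.anaconda.org/<channel>/<subdir>/repodata.json` and that this is what the CDN exposes. It relies on the index layout where records live under `packages` and `packages.conda`, each with `name` and `version` keys. The helper names are hypothetical.

```python
import json
import urllib.request


def versions_in_repodata(repodata, package):
    """Collect every version of `package` listed in a parsed repodata.json."""
    found = set()
    # .tar.bz2 records live under "packages", .conda records under "packages.conda".
    for section in ("packages", "packages.conda"):
        for record in repodata.get(section, {}).values():
            if record.get("name") == package:
                found.add(record["version"])
    return found


def fetch_repodata(channel, subdir):
    """Download and parse a channel's index (URL pattern is an assumption)."""
    url = f"https://conda.anaconda.org/{channel}/{subdir}/repodata.json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

A version would then count as installable only if it appears in `versions_in_repodata(fetch_repodata("conda-forge", "noarch"), "dask")` or the equivalent for the relevant platform subdir. Note that these index files can be very large for big channels, so this may be too slow for a lightweight Action.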