Hi everyone,
I understand the challenge of fetching the latest version metadata for many packages from PyPI without running into performance problems caused by the large JSON responses.
While PyPI does not currently provide an endpoint like https://pypi.org/pypi/package1/latest/json, you can improve performance with parallel requests and caching. Here are two strategies, with examples:

1. Parallel Requests

The aiohttp library makes asynchronous HTTP requests, so the metadata for many packages can be fetched in parallel. The total wait is then closer to the time of the slowest single request than to the sum of all of them.

```python
import asyncio

import aiohttp


async def fetch_package_data(session, package_name):
    url = f"https://pypi.org/pypi/{package_name}/json"
    async with session.get(url) as response:
        response.raise_for_status()  # raise on 404 (unknown package) or server errors
        data = await response.json()
    # For this endpoint, "info" describes the latest release
    return {package_name: data["info"]["version"]}


async def fetch_all_packages(package_names):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_package_data(session, pkg) for pkg in package_names]
        # Run all requests concurrently; results keep the input order
        return await asyncio.gather(*tasks)


def get_latest_versions(package_names):
    return asyncio.run(fetch_all_packages(package_names))


packages = ["package1", "package2", "package3"]
latest_versions = get_latest_versions(packages)
print(latest_versions)
```

2. Caching
To avoid redundant API calls, you can cache the results. Here is an example using the cachetools library, which keeps each looked-up version for one hour:

```python
import requests
from cachetools import TTLCache, cached

# Keep up to 100 entries, each for one hour (3600 seconds)
cache = TTLCache(maxsize=100, ttl=3600)


@cached(cache)
def get_package_version(package_name):
    url = f"https://pypi.org/pypi/{package_name}/json"
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise instead of parsing (and caching) an error response
    data = response.json()
    return data["info"]["version"]


packages = ["package1", "package2", "package3"]
latest_versions = {pkg: get_package_version(pkg) for pkg in packages}
print(latest_versions)
```

Parallel requests reduce the total wall-clock time, and caching avoids downloading the same package's metadata again within the TTL. Neither reduces the size of each JSON response, so every uncached lookup still downloads the full project document.
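If adding cachetools as a dependency is not desirable, a similar time-based cache can be sketched with only the standard library. This is a minimal sketch, not a drop-in replacement: `ttl_cache` is a hypothetical helper name, it only handles one-argument functions, it has no size limit (unlike `TTLCache(maxsize=100)`), and it is not thread-safe.

```python
import json
import time
import urllib.request
from functools import wraps


def ttl_cache(ttl_seconds):
    """Hypothetical helper: cache a one-argument function's results for ttl_seconds."""
    def decorator(func):
        # argument -> (expiry timestamp, cached value); expired entries are
        # overwritten on the next call but never removed
        entries = {}

        @wraps(func)
        def wrapper(key):
            now = time.monotonic()
            hit = entries.get(key)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the network call
            value = func(key)
            entries[key] = (now + ttl_seconds, value)
            return value

        return wrapper

    return decorator


@ttl_cache(ttl_seconds=3600)
def get_package_version(package_name):
    url = f"https://pypi.org/pypi/{package_name}/json"
    # urlopen raises HTTPError for a 404, so failed lookups are not cached
    with urllib.request.urlopen(url, timeout=10) as response:
        data = json.load(response)
    return data["info"]["version"]
```

A second call to `get_package_version("package1")` within an hour returns the stored string without contacting PyPI.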
This issue tracker is for the python.org website.
The issue tracker for pypi.org is at https://github.com/pypi/warehouse/.
But this would be better asked in the Python Help category at https://discuss.python.org/c/users/7.
Please close this issue and ask there or at Stack Overflow.
Closing because it is not relevant for this repo.
Maybe it is there and not easy to find.
We have a package/plugin-based application that uses the PyPI API to discover packages.
For this, we first have:
our own page that lists the allowed plugins and returns something like ["package1", "package2"].
Note that "package1" and "package2" are valid PyPI packages.
Given this list, we would like to get the latest version of each package. We do not know that version in advance, so we have to load the long and extensive https://pypi.org/pypi/package1/json and parse its content for every plugin, several tens of them (about 50).
Performance is bad.
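Even without a lighter endpoint, the roughly 50 lookups do not have to run one after another, and only `info.version` needs to be read from each response, since for this endpoint `info` already describes the latest release. A minimal sketch using only the standard library, assuming a pool of 10 worker threads is acceptable; `fetch_latest_version`, `latest_versions` and the `fetch` parameter (which lets a stub replace the network call) are hypothetical names:

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_latest_version(package_name):
    """Return the latest version string from PyPI's project JSON endpoint."""
    url = f"https://pypi.org/pypi/{package_name}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        data = json.load(response)
    # The whole document is still downloaded and parsed, but the potentially
    # large "releases" mapping never has to be walked.
    return data["info"]["version"]


def latest_versions(package_names, fetch=fetch_latest_version, max_workers=10):
    """Look up many packages concurrently, at most max_workers at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so zip pairs each name correctly
        return dict(zip(package_names, pool.map(fetch, package_names)))
```

Each response is still the full project JSON, so this shortens the total wait rather than the per-package download.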
Is there a way to get just the latest metadata of a package without first loading https://pypi.org/pypi/package1/json?
I'm aware of https://pypi.org/simple/, but it does not seem to be JSON-based or to offer any filtering option.
Something like this would work for us:
https://pypi.org/pypi/package1/latest/json
Here `latest` would be a literal path segment meaning the latest available version.