lmmx opened 3 years ago
When parsing conda search listings (equivalent to `index.json`), a version-constrained spec such as `python>=3.8,python<3.9` is preferred above a bare `python`, and can be checked against the running interpreter's `(sys.version_info.major, sys.version_info.minor)`.
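A minimal sketch of that check, assuming only simple `>=`/`<=`/`<`/`>`/`==` clauses on `major.minor` versions (real conda match specs are richer than this):

```python
import re
import sys

def python_spec_ok(spec, version=None):
    """Check a comma-separated spec like 'python>=3.8,python<3.9'
    against a (major, minor) tuple (defaults to the running interpreter)."""
    current = version or (sys.version_info.major, sys.version_info.minor)
    for clause in spec.split(","):
        m = re.fullmatch(r"python(>=|<=|<|>|==)(\d+)\.(\d+)", clause.strip())
        if m is None:
            continue  # unhandled clause forms are skipped in this sketch
        op, want = m.group(1), (int(m.group(2)), int(m.group(3)))
        ok = {
            ">=": current >= want,
            "<=": current <= want,
            "<": current < want,
            ">": current > want,
            "==": current == want,
        }[op]
        if not ok:
            return False
    return True

print(python_spec_ok("python>=3.8,python<3.9", version=(3, 8)))  # True
print(python_spec_ok("python>=3.8,python<3.9", version=(3, 9)))  # False
```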
Not all packages use hardlinked paths; some must resort to copying (source)
Where conda fails to create a hard link, it may fall back to either a symlink or a copy. Hard links may fail due to a permissions error, or because the destination is on a different volume than the package cache (hard links only work within a volume). Pay special attention to how your folders are mounted, as the fallback to copying is a big speed hit.
...but if this is only a failure case then it should(?) be possible to identify package import names from this info alone
As an alternative to `tar.bz2`, some packages are in an uncompressed outer zip (renamed as `.conda`) with 2 internal `.zst` tarballs, one of which is `info...` (containing metadata) and one is `pkg...` (source)
- `zstandard` ("full") and `python-zstd` ("simple"), while `pyzstd` is listed as "bz2 API"
- `pyzstd` relies on `zstd` (the conda packaged tool, not the Python package)
- `requests.get(url, stream=True).raw.read()` (usually `.content` accesses a decoded output)

On second thoughts:
- `tarfile` library’s `getmembers` method (which can be simply replaced by an implicit iterator `for member in tar`)
- `zipfile` library’s `extract` method with a member name provided by the `namelist` method

On closer inspection, the `pyzstd.decompress` function does not delineate files(!) and although it’s not hard to figure out where the `paths.json` starts, it’d be cleaner to use it in the structured way `zipfile` and `tarfile` allow.
Using `pyzstd`'s `ZstdFile`:
```python
from pyzstd import ZstdFile
import requests
import zipfile
import tarfile
import io
import json

url = "https://repo.anaconda.com/pkgs/main/linux-64/requests-2.22.0-py37_1.conda"
b = requests.get(url, stream=True).raw.read()
z = zipfile.ZipFile(io.BytesIO(b))
info_zst = z.namelist()[1]  # the info tarball member of the outer zip
zz = z.read(info_zst)

class ZstdTarFile(tarfile.TarFile):
    def __init__(self, name, mode="r", *, level_or_option=None, zstd_dict=None, **kwargs):
        self.zstd_file = ZstdFile(name, mode, level_or_option=level_or_option, zstd_dict=zstd_dict)
        try:
            super().__init__(fileobj=self.zstd_file, mode=mode, **kwargs)
        except BaseException:
            self.zstd_file.close()
            raise

    def close(self):
        try:
            super().close()
        finally:
            self.zstd_file.close()

zstd_tar = ZstdTarFile(io.BytesIO(zz))
zstd_files = zstd_tar.getnames()
pj = "info/paths.json"
r = zstd_tar.extractfile(pj)
j = json.load(r)
site_pkgs = set()
for d in j["paths"]:
    dp = d["_path"]
    suffix = dp.partition("/site-packages/")[-1]
    if suffix:  # skip paths not under site-packages
        site_pkgs.add(suffix.split("/")[0])
for sp in site_pkgs:
    print(sp)
```
prints:

```
requests
requests-2.22.0.dist-info
```

- `scikit-image` (import name: `skimage`) has site-packages paths like `scikit_image-0.18.1-py3.9.egg-info` and `skimage`
- `numpy` has no paths! (see `numpy-base`)
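The special cases above suggest filtering out metadata directories when deriving import names from paths. A hedged sketch (the filtering rule here is my assumption, not conda's; the `demo` paths mimic `paths.json`-style `_path` entries):

```python
def import_names(paths):
    """Derive candidate import names from site-packages paths,
    dropping .dist-info/.egg-info metadata directories."""
    names = set()
    for p in paths:
        suffix = p.partition("/site-packages/")[-1]
        if not suffix:
            continue  # path was not under site-packages at all
        top = suffix.split("/")[0]
        if top.endswith((".dist-info", ".egg-info")):
            continue  # package metadata, not an importable name
        names.add(top.removesuffix(".py"))  # allow single-module packages
    return names

demo = [
    "lib/python3.7/site-packages/requests/__init__.py",
    "lib/python3.7/site-packages/requests-2.22.0.dist-info/METADATA",
    "bin/some-tool",
]
print(import_names(demo))  # {'requests'}
```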
Outside the Python ecosystem, zstd archives are usable via `sudo apt install zstd; tar -I zstd -xvf archive.zst`. Not always available, so follow these steps:
- (skip where `depends` shows the `python` version incompatible)
- take the `.conda` if available or else choose the `.bz2`
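The artifact choice in the last step can be sketched as follows (the filename lists here are hypothetical examples):

```python
def pick_artifact(filenames):
    """Prefer the .conda archive if available, else fall back to .tar.bz2."""
    conda = [f for f in filenames if f.endswith(".conda")]
    if conda:
        return conda[0]
    bz2 = [f for f in filenames if f.endswith(".tar.bz2")]
    return bz2[0] if bz2 else None

print(pick_artifact(["pkg-1.0-py37_0.tar.bz2", "pkg-1.0-py37_0.conda"]))  # pkg-1.0-py37_0.conda
print(pick_artifact(["z5py-2.0.5-py38_0.tar.bz2"]))  # z5py-2.0.5-py38_0.tar.bz2
```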
Result:

```sql
SELECT COUNT(*) FROM conda_packages;
```

⇣

```
2262
```

The total package count in the listings JSON is 20,094 (so 17,832 packages are not covered), meaning 11% of the packages on conda are in the database.
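A quick check of that arithmetic, using the figures above:

```python
total, in_db = 20094, 2262
print(total - in_db)               # 17832 packages not covered
print(round(100 * in_db / total))  # 11 (percent in the database)
```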
TODO: identify missing values within the packages, if any?
```sql
SELECT packagename FROM conda_packages WHERE packagename LIKE "z%" LIMIT 3;
```

⇣

```
zarr
zc.lockfile
zeromq
```
whereas `[x for x in j if x.startswith("z")]` for `j` loaded from the listings JSON gives:

```
z5py
zaber-motion
zaber-serial
zappy
```

so `z5py` is the first example of a package on conda which didn't make it into the package database.
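Rather than eyeballing prefixes, the first uncovered package can be found with a set difference; a sketch assuming `listing_names` comes from the listings JSON and `db_names` from the `conda_packages` table (abbreviated here to the names shown above):

```python
listing_names = ["zarr", "zc.lockfile", "zeromq", "z5py", "zaber-motion", "zaber-serial", "zappy"]
db_names = ["zarr", "zc.lockfile", "zeromq"]

# names present in the listings but absent from the database
missing = sorted(set(listing_names) - set(db_names))
print(missing[0])  # z5py
```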
There are no `.conda` archives for this package, only `.tar.bz2`
Recall: the only purpose here is to determine whether any of the package dependencies contain a package comprising the registered imports, and therefore whether any of the registered imports can be dropped because they are already covered by another package's dependencies
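That check boils down to a containment test; a minimal sketch, assuming hypothetical inputs where `provides` maps each dependency name to the set of import names its dependency tree covers:

```python
def droppable_imports(registered, provides, deps):
    """Return the registered imports already covered by one of `deps`,
    per the dependency -> covered-imports map `provides`."""
    covered = set()
    for dep in deps:
        covered |= provides.get(dep, set())
    return registered & covered

registered = {"numpy", "requests"}
provides = {"scipy": {"numpy"}}  # hypothetical: scipy's tree covers numpy
print(droppable_imports(registered, provides, deps=["scipy"]))  # {'numpy'}
```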
Need:

- `conda search --info --json -c anaconda -c conda-forge` groups the `index.json` contents of [multiple available versions for] all packages on these channels
- `about.json` file: the `root_pkgs` key stores a list of that package's dependencies
- `paths.json` file: the `paths` key stores a list of dicts which have a `_path` key, e.g. `lib/python*/site-packages/*.egg-info/**` (here it does not accurately give the package's imported name) or e.g. `bin/toolname` (not Python `site-packages` at all)

For conda I suspect this will suffice (no need to inspect the wheel itself)
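Pulling the dependency names out of `root_pkgs` is then a matter of splitting the spec strings; a sketch over an inline `about.json` (the entry values and their "name version build" shape are hypothetical examples):

```python
import json

# about.json content as it might appear inside the info tarball
about_json = json.dumps({
    "root_pkgs": ["python 3.7.4 h265db76_1", "certifi 2019.9.11 py37_0"],
})

about = json.loads(about_json)
# first whitespace-separated field of each spec is the package name
dep_names = [spec.split()[0] for spec in about["root_pkgs"]]
print(dep_names)  # ['python', 'certifi']
```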
May need to 'sniff' each PyPI package (or at least one?) as described here