sethmlarson opened this issue 1 year ago
Hey! I absolutely do; something like this is the next phase of the "pypi-data cinematic universe". I already have some of this raw data captured from PyPI, but it seems you have enriched it a bit.
Right now we have a few disconnected pieces that we can jam together to do cool things:
With this you can:

1. Download all `some-interesting-file-name.py` files, or others matching a specific pattern
2. Run some analysis on them, producing `{git_oid: stats}`
3. Convert `{git_oid: stats}` to `{(project_name, project_version): stats}` using the `git_oid` and the datasets in this repo
4. Turn `{(project_name, project_version): stats}` into anything, by joining on `(project_name, project_version)` with another dataset (like yours)

So with this we could parse all `.py` files, count the number of classes, and plot "classes written over time, segmented by PyPI trove classifier/other PyPI metadata/number of downloads/maintainer/whatever".
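A minimal Python sketch of steps 2 to 4, assuming that `git_oid` is Git's SHA-1 blob object id and that the per-release file index can be read as `(project_name, project_version, git_oid)` rows. The `file_index` and `metadata` inputs and all function names here are hypothetical stand-ins, not the real schema of the datasets in this repo or of `pypi-json-data`. It counts classes per file, rolls the counts up per release, then aggregates them per trove classifier.

```python
import ast
import hashlib
from collections import defaultdict


def count_classes(source: bytes) -> int:
    """Step 2 (sketch): count class definitions in one Python file."""
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        return 0  # skip files this interpreter cannot parse (e.g. Python 2 code)
    # ast.walk visits every node, so nested classes are counted too.
    return sum(isinstance(node, ast.ClassDef) for node in ast.walk(tree))


def git_blob_oid(content: bytes) -> str:
    """Git's SHA-1 object id for a blob: hash of b"blob <size>\\0" + content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def stats_by_release(stats_by_oid, file_index):
    """Step 3 (sketch): {git_oid: stats} -> {(project_name, project_version): stats}.

    `file_index` is a hypothetical iterable of (project_name, project_version, git_oid)
    rows; the same blob may appear in many releases.
    """
    totals = defaultdict(int)
    for project_name, project_version, git_oid in file_index:
        if git_oid in stats_by_oid:
            totals[(project_name, project_version)] += stats_by_oid[git_oid]
    return dict(totals)


def classes_by_classifier(release_stats, metadata):
    """Step 4 (sketch): join on (project_name, project_version) with another dataset.

    `metadata` is a hypothetical {(project_name, project_version): {"classifiers": [...]}}.
    """
    totals = defaultdict(int)
    for key, n_classes in release_stats.items():
        for classifier in metadata.get(key, {}).get("classifiers", []):
            totals[classifier] += n_classes
    return dict(totals)


if __name__ == "__main__":
    blob_a = b"class A:\n    pass\n\nclass B(A):\n    pass\n"
    blob_b = b"def f():\n    class Inner:\n        pass\n"
    stats_by_oid = {git_blob_oid(b): count_classes(b) for b in (blob_a, blob_b)}
    file_index = [
        ("example-pkg", "1.0", git_blob_oid(blob_a)),
        ("example-pkg", "1.1", git_blob_oid(blob_a)),
        ("example-pkg", "1.1", git_blob_oid(blob_b)),
    ]
    metadata = {
        ("example-pkg", "1.0"): {"classifiers": ["Programming Language :: Python :: 3"]},
        ("example-pkg", "1.1"): {
            "classifiers": ["Programming Language :: Python :: 3", "Typing :: Typed"]
        },
    }
    releases = stats_by_release(stats_by_oid, file_index)
    print(classes_by_classifier(releases, metadata))
    # {'Programming Language :: Python :: 3': 5, 'Typing :: Typed': 3}
```

Keying the intermediate results by `git_oid` means each unique file is parsed only once, even when the same blob ships in many releases; the fan-out to releases happens in the comparatively cheap join step.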
The problem is that this is all disconnected and a bit shit. I want this to be relatively seamless because I'm sick of doing it manually 😂.
I'm working on a CLI tool to handle steps 1, 2 and 3 for users, but step 4 is pretty interesting.
Perhaps we could take the `pypi-json-data` dataset, enrich it a bit and provide it in some format that can be used as part of this workflow?
That data could also be explorable via py-code.org; I've been thinking of adding some info from `pypi-json-data` to the site. I'm not sure what format it should be in, though.
Hello @orf, I absolutely love https://py-code.org! Thank you for creating this service.
I manually maintain my own dataset about Python packages available on PyPI (focused more on dependency metadata and PyPI-specific information like maintainers). Do you have any interest in supporting these use cases? I would happily stop maintaining my own dataset and point to py-code if this information were made available (your dataset is much more automated and has a nice frontend ✨).
Let me know what you think, and thanks again!