pypa / interoperability-peps

Development repo for evolution of PyPA interoperability standards (released versions are published as PEPs on python.org)
Creative Commons Zero v1.0 Universal

PEP 426: Define a JSON-LD context as part of the proposal #31

Open ncoghlan opened 9 years ago

ncoghlan commented 9 years ago

I finally found time to investigate JSON-LD as Wes Turner has regularly suggested. It does look like a good fit for what I want to achieve with the metadata 2.0 spec: http://www.w3.org/TR/json-ld/#basic-concepts

Also useful to me was this blog post from the JSON-LD lead editor: http://manu.sporny.org/2014/json-ld-origins-2/

I've long ignored the semantic web people because they tend to design and create overengineered solutions that are completely impractical for real world use. Sporny's post persuaded me that JSON-LD wasn't like that, and hence worth investigating further.

westurner commented 9 years ago

So, this is somewhat of a frequent documentation need, and an opportunity for linked requirements traceability (#LinkedData (EDIT: #LinkedReproducibility #PEP426JSONLD)):

| Homepage: ...
| Src: git https://bitbucket.org/./.
| Download: .../download/
| Issues: bitbucket.org/././issues
| Docs: `<https://containsparens_(disambiguation)>`__
[... add'l ad-hoc attributes]

Before writing this as (most minimal, ordered) inline blocks, I wrote 'bobcat' (which requires FuXi for OWL schema reasoning) and one day drafted some thoughts for a 'sphinxcontrib-rdf' extension to add roles and directives.

More practically, how do I simulate pip install without running any setup.py files (traverse and solve from the given Requirements rules)?

And then positive externalities of exposing JSON[-LD] that is schema.org compatible:

A broader discussion, relevant to tools in really any language, is happening around RDFJS: https://text.allmende.io/p/rdfjs (see ### Classes)

ncoghlan commented 9 years ago

Also of potential interest would be linking this in to the ISO/IEC Software Identification effort: http://tagvault.org/about/

westurner commented 9 years ago

Also of potential interest would be linking this in to the ISO/IEC Software Identification effort: http://tagvault.org/about/

Do they have URNs that could be the object of a (pypi:projectname, ex:, urn:x-tagvault:xyz) triple?
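As a rough illustration of the triple in question, here is a minimal sketch in plain Python. The prefix URLs and the predicate name `ex:swidTag` are placeholders invented for this example; the thread leaves the predicate unnamed, and the `urn:x-tagvault:` scheme is a guess rather than anything TagVault is known to publish.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical prefix map; neither URL is an agreed namespace.
PREFIXES = {
    "pypi": "https://pypi.python.org/pypi/",
    "ex": "http://example.org/ns#",
}


def expand(curie: str) -> str:
    """Expand a prefix:suffix name via PREFIXES; return anything else unchanged."""
    prefix, sep, suffix = curie.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + suffix
    return curie


# "ex:swidTag" is an invented predicate; the URN is passed through as-is.
triple = Triple(
    expand("pypi:pip"),
    expand("ex:swidTag"),
    expand("urn:x-tagvault:xyz"),
)
```

The URN is left untouched because `urn` is not a registered prefix in the map, which matches the idea that it is already a full URI.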

westurner commented 9 years ago

The install_requires and extras_require edges need to be in the JSON[-LD]

https://github.com/ipython/ipython/blob/master/setup.py#L182

  • Note that here these variables are conditional based upon e.g. platformstr parameters.
    • Is it possible to serialize these edges to JSON at next build / release time?

The total graph of install_requires and extras_require is the sum of each of the built eggs' JSON[-LD] representations of runtime setup.py state.
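One way to read "the sum of each of the built eggs' representations" is as a set union over per-build metadata. The sketch below assumes each build has already serialized its `install_requires` and `extras_require` into a plain dict; the function name and the sample package lists are illustrative only and are not taken from any real setup.py.

```python
from collections import defaultdict


def merge_requirement_graphs(per_build_metadata):
    """Union the dependency edges recorded by each build of one release."""
    merged = {"install_requires": set(), "extras_require": defaultdict(set)}
    for metadata in per_build_metadata:
        merged["install_requires"].update(metadata.get("install_requires", []))
        for extra, requirements in metadata.get("extras_require", {}).items():
            merged["extras_require"][extra].update(requirements)
    return merged


# Illustrative per-platform metadata (hypothetical values).
linux_build = {
    "install_requires": ["decorator", "pexpect"],
    "extras_require": {"test": ["nose"]},
}
windows_build = {
    "install_requires": ["decorator", "pyreadline"],
    "extras_require": {"test": ["nose", "mock"]},
}

total = merge_requirement_graphs([linux_build, windows_build])
```

A union loses the information about which platform contributed which edge; keeping that would need the condition stored alongside each requirement.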

westurner commented 9 years ago

[EDIT] ~fulltext cc here, emphasis added, markdown [EDIT] warehouse pkg detail template is now at https://github.com/pypa/warehouse/blob/master/warehouse/templates/packaging/detail.html

westurner commented 9 years ago

Also of potential interest would be linking this in to the ISO/IEC Software Identification effort: http://tagvault.org/about/

Do they have URNs that could be the object of a (pypi:projectname, ex:, urn:x-tagvault:xyz) triple?

Here is the XSD schema for "[ISO/IEC 19770-2:2009 Software Identification Tag Standard]" from http://tagvault.org/standards/swid_tagstandard/:

AFAIU, there is not yet support for ISO/IEC 19770-2:2009 "Software Identification (SWID) Tag Standard" tags in schema.org (e.g. schema.org/SoftwareApplication).

westurner commented 9 years ago

  • B. create an extension vocabulary (in RDFa), generate the TTL and JSON-LD context, and host those:

Possible prefix URIs (these don't have to resolve as dereferenceable URLs (they are URIs), but it's helpful if there is an HTML(+RDFa) representation there, for reference, which links to the source vocabs):

Docs on creating schema.org extension vocabulary for [Python] packages:

[EDIT] Links [EDIT] schema.org 2.1 -> 2.2 links

westurner commented 9 years ago

PEP426JSONLD

sigmavirus24 commented 9 years ago

@westurner #WhatDoHashTagsMean?

westurner commented 9 years ago

Compare:

westurner commented 8 years ago

Should there be / would it be useful to have:

[
{'distro':'...'},
{'distro': 'Ubuntu',
 'pkgname': 'python-pip',
 'url': 'http://packages.ubuntu.com/trusty/python-pip',
 # ... may also be present in e.g. downstream DOAP RDF records
 'maintainers': [{
   'name': 'Ubuntu MOTU Developers',
   'url': 'http://lists.ubuntu.com/archives/ubuntu-motu/',
   'emailAddress': 'ubuntu-motu@lists.ubuntu.com',
  }]
},]

westurner commented 8 years ago

... So, in Linked Data terminology, the package URN URI (urn:x-pythonpkg:pip) is resolved to a dereferenceable URL at install time, given the distutils/setuptools/pip configuration (~index_servers and find-links *).
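A minimal sketch of that resolution step, assuming the `urn:x-pythonpkg:` scheme proposed above and a list of "simple"-style index URLs supplied by the caller. The function name is hypothetical; the name normalization follows the rule from PEP 503.

```python
import re

URN_PREFIX = "urn:x-pythonpkg:"  # proposed in this thread, not a registered scheme


def candidate_urls(urn, index_urls):
    """Map a package URN to one candidate project URL per configured index."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError(f"not a package URN: {urn!r}")
    name = urn[len(URN_PREFIX):]
    # PEP 503 normalization: runs of '-', '_' and '.' collapse to a single '-'.
    normalized = re.sub(r"[-_.]+", "-", name).lower()
    return [f"{index.rstrip('/')}/{normalized}/" for index in index_urls]


urls = candidate_urls("urn:x-pythonpkg:pip", ["https://pypi.python.org/simple/"])
```

The same URN yields different URLs under different index configurations, which is why the URN, not the URL, is the stable name.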

ncoghlan commented 8 years ago

For the distro metadata question, that's the main reason the draft metadata 2.0 proposal moves project details out to a metadata extension: https://www.python.org/dev/peps/pep-0459/#the-python-project-extension

Having the project metadata in an extension means it is then trivial to re-use the same format for redistributor metadata: https://www.python.org/dev/peps/pep-0459/#the-python-integrator-extension

For the pkgname to URI question: what practical problem will that solve for Python developers? What will they be able to do if metadata 2.0 defines that mapping that they won't be able to do if we don't define it?

westurner commented 8 years ago

For the distro metadata question, that's the main reason the draft metadata 2.0 proposal moves project container details out to a metadata extension: https://www.python.org/dev/peps/pep-0459/#the-python-project-extension

Got it, thanks; I hadn't been aware of this draft spec.

Having the project metadata in an extension means it is then trivial to re-use the same format for redistributor metadata: https://www.python.org/dev/peps/pep-0459/#the-python-integrator-extension

For the pkgname to URI question: what practical problem will that solve for Python developers? What will they be able to do if metadata 2.0 defines that mapping that they won't be able to do if we don't define it?

Linked Data names things with namespaced URIs for many of the same reasons that Python uses namespaces.

pip
pip==7.1.2
https://pypi.python.org/packages/source/p/pip/pip-7.1.2.tar.gz#md5=3823d2343d9f3aaab21cf9c917710196
https://pypi.python.org/packages/py2.py3/p/pip/pip-7.1.2-py2.py3-none-any.whl#md5=5ff9fec0be479e4e36df467556deed4d

-e git+https://github.com/pypa/pip#egg=pip
-e git+ssh://git@github.com/pypa/pip#egg=pip
-e git+ssh://git@github.com/pypa/pip@7.1.2#egg=pip

Practical utility of this:

westurner commented 8 years ago

If, in the future, I want to store checksums for each and every file in a package (so that they can be later reviewed), what do I key that auxiliary document to? Should I be able to just ingest 1+ JSON-LD documents into an [in-memory, ..., RDF] graph datastore?
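To make the keying question concrete, here is a minimal sketch that builds such an auxiliary document for an unpacked package directory and keys it to a package URI via `@id`. The document shape, the `files` key, and the function names are assumptions made for illustration, not a proposed schema.

```python
import hashlib
from pathlib import Path


def file_checksums(root):
    """Return {relative posix path: sha256 hex digest} for every file under root."""
    root = Path(root)
    checksums = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            checksums[path.relative_to(root).as_posix()] = digest
    return checksums


def checksum_document(package_uri, root):
    """Key the per-file checksums to a package URI (hypothetical document shape)."""
    return {"@id": package_uri, "files": file_checksums(root)}
```

Calling `checksum_document("urn:x-pythonpkg:pip", "/path/to/unpacked/pip")` would produce one record per release directory; whether the key should also carry the version is exactly the open question above.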

This is a graph of packages which happened to have fit a given set of constraints on a given date and time, with a given index_servers, pip configuration... At present, pip.log and pip freeze are not sufficient to recreate / reproduce / CRC a given environment.

What I would like is:

IIUC, currently, the suggested solution is "just rebuild [in a venv [in a Docker container named 'distro']] and re-run the comprehensive test suite".

ncoghlan commented 8 years ago

The currently suggested solution for cryptographic assurance of repeated installations is to use peep to capture the hash of the Python components in the requirements.txt file: https://pypi.python.org/pypi/peep

If you want full traceability, then Nix is a better fit than any other current packaging system: http://nixos.org/nix/about.html

Offering these kinds of capabilities by default isn't a current design goal for the upstream Python ecosystem, since they can already be added by the folks that need them, and providing them by default doesn't help lower barriers to entry for new users.

dstufft commented 8 years ago

FWIW pip 8.0 will include peep’s functionality built into pip (though it is opt in by adding hashes to your requirements file).
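For reference, a hash-pinned requirements line has the form `name==version --hash=sha256:<digest>`. The helper below is a small sketch that formats such a line from the bytes of a downloaded archive; the function name is made up, and in practice the digest would come from the actual sdist or wheel file.

```python
import hashlib


def requirement_line(name, version, archive_bytes):
    """Format a pinned requirement with a sha256 hash of the given archive."""
    digest = hashlib.sha256(archive_bytes).hexdigest()
    return f"{name}=={version} --hash=sha256:{digest}"


# Placeholder bytes stand in for the contents of a real archive.
line = requirement_line("pip", "7.1.2", b"placeholder archive contents")
```

Multiple `--hash` options can appear on one requirement so that both an sdist and a wheel are accepted.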

westurner commented 8 years ago

FWIW pip 8.0 will include peep’s functionality built into pip (though it is opt in by adding hashes to your requirements file).

Is/should this also be defined in "PEP 0508 -- Dependency specification for Python Software Packages" https://www.python.org/dev/peps/pep-0508/ ? ... :+1:

westurner commented 8 years ago

Pip docs of interest (in specifying Python package dependencies):

westurner commented 8 years ago

Do I already have the metadata for this [installed] package in my journaled, append-only, JSON-LD log of (system/VIRTUAL_ENV) pip operations?

{
    "@graph": {
        "actions": [
            {
                "@type": "InstallAction",
                "command": "pip install -U pip",
                "description": "log message",
                "packages": [
                    {
                        "name": "pip",
                        "version": "7.1.2",
                        "versionwas": "7.1.0",
                        "versionspec_constraint": ">=7.0.0",
                        # ... pypi/pip/json metadata ...
                    }
                ]
            }
        ]
    }
}

Then indexing on actions[*]["packages"][*][("name", "version" [, PEP0508])] would get the current snapshot off the top of the journaled history of the env (according to [pip, ]).
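A minimal sketch of that indexing, assuming the journal shape shown above and replaying actions in order so that later entries win. The `UninstallAction` type is an assumption added for illustration; only `InstallAction` appears in the example journal.

```python
def current_snapshot(journal):
    """Replay journaled actions in order and return {name: version} for the env."""
    snapshot = {}
    for action in journal["@graph"]["actions"]:
        for package in action.get("packages", []):
            if action.get("@type") == "UninstallAction":  # hypothetical action type
                snapshot.pop(package["name"], None)
            else:
                snapshot[package["name"]] = package["version"]
    return snapshot


journal = {
    "@graph": {
        "actions": [
            {"@type": "InstallAction",
             "packages": [{"name": "pip", "version": "7.1.0"}]},
            {"@type": "InstallAction",
             "packages": [{"name": "pip", "version": "7.1.2", "versionwas": "7.1.0"}]},
        ]
    }
}

snapshot = current_snapshot(journal)
```

Because the journal is append-only, truncating the action list at any point reproduces the snapshot as of that point.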

westurner commented 8 years ago

A JSON-LD journal of package Actions [and inlined-metadata.json] would be an improvement over (PEP376 .dist-info directories) and (pip-log.txt, pip.log) because:

https://github.com/pypa/interoperability-peps/blob/master/pep-0376-installation-db.rst https://www.python.org/dev/peps/pep-0376/

pip log

westurner commented 8 years ago

A JSONLD context for the current JSON would need an "index map" to skip over the version keys;

but in JSONLD 2.0, we would need the ability to not skip but apply the key to each nested record.
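To illustrate both halves of that point: the context below uses `"@container": "@index"`, the JSON-LD index-map feature, so that the version keys under `releases` are treated as index labels rather than properties. The helper then shows, in plain Python, what "apply the key to each nested record" would mean. The `http://example.org/pypi#` vocabulary and the function name are placeholders.

```python
# Hypothetical context: the vocabulary URL is a placeholder, not a published one.
CONTEXT = {
    "@vocab": "http://example.org/pypi#",
    "releases": {
        "@id": "http://example.org/pypi#releases",
        "@container": "@index",
    },
}


def inline_version_keys(releases):
    """Copy each version key of a {version: [records]} map into its records."""
    return [
        dict(record, version=version)
        # Sorted as plain strings for determinism; not a real version ordering.
        for version, records in sorted(releases.items())
        for record in records
    ]


releases = {
    "7.1.2": [{"filename": "pip-7.1.2.tar.gz"}],
    "7.1.0": [{"filename": "pip-7.1.0.tar.gz"}],
}
records = inline_version_keys(releases)
```

With the index map alone, the version string is dropped from the resulting graph; the preprocessing step keeps it as data on each record.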

... https://github.com/json-ld/tests

westurner commented 8 years ago

This discussion indicates that there may be a need to add reified edges for packages which, according to maintainers and/or index maintainers, supersede existing packages (e.g. PIL -> pillow)
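A reified edge here just means the "supersedes" relation is itself a record that can carry attributes, such as who asserted it. The sketch below is one possible shape; the class, field names, and the `asserted_by` value are all hypothetical.

```python
from typing import NamedTuple


class SupersedesEdge(NamedTuple):
    old: str
    new: str
    asserted_by: str  # e.g. a maintainer or an index maintainer


def latest_name(name, edges):
    """Follow supersedes edges from name, stopping if a cycle is detected."""
    replacements = {edge.old: edge.new for edge in edges}
    seen = set()
    while name in replacements and name not in seen:
        seen.add(name)
        name = replacements[name]
    return name


edges = [SupersedesEdge("PIL", "pillow", "index maintainer")]
resolved = latest_name("PIL", edges)
```

Keeping the assertion's source on the edge lets a consumer decide whose claims of supersession to trust.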