Many keys from pyproject.toml are not parsed

broeder-j commented 1 year ago

@proycon Thanks for your work!

Given this pyproject.toml:

[project]
name = "project"
description = "Description."
dynamic = ['version']
authors = [{name = "author1", email = "author1@e-mail.de"}, 
{name = "author2", email = "author2@e-mail.de"},
{name = "author3", email = "author3@e-mail"}]
readme = "README.md"
license = {file = "LICENSE.txt"}
classifiers = [
        "Development Status :: 4 - Beta",
        "Intended Audience :: Information Technology",
        "Intended Audience :: Science/Research",
        "License :: OSI Approved :: MIT License",
        "Natural Language :: English",
        "Operating System :: POSIX :: Linux",
        "Operating System :: MacOS :: MacOS X",
        "Programming Language :: Python",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
        "Programming Language :: Python :: 3.12",
        "Topic :: Database :: Front-Ends",
        "Topic :: Education",
        "Topic :: Scientific/Engineering",
        "Topic :: Scientific/Engineering :: Information Analysis",
        "Topic :: Scientific/Engineering :: Visualization",

]
keywords = ["dashboard", "data visualization", "survey data", "categorical data", 
"interactive visualization", "bokeh", "panel"]
....
#requirements in poetry

This results in:

{
    "@context": [
        "https://raw.githubusercontent.com/codemeta/codemeta/2.0/codemeta.jsonld",
        "https://w3id.org/software-iodata",
        "https://raw.githubusercontent.com/schemaorg/schemaorg/main/data/releases/13.0/schemaorgcontext.jsonld",
        "https://w3id.org/software-types"
    ],
    "@id": "/project",
    "@type": "SoftwareSourceCode",
    "author": {
        "@id": "/person/author1",
        "@type": "Person",
        "email": "author1@e-mail.de",
        "familyName": "",
        "givenName": "author1"
    },
    "description": "Description",
    "identifier": "project",
    "license": "http://spdx.org/licenses/MIT",
    "name": "project",
    "runtimePlatform": [
        "Python 3",
        "Python 3.10",
        "Python 3.8",
        "Python 3.9"
    ],
    "softwareRequirements": [
        {
            "@id": "/dependency/bokeh-ge-2.4.3,-lt-3.0.0",
            "@type": "SoftwareApplication",
            "identifier": "bokeh",
            "name": "bokeh",
            "runtimePlatform": "Python 3",
            "version": ">=2.4.3,<3.0.0"
        },
        {
            "@id": "/dependency/pandas-ge-1.4.1,-lt-2.0.0",
            "@type": "SoftwareApplication",
            "identifier": "pandas",
            "name": "pandas",
            "runtimePlatform": "Python 3",
            "version": ">=1.4.1,<2.0.0"
        },
        {
            "@id": "/dependency/panel-ge-0.13.1,-lt-0.14.0",
            "@type": "SoftwareApplication",
            "identifier": "panel",
            "name": "panel",
            "runtimePlatform": "Python 3",
            "version": ">=0.13.1,<0.14.0"
        },
        {
            "@id": "/dependency/wordcloud-ge-1.8.2.2,-lt-2.0.0.0",
            "@type": "SoftwareApplication",
            "identifier": "wordcloud",
            "name": "wordcloud",
            "runtimePlatform": "Python 3",
            "version": ">=1.8.2.2,<2.0.0.0"
        }
    ],
    "version": "1.0.0"
}

So it missed the following keys:

most of the classifiers
the other authors and the names
the readme
the keywords (maybe it depends on the importlib_metadata version?)

I was not expecting it to pick up the requirements, but it did so nicely. It did miss all the optional requirements, though.

Do you plan to integrate features of codemetar, such as parsing the README or CITATION.cff files? The best data for authors will be in a CITATION.cff.

Is there a merge strategy? That is, could one generate the codemeta.json from several sources (local repo code, the GitHub API, different builds, etc.) and merge them whenever one route yields more metadata? From the docs, this seems to work when all sources are provided at once, but can it also merge with an existing file? My use case is a file that has been manually adapted because some things cannot be extracted automatically, while everything else should be updated automatically on pre-commit.

proycon commented 1 year ago

Thanks for the feedback! Let me first answer your last question before I go onto the actual issue in a later comment:

> Do you plan to integrate features of codemetar, such as parsing the README or CITATION.cff files? The best data for authors will be in a CITATION.cff.

Yes, in fact I have already implemented all of that in https://github.com/proycon/codemeta-harvester . It is basically a shell script that invokes codemetapy to do the bulk of the work. CITATION.cff is supported through cffconvert. This is done using a merge strategy, precisely as you suggest below:

> Is there a merge strategy? That is, could one generate the codemeta.json from several sources (local repo code, the GitHub API, different builds, etc.) and merge them whenever one route yields more metadata? From the docs, this seems to work when all sources are provided at once, but can it also merge with an existing file? My use case is a file that has been manually adapted because some things cannot be extracted automatically, while everything else should be updated automatically on pre-commit.

Yes, codemetapy supports multiple sources and can merge things together. codemeta-harvester is a more high-level tool that detects various possible sources for a single project and invokes codemetapy in the right way to consolidate all this metadata into one. On top of that, it can also harvest multiple projects and relate SoftwareSourceCode to particular instances of that software (including software as a service), which is an extension on top of codemeta. This is, for example, what powers https://tools.dev.clariah.nl
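The idea of merging several sources while keeping manual edits can be sketched as follows. This is an illustrative policy only, not codemetapy's or codemeta-harvester's actual merge algorithm: the merge_metadata function is hypothetical, earlier sources take priority for single values, and list values are combined without duplicates.

```python
from typing import Any


def merge_metadata(*sources: dict[str, Any]) -> dict[str, Any]:
    """Merge metadata dicts; earlier sources win for single values.

    Hypothetical policy for illustration: if either side is a list,
    the values are combined into one list without duplicates.
    """
    merged: dict[str, Any] = {}
    for source in sources:
        for key, value in source.items():
            if key not in merged:
                merged[key] = value
            elif isinstance(merged[key], list) or isinstance(value, list):
                existing = merged[key] if isinstance(merged[key], list) else [merged[key]]
                incoming = value if isinstance(value, list) else [value]
                merged[key] = existing + [v for v in incoming if v not in existing]
            # Otherwise keep the value from the earlier (higher priority) source.
    return merged


if __name__ == "__main__":
    # A manually curated file goes first so that its values are preserved.
    manual = {"name": "project", "author": ["author1", "author2"]}
    harvested = {"name": "Project (harvested)", "author": ["author1"], "keywords": ["bokeh"]}
    print(merge_metadata(manual, harvested))
```

Passing the manually adapted file as the first source models the pre-commit use case: automatically harvested data only fills gaps or extends lists.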

proycon commented 1 year ago

Do you have a more complete pyproject.toml I could try to reproduce this with? (pointing to one in a real project is even better). There's also a toml syntax error in the one you included here.

For pyproject.toml parsing I tried not to reinvent the wheel in codemetapy: I simply rely on https://github.com/pypa/pep517/ to do the actual parsing and to deliver the data in the same form as importlib.metadata does.
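For context, data in the importlib.metadata form uses the email-header layout of the core metadata, where several pyproject.toml authors with both a name and an email are normally folded into one comma-separated Author-email header, and keywords into one Keywords string. The sketch below parses a hand-written sample of that layout with the standard library. The read_core_metadata helper is hypothetical and does not reflect how codemetapy processes these fields.

```python
from email import message_from_string
from email.utils import getaddresses

# Hand-written sample in the core metadata (METADATA / PKG-INFO) layout.
# For an installed package, importlib.metadata.metadata("<name>") returns
# a comparable message-like object.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: project
Version: 1.0.0
Author-email: author1 <author1@e-mail.de>, author2 <author2@e-mail.de>
Keywords: dashboard,bokeh,panel
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
"""


def read_core_metadata(text: str) -> dict:
    """Extract authors, keywords and classifiers (hypothetical helper)."""
    msg = message_from_string(text)
    # One header can carry several "name <email>" pairs; split them all.
    authors = getaddresses(msg.get_all("Author-email", []))
    keywords = [k.strip() for k in (msg.get("Keywords") or "").split(",") if k.strip()]
    return {
        "authors": authors,
        "keywords": keywords,
        # Repeated headers need get_all(); get() returns only the first one.
        "classifiers": msg.get_all("Classifier", []),
    }


if __name__ == "__main__":
    for key, value in read_core_metadata(SAMPLE_METADATA).items():
        print(f"{key}: {value}")
```

A reader that takes only the first address or the first Classifier header would drop the remaining entries, which is one possible explanation for the symptoms reported above, though not a confirmed one.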

broeder-j commented 1 year ago

I corrected the pyproject.toml above to be valid (validated with https://pypi.org/project/validate-pyproject/); one classifier was wrong and the version was missing (the rest is tool-specific). This particular project is not public yet, but I hope it will be soon.

Currently I am playing around with codemeta and trying to understand it.

broeder-j commented 1 year ago

> (including software as a service), which is an extension on top of codemeta. This is for example what powers https://tools.dev.clariah.nl

Nice! This is one of the things we want to get to. Do you know if there is any work going on in the codemeta community to get something like default GitHub/GitLab actions or bots to produce a nice codemeta.json? That might be worth exploring for us if we want to get a codemeta.json into every repo within a large organization.

proycon commented 1 year ago

> Do you know if there is any work going on in the codemeta community to get something like default GitHub/GitLab actions or bots to produce a nice codemeta.json? That might be worth exploring for us if we want to get a codemeta.json into every repo within a large organization.

I had been thinking about that as a feasible option as well, although I haven't planned anything specific yet. With codemeta-harvester it should be relatively easy; the only thing that needs to be written is the GitHub/GitLab integration (which I have to admit I don't have experience with yet). I'm not aware of anybody else already doing this.

ketozhang commented 1 year ago

@proycon The entire project metadata specification for pyproject.toml can be found at https://packaging.python.org/en/latest/specifications/declaring-project-metadata/

It's a short read, and you can build a more complete example from it. Be careful with pyproject.toml files that extend the standard fields (e.g., poetry's [tool.poetry.dependencies] is not standard).

proycon / codemetapy

Many keys from pyproject.toml are not parsed #28