protocol-registries / link-relations

Registry for Link Relation Types
https://www.iana.org/assignments/link-relations/

Offer JSON-LD as an export option #25

Closed sbp closed 1 year ago

sbp commented 3 years ago

If you wanted to offer JSON-LD as an export option, alongside the existing CSV, here is a script that converts the CSV to JSON-LD. Cf. https://github.com/mnot/I-D/issues/140

#!/usr/bin/env python3
"""
Usage: ./rels.py link-relations-1.csv URL
URL should be e.g. 'http://www.example.org/assignments/relation/#'
"""

from typing import Any, Dict, Iterator, List

def fold(lines: Iterator[str]) -> Iterator[str]:
    next(lines)
    line: str
    buffer: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(" "):
            buffer.append(line.lstrip(" "))
            continue
        if buffer:
            yield " ".join(buffer)
            buffer[:] = []
        buffer.append(line)
    if buffer:
        yield " ".join(buffer)

def fields(lines: Iterator[str]) -> Iterator[List[str]]:
    from re import Match, Pattern, compile

    quoted: Pattern[str] = compile(r'"((?:[^"]|"")*)"')
    def replace(m: Match[str]) -> str:
        value: str = m.group(1)
        value = value.replace('""', '"')
        value = value.replace(",", "\ud800")
        return value
    for line in lines:
        line = quoted.sub(replace, line)
        v: str
        yield [v.replace("\ud800", ",") for v in line.split(",")]

def main() -> None:
    from json import dumps
    from sys import argv

    filename: str = argv[1]
    data: Dict[str, List[Any]] = {
        "@context": [
            {"citation": "http://purl.org/dc/terms/bibliographicCitation"},
            {"description": "http://purl.org/dc/terms/description"},
            {"label": "http://www.w3.org/2000/01/rdf-schema#label"},
            {"note":"http://www.w3.org/2004/02/skos/core#note"}
        ],
        "@graph": []
    }
    with open(filename, encoding="utf-8", errors="replace") as f:
        for row in fields(fold(f)):
            if len(row) != 4:
                continue
            datum: Dict[str, str] = {
                "@id": "http://www.iana.org/assignments/relation/#" + row[0],
                "label": row[0],
            }
            if row[1]:
                datum["description"] = row[1]
            if row[2]:
                datum["citation"] = row[2]
            if row[3]:
                datum["note"] = row[3]
            data["@graph"].append(datum)
    print(dumps(data, indent="  "))

if __name__ == "__main__":
    main()

sbp commented 3 years ago

--- a/rels.py 2021-07-22 18:43:59.000000000 +0100
+++ b/rels.py 2021-07-22 18:52:06.000000000 +0100
@@ -57,7 +57,7 @@
             if len(row) != 4:
                 continue
             datum: Dict[str, str] = {
-                "@id": "http://www.iana.org/assignments/relation/#" + row[0],
+                "@id": argv[2] + row[0],
                 "label": row[0],
             }
             if row[1]:

*cough*

sbp commented 3 years ago

Even easier when you use the standard CSV parser correctly:

#!/usr/bin/env python3
"""
Usage: ./rels.py link-relations-1.csv URL
URL should be e.g. 'http://www.example.org/assignments/relation/#'
"""

from typing import Any, Dict, List

def main() -> None:
    from csv import reader
    from json import dumps
    from sys import argv

    filename: str = argv[1]
    data: Dict[str, List[Any]] = {
        "@context": [
            {"citation": "http://purl.org/dc/terms/bibliographicCitation"},
            {"description": "http://purl.org/dc/terms/description"},
            {"label": "http://www.w3.org/2000/01/rdf-schema#label"},
            {"note": "http://www.w3.org/2004/02/skos/core#note"},
        ],
        "@graph": [],
    }
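    with open(filename, newline="", encoding="utf-8", errors="replace") as f:
        rows = reader(f)
        # Skip the header row; the first version did this with next(lines) in fold().
        next(rows, None)
        for row in rows: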
            if len(row) != 4:
                continue
            datum: Dict[str, str] = {
                "@id": argv[2] + row[0],
                "label": row[0],
            }
            if row[1]:
                datum["description"] = row[1]
            if row[2]:
                datum["citation"] = row[2]
            if row[3]:
                datum["note"] = row[3]
            data["@graph"].append(datum)
    print(dumps(data, indent="  "))

if __name__ == "__main__":
    main()
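
For anyone reviewing the conversion, here is a minimal sketch of the same logic as a function that can be run against an in-memory CSV. The helper name `rows_to_jsonld`, the sample row, and the assumption that the first CSV row is a column header are illustrative only and not part of the script above.

```python
import csv
import io
from typing import Any, Dict, Iterable, List

# Same term-to-IRI mappings as the @context in the script above.
CONTEXT: List[Dict[str, str]] = [
    {"citation": "http://purl.org/dc/terms/bibliographicCitation"},
    {"description": "http://purl.org/dc/terms/description"},
    {"label": "http://www.w3.org/2000/01/rdf-schema#label"},
    {"note": "http://www.w3.org/2004/02/skos/core#note"},
]


def rows_to_jsonld(lines: Iterable[str], base: str) -> Dict[str, Any]:
    """Hypothetical helper: build the JSON-LD document from CSV lines."""
    rows = csv.reader(lines)
    next(rows, None)  # assumption: the first row is a column header
    graph: List[Dict[str, str]] = []
    for row in rows:
        if len(row) != 4:
            continue  # ignore rows that are not name/description/reference/notes
        datum = {"@id": base + row[0], "label": row[0]}
        for key, value in zip(("description", "citation", "note"), row[1:]):
            if value:  # empty CSV fields are omitted, as in the script
                datum[key] = value
        graph.append(datum)
    return {"@context": CONTEXT, "@graph": graph}


# Invented sample data, not taken from the real registry.
SAMPLE = (
    "Relation Name,Description,Reference,Notes\n"
    'example-rel,"An invented relation, for illustration",[EXAMPLE],\n'
)
document = rows_to_jsonld(
    io.StringIO(SAMPLE, newline=""),
    "http://www.example.org/assignments/relation/#",
)
```

Here `document["@graph"]` holds a single entry whose `@id` ends in `#example-rel`; the comma inside the quoted description is preserved and the empty notes field is dropped.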

mnot commented 3 years ago

Not sure what to do with this -- if we want the RFC Editor to produce another format, that's a somewhat involved thing on their side; doable, but given that the format isn't blessed by an RFC, it might not be easy.

sbp commented 3 years ago

Oh, is that a requirement or just something they prefer? The link relations page itself uses XHTML 1.0, which is a W3C REC (since 2000) just like JSON-LD (1.0 since 2013, 1.1 since 2020). And the media type for JSON-LD has been registered at the IANA since 2013. Also JSON itself is RFC 8259 of course. I understand the desire to keep things in house, but avoiding W3C recommendations is pretty hardcore.

But I just wanted to show that your fear expressed in 2016 that there were "some fairly exacting requirements on how that namespace is run" is actually not too bad, on the technical side of things. You just run the script with the pre-existing CSV as input, and serve the output JSON-LD file as application/ld+json. Red tape is another matter... if you can help to cut that for the community, that'd be cool.
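
To make the serving step concrete, here is a minimal sketch of a WSGI application that returns a converted document with the `application/ld+json` media type. The names `make_app` and `serve` and the port number are hypothetical; a real deployment would more likely set the media type in the existing web server configuration.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Tuple
from wsgiref.simple_server import make_server

StartResponse = Callable[[str, List[Tuple[str, str]]], Any]
WsgiApp = Callable[[Dict[str, Any], StartResponse], Iterable[bytes]]


def make_app(document: Dict[str, Any]) -> WsgiApp:
    """Hypothetical helper: WSGI app that always serves one JSON-LD document."""
    body = json.dumps(document, indent="  ").encode("utf-8")

    def app(environ: Dict[str, Any], start_response: StartResponse) -> Iterable[bytes]:
        headers = [
            ("Content-Type", "application/ld+json"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]

    return app


def serve(document: Dict[str, Any], port: int = 8000) -> None:
    """Serve the document locally until interrupted (the port is arbitrary)."""
    with make_server("", port, make_app(document)) as httpd:
        httpd.serve_forever()
```

Calling `serve(data)` with the dictionary built by the script would expose it on a local port for testing.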

One way of looking at it might be to compare what Wikidata are doing. They curate an astonishing amount of data, with lots of different representations. Perhaps the IANA could learn some things from what Wikidata are doing, even if they have different resources and organisational structures behind them?

Consider the debate you had with Tantek in 2014, where he said:

HTML5 deliberately abandoned the IANA / RFC5988 ceremony bureaucracy for link registrations because it proved to be unnecessarily inefficient

I don't know the particulars of that case and whether or not a more streamlined application process would be useful; I wonder if it was part of the decision process in opening up this GitHub issues interface? But in any case, if the IANA can't keep up with modern practices, that might create unnecessary friction in places. I don't think it matters particularly for JSON-LD, but maybe this is representative of wider issues in the ever evolving landscape?

I won't be too disappointed if the script that I wrote sits here unused.

mnot commented 3 years ago

I don't want to put words in their mouth, but AIUI IANA have a very large number of registries to maintain, so creating special processes for a given registry creates extra burden (ideally, registries should all work in the same fashion). It's possible for them to do that (and indeed in my experience they're very willing to help), but if we ask them to do something special, it should be based upon a well-formed request that's had some amount of community review.

So let me chat with them and see what their take is. If they are willing to do this without an RFC defining exactly what they're to do (and they very well may be), we should have at least some level of review of what you're proposing to make sure it's correct.

sbp commented 3 years ago

Thanks. Meanwhile I'm continuing to improve the script as a gist:

https://gist.github.com/sbp/57d4785ed46ea7039a5606fd9752b10f

mnot commented 3 years ago

I've had an initial discussion with IANA; they're interested, but within the framework of offering all registry data in a JSON format. This might take some time. If folks who are familiar with JSON-LD are willing to help them, that would be good to know.

sbp commented 3 years ago

JSON-LD is quite simple, but add me to the list of people willing to help.

stain commented 2 years ago

Bumping this and volunteering - few cycles available, but very interested in getting this done.

BTW, you use the prefix that would make http://www.iana.org/assignments/relation/#terms-of-service for link relation terms-of-service - while I think http://www.iana.org/assignments/relation/terms-of-service already works with redirect and was suggested earlier.
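
The difference between the two forms can be sketched roughly as follows. The function names are hypothetical; `urldefrag` comes from the Python standard library.

```python
from urllib.parse import urldefrag

BASE = "http://www.iana.org/assignments/relation/"


def fragment_id(name: str) -> str:
    """Hash form: every relation is a fragment of one shared document."""
    return BASE + "#" + name


def path_id(name: str) -> str:
    """Path form: every relation has its own URL to request."""
    return BASE + name


# A client strips the fragment before making a request, so the hash form
# always fetches the same document, whatever the relation name.
document_url, fragment = urldefrag(fragment_id("terms-of-service"))
```

With the hash form, one response can describe every relation; with the path form, each name needs its own response or redirect.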

Seeing mnot/I-D#140 and ietf-wg-httpapi/linkset#45 and mnot/I-D#39 I guess that's a can of worms we may not want to open too much again.. considering what would be the best way to mint these URI identifiers across all the many IANA registries then perhaps the local # identifiers are easier? (We could even throw in a ` in the XHTML)

mnot commented 2 years ago

The discussion with IANA is coming along, FWIW. They are definitely interested, AIUI.

dret commented 2 years ago

On 2022-03-22 18:30, Stian Soiland-Reyes wrote:

Bumping this and volunteering - few cycles available, but very interested in getting this done.

it is something that many have expressed an interest in, more generally speaking as a machine-readable version. there is one available (see below) but it's JSON and not JSON-LD.

BTW, you use the prefix that would make http://www.iana.org/assignments/relation/#terms-of-service for link relation terms-of-service - while I think http://www.iana.org/assignments/relation/terms-of-service already works with redirect and was suggested earlier.

just being the broken record here: this is a red herring. neither of those are actual identifiers for link relations. the identifier as per the relevant spec is the string "terms-of-service".

Seeing mnot/I-D#140 (https://github.com/mnot/I-D/issues/140) and mnot/I-D#39 (https://github.com/mnot/I-D/issues/39) I guess that's a can of worms we may not want to open too much again.. considering what would be the best way to mint these URI identifiers across all the many IANA registries (https://www.iana.org/protocols) then perhaps the local # identifiers are easier? (We could even throw in a ` in the XHTML)

agreeing that is a can of worms, but in my mind for a good reason. maybe it would help to clearly and cleanly distinguish two issues here:

i know that this was discussed before, and it's a tough discussion, but i think we cannot get around the fact that linked data has a very limited value space for identifiers (URIs only) and that many IANA registries manage values that are not URIs. i don't think it would be useful and in my mind it could even be potentially harmful to ignore this and to make up new identifiers without being very clear about this decision and its potential implications.

mnot commented 1 year ago

Closing as this isn't a registration request.