linkeddata / rdflib.js

Linked Data API for JavaScript
http://linkeddata.github.io/rdflib.js/doc/

make rdflib use established prefixes (e.g., `xsd:` instead of `XML:`) #472

Closed TallTed closed 3 years ago

TallTed commented 3 years ago

Would be cool if we could make rdflib use established prefixes as well

Originally posted by @angelo-v in https://github.com/solid/solid-panes/pull/277#issuecomment-768831562

The pull request linked above changed --

@prefix    xsd: <http://www.w3.org/2001/XMLSchema#> .

-- in one file, to --

@prefix    XML: <http://www.w3.org/2001/XMLSchema#> .

-- to match a number of instances of the latter.

I suggested that the fix should have been in the other direction, but apparently the XML: prefix is used by rdflib, and inherited from there in many places.

My go-to reference site for prefix common use is prefix.cc, which apparently doesn't support capitalized prefixes at all (which may tell us something). It does have a lowercase xml: prefix, but that doesn't match the above expansion. xs: and xsd: both do match the above; in my experience, the latter is more commonly used.
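A prefix is only a local label for a namespace IRI, so the two declarations above denote exactly the same terms; the choice matters for human readers, not for parsers. A minimal sketch of that point follows. The `expand` helper and the variable names are hypothetical and are not part of rdflib.js.

```javascript
// Two prefix maps that differ only in the label chosen for the XML Schema namespace.
const withXsd = { xsd: "http://www.w3.org/2001/XMLSchema#" };
const withXML = { XML: "http://www.w3.org/2001/XMLSchema#" };

// Hypothetical helper: expand a prefixed name such as "xsd:integer" to a full IRI.
function expand(prefixes, prefixedName) {
  const i = prefixedName.indexOf(":");
  if (i < 0) throw new Error(`Not a prefixed name: ${prefixedName}`);
  const prefix = prefixedName.slice(0, i);
  const local = prefixedName.slice(i + 1);
  if (!Object.prototype.hasOwnProperty.call(prefixes, prefix)) {
    throw new Error(`Unknown prefix: ${prefix}`);
  }
  return prefixes[prefix] + local;
}

const viaXsd = expand(withXsd, "xsd:integer");
const viaXML = expand(withXML, "XML:integer");
console.log(viaXsd === viaXML); // true: same IRI, different label
```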

It appears that multiple pull requests will be needed to change everything relevant from XML: to xsd:, in part because this issue crosses multiple repos.

TallTed commented 3 years ago

A quick search of this repo brings several matches...

There are only a few more within org:linkeddata, but broadening the search to all of GitHub brings hundreds of thousands of matches ... largely because GitHub doesn't treat any punctuation marks as significant.

pmcb55 commented 3 years ago

Yeah, I agree with you that xsd is a much better prefix than XML, and so I'd propose that rdflib.js change its current usage to use xsd consistently instead. XML and XSD are two very different things (each even having a different data model, i.e., XML has the Infoset, and XSD has the Post-schema-validation (PSV) Infoset), so referring to terms from XSD using the prefix XML (or xml) is downright wrong, I think.

jeff-zucker commented 3 years ago

I would hope that all prefixes in the Solid Namespace library would be used, for example acl:trustedApp rather than the meaningless n0:trustedApp, which is quite confusing to beginners.

TallTed commented 3 years ago

Apparently, rdflib dynamically creates the prefixes it uses, based on the expanded URL -- there's currently no way to predefine prefixes.

So this issue will remain open (at least for now) to track the desire to do so. I imagine that more people signing on to this wish will increase the likelihood that it will happen.

jeff-zucker commented 3 years ago

Well, since the Solid Namespace is basically a URL-to-understandable-prefix converter, I don't see rdflib's use of URLs to dynamically create prefixes as an obstacle to accomplishing a more human-readable Turtle output.
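As a rough sketch of that idea (not rdflib.js's actual serializer code): a prefix allocator could consult a table of established prefixes first and fall back to generated names such as `n0` only for namespaces it does not recognise. The table contents and the `createPrefixAllocator` function are assumptions made for illustration.

```javascript
// Illustrative table only; a real one could come from a library such as solid-namespace.
const WELL_KNOWN = new Map([
  ["http://www.w3.org/2001/XMLSchema#", "xsd"],
  ["http://www.w3.org/ns/auth/acl#", "acl"],
  ["http://xmlns.com/foaf/0.1/", "foaf"],
]);

// Hypothetical allocator: returns a function mapping a namespace IRI to a prefix.
function createPrefixAllocator(wellKnown) {
  const assigned = new Map(); // namespace IRI -> prefix already handed out
  const taken = new Set();    // prefixes in use, to avoid collisions
  let counter = 0;
  return function prefixFor(ns) {
    if (assigned.has(ns)) return assigned.get(ns);
    let prefix = wellKnown.get(ns);
    // Unknown namespace, or the established prefix is already taken: generate n0, n1, ...
    if (prefix === undefined || taken.has(prefix)) {
      do {
        prefix = `n${counter++}`;
      } while (taken.has(prefix));
    }
    assigned.set(ns, prefix);
    taken.add(prefix);
    return prefix;
  };
}

const prefixFor = createPrefixAllocator(WELL_KNOWN);
console.log(prefixFor("http://www.w3.org/ns/auth/acl#"));    // acl
console.log(prefixFor("http://example.org/vocab#"));         // n0
console.log(prefixFor("http://www.w3.org/2001/XMLSchema#")); // xsd
```

With this shape, the existing dynamic behaviour is kept as the fallback, so namespaces missing from the table would still serialize as before.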

bourgeoa commented 3 years ago

The prefixes seem to be added in a local namespace in: https://github.com/linkeddata/rdflib.js/blob/5506be534cf8fc29882e02ec011f2524bf43f6c9/src/serializer.js#L94-L130 solid-namespace could be introduced before building other indexes.

n is created in L127

There may be consequences in the tests.

jeff-zucker commented 3 years ago

Rdflib uses the solid-namespace library for its Namespace() method, so it would seem like the natural choice. But Inrupt has introduced several different kinds of vocabulary managers, including rdf-common-vocab and solid-vocab-common. @Vinnl can you tell us what those new vocab libraries do in relation to the old one used by rdflib and solid-ui? Is there a reason we should eventually move rdflib toward those?

Vinnl commented 3 years ago

@jeff-zucker I think they're mostly useful if you need to work on a vocabulary directly, e.g. need the names and descriptions of the IRIs defined by it, so I think adding it to rdflib would pull in a lot of stuff you wouldn't need.

Compared to solid-namespace, though, another difference is that it's automatically generated from the actual vocabularies, i.e. it contains predefined constants for the terms in there, whereas solid-namespace just concatenates whatever string you give it to the namespace it knows, so it allows you to use terms not defined in the vocab, and doesn't provide autocompletion.
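The difference can be sketched as follows. This is an illustration of the two styles only, not the API of either library; `makeNamespace`, `foaf`, and `FOAF` are hypothetical names, and the constants are a hand-written subset rather than generated output.

```javascript
const FOAF_NS = "http://xmlns.com/foaf/0.1/";

// Style 1: concatenation. Any string is accepted, whether or not the vocabulary defines it.
function makeNamespace(base) {
  return (localName) => base + localName;
}
const foaf = makeNamespace(FOAF_NS);

// Style 2: predefined constants (a hand-written subset here; generated in practice).
const FOAF = Object.freeze({
  name: FOAF_NS + "name",
  knows: FOAF_NS + "knows",
});

console.log(foaf("name") === FOAF.name); // true
console.log(foaf("nmae")); // a well-formed IRI for a term FOAF does not define
console.log(FOAF.nmae);    // undefined, so the typo can be caught
```

With predefined constants, an editor can also offer autocompletion on `FOAF.`, which a concatenating function cannot.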

But pinging @pmcb55, as the vocab libraries are his thing, really.

TallTed commented 3 years ago

Appears to be resolved by #483!

pmcb55 commented 3 years ago

Hiya @jeff-zucker, I started to reply to your question about rdf-common-vocab and solid-vocab-common, but it quickly started to turn into a full-blown blog entry about the tool we use to generate those vocabulary artifacts in the first place!

We hope to open source that tool in the next couple of months, but until we do I'll just say that the generated artifacts (such as the ones you reference) are basically no different to any existing hand-crafted, or auto-generated, programming-language-specific constants libraries, i.e., existing npm (or Gradle, or Maven, or Gem, etc.) artifacts that provide constants for the vocabulary terms defined in common RDF vocabularies that already exist today (e.g., FOAF, SKOS, RDF, LDP, EARL, DOAP, Schema.org, or your own custom RDF vocabs, etc.).

As you already know, everyone who comes fresh to RDF, regardless of chosen programming language, always either creates their own shared constants for all the nasty-looking vocab term IRIs they choose to use, or imports them from already-generated vocab constant libraries, which are generally provided by the RDF library they've chosen to use in the first place (i.e., all the common RDF libraries out there today tend to provide common RDF vocab classes, such as RDF4J's vocabulary classes here, or Jena's here).

So for example, Java developers who have chosen to use RDF4J will tend to choose RDF4J's already-generated vocab constants for common RDF vocabularies (e.g., here for FOAF), or if the developer chose to use Jena, then they'll tend to use Jena's FOAF constants here, or if they are JavaScript (or TypeScript) developers, then perhaps they'll have chosen solid-namespace, or perhaps rdf-namespaces, or perhaps https://github.com/matthieubosquet/namespace.

Ultimately it really doesn't matter, so long as the underlying vocabularies themselves aren't undergoing significant evolution, and therefore can be relied upon to be somewhat 'stable'. So I wouldn't say there's any reason or strong justification for anyone to change over from whatever is working for them today.

But just to whet your appetite for the soon-to-be-open-sourced Inrupt code generator (we don't have any planned release dates yet), it's worth mentioning that its intent goes far beyond simply providing constants for vocab terms.

Its main intent is to support the generation of constants for any programming language - for instance at Inrupt we use JavaScript, TypeScript, Java and Python. Having a single tool to generate constants consistent across languages is very important, especially when it comes to managing your own custom vocabularies.

It also supports generating constants using the native types of existing RDF libraries, e.g., RDF4J, CommonsRDF, RDF/JS, rdflib.js, etc. A big benefit of this capability is that you're not forced to treat these vocab IRIs as strings, but instead to correctly define them as the IRI types of your underlying RDF library.

It also provides (optionally) access to vocab term metadata (such as any rdfs:label, rdfs:comment (in multiple human-languages), rdfs:seeAlso, rdfs:isDefinedBy, etc.) that any vocabs may choose to provide (very useful for front-end applications that may wish to define multilingual UI labels, tooltips, error messages, etc., using RDF vocabs); generation of very complete HTML documentation (i.e., using Widoco); optional enforcement of RDF vocabulary creation guidelines (e.g., according to the public (but currently draft) Inrupt guidelines published here, for which it would be fantastic to get more feedback); automatic packaging and publishing of artifacts to repositories, like Verdaccio, npmjs.org, local Maven, Maven Central, Cloudsmith, etc.
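To make the multilingual-label use case concrete, here is a minimal sketch of what a term carrying its own metadata might look like to a front-end. The record shape, the `labelFor` helper, and the example.org IRI are all assumptions for illustration; they do not describe the generator's actual output.

```javascript
// Hypothetical term record: an IRI plus rdfs:label values keyed by language tag.
const ERROR_NOT_FOUND = {
  iri: "https://example.org/vocab#errorNotFound",
  labels: new Map([
    ["en", "Resource not found"],
    ["fr", "Ressource introuvable"],
  ]),
};

// Return the label for the requested language, falling back to a default
// language and finally to the IRI itself.
function labelFor(term, lang, fallback = "en") {
  return term.labels.get(lang) ?? term.labels.get(fallback) ?? term.iri;
}

console.log(labelFor(ERROR_NOT_FOUND, "fr")); // Ressource introuvable
console.log(labelFor(ERROR_NOT_FOUND, "de")); // Resource not found
```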

Yet another capability is the ability to file-watch local vocabulary instances (i.e., to watch local Turtle or TriG files (or whatever RDF serialization you prefer)), and to automatically re-generate the associated programming-language-specific constants that get picked up within a second or two by your IDE. This is super-useful, for instance, when a developer is adding, deleting or editing the text of a tooltip/UI label/Error/Warning/Debug/Trace message in their code - i.e., they simply update their RDF vocabulary, and can directly see and test those changes automatically and immediately.

It also supports local caching of remote vocabularies, which can be useful for offline generation, or for having local copies of remote vocabularies in readable Turtle for vocabs that may only be published as RDFa or RDF/XML, which are far less human-readable.

So TL;DR: no need for anyone to change what they are doing today, and an optional code generator tool being offered from Inrupt, hopefully in the coming months.

jeff-zucker commented 3 years ago

@pmcb55 Thanks for your response and explanation of Inrupt's upcoming release. The upcoming Inrupt generator sounds like a very excellent and useful tool. However, it is orthogonal to the problem I am describing and, in my mind, does not justify the conclusion you come to. Perhaps I've misunderstood, but it seems to me that I am asking "let's make Turtle files more readable" and you are saying "but we don't need to read Turtle files". That's true, but ...

If one is operating, as most coders working for companies do, in a completely automated RDF environment, one a) doesn't really care what the Turtle looks like to a human and b) is going to have a toolset in common with the other coders in their shop. So for that intended audience Inrupt's generator is perfect and solves all the problems.

But I am coming from a different perspective - that of a community coder, and that of someone who spends a lot of time in the chat and forum answering newcomer questions. If Solid is to succeed, it needs a robust open source community. The people we want to draw into that community are almost certainly unfamiliar with RDF and possibly also with JavaScript/TypeScript and its toolchain. We do not want this onboarding experience to be "here - look at this human readable format that isn't human readable and figure out how to produce it using these RDF tools, but first learn jest, travis, lerna, yarn, React, and these very cool Inrupt tools."

Obviously, once someone understands RDF and the Solid tool ecosystems, human readable Turtle (or JSON-LD or whatever) becomes less and less important and one wants libraries which hide it from you or abstract away from it. But for the new-to-Solid developer, and tutorials meant for them, it would be so nice to just say, go look at your profile and here's what it means. Describing prefixes is hard enough without another layer of "oh and n0: means acl: and n2: means foaf:". This is especially true if one is also describing ontologies for the first time.

The new developer's first experience of a Pod (at least for now) is the SolidOS (mashlib) databrowser. They can use it without any knowledge of other tools. The change being proposed in rdflib would mean that the first experience will be easier to absorb. And it should not have any impact on more advanced tools - people who never want to look at Turtle can just use your generator without this change having any impact on them. New users can manually poke around until they are familiar and then start using the tools that abstract away.

pmcb55 commented 3 years ago

Hi @jeff-zucker - sure thing, I totally agree with you, so I don't think there's any disconnect between us. My lengthy response was literally just in reply to the very specific questions above in comment https://github.com/linkeddata/rdflib.js/issues/472#issuecomment-800524163, i.e., I was merely trying to describe what rdf-common-vocab and solid-vocab-common are, and where they came from - that's all. I totally agree that our generator would only ever be run by more advanced developers, and I also totally agree with your attempts to improve the (completely unrelated) question of how rdflib.js handles or generates prefixes. Cheers.

jeff-zucker commented 3 years ago

Ah, well, your description wasn't wasted; I am drooling in anticipation of your new tool.