Closed torgo closed 3 years ago
...and also recommended for use when authoring a spec (instead of doing it ad hoc).
"registries should not be used at runtime"
Travis to write something up from our discussion yesterday.
Registries are a thing. If you make use of a registry, please follow these guidelines...
Somewhat relevant language in Data on the Web Best Practices...
Best Practice 12: Use machine-readable standardized data formats
Make data available in a machine-readable, standardized data format that is well suited to its intended or potential use.
Why
As data becomes more ubiquitous, and datasets become larger and more complex, processing by computers becomes ever more crucial. Posting data in a format that is not machine-readable places severe limitations on the continuing usefulness of the data. Data becomes useful when it has been processed and transformed into information. Note that there is an important distinction between formats that can be read and edited by humans using a computer and formats that are machine-readable. The latter term implies that the data is readily extracted, transformed and processed by a computer. Using non-standard data formats is costly and inefficient, and the data may lose meaning as it is transformed. By contrast, standardized data formats enable interoperability as well as future uses, such as remixing or visualization, many of which cannot be anticipated when the data is first published. It is also important to note that most machine-readable standardized formats are also locale-neutral.
Intended Outcome
Machines will easily be able to read and process data published on the Web and humans will be able to use computational tools typically available in the relevant domain to work with the data.
Possible Approach to Implementation
Make data available in a machine-readable standardized data format that is easily parseable including but not limited to CSV, XML, HDF5, JSON and RDF serialization syntaxes like RDF/XML, JSON-LD, or Turtle.
No RDF and friends, please.
To clarify: prefer formats that browsers support (i.e., compatible with Fetch). Those other formats are hard to use and add needless complexity.
Logically, JSON seems like the obvious choice, but it struck me that JSON would be problematic for large registries: you need the entire dump before you can parse it, or at least you have to parse the whole thing, which is not friendly to partial parsing.
Any XML based format is probably not a great idea for the same reason. (Setting aside the point that it is XML.)
@marcosc Would Fetch friendliness be an extremely important factor? I would imagine the registry sync should be a build-time fetch, and not a live fetch. A live fetch is going to bring the registry server down to its knees; apparently this was a problem with DTDs and whatnot in the past.
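To illustrate the build-time versus live distinction, here is a minimal sketch, assuming the registry can be represented as a JSON list. The names `snapshot_registry`, `load_registry`, and `fake_fetch` are hypothetical, and `fake_fetch` stands in for whatever HTTP request a real build step would make; none of this comes from an actual registry.

```python
import json
import tempfile
from pathlib import Path


def snapshot_registry(fetch, dest):
    """Build step: call `fetch` once and store the result as a local JSON file."""
    entries = fetch()
    dest.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return dest


def load_registry(path):
    """Runtime: read the local snapshot; no request reaches the registry server."""
    return json.loads(path.read_text(encoding="utf-8"))


calls = []


def fake_fetch():
    # Stand-in (assumption) for an HTTP request to a hypothetical registry endpoint.
    calls.append(1)
    return [{"name": "alpha"}, {"name": "beta"}]


with tempfile.TemporaryDirectory() as tmp:
    # The registry is contacted exactly once, at build time.
    snapshot = snapshot_registry(fake_fetch, Path(tmp) / "registry.json")
    # Any number of runtime reads are served from the local copy.
    first = load_registry(snapshot)
    second = load_registry(snapshot)
```

With this split, load on the registry server scales with the number of builds rather than with the number of runtime lookups.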
"Any XML based format is probably not a great idea for the same reason. (Setting aside the point that it is XML.)"
This is not true: you can use a SAX parser and stream the data. You can do this for anything, really, including JSON.
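As a rough sketch of the streaming point, Python's standard `xml.sax` module can consume a document in chunks without ever holding the whole thing in memory. The `<registry>` and `<entry name="...">` element names are invented for illustration and do not reflect any real registry format.

```python
import xml.sax


class EntryCollector(xml.sax.ContentHandler):
    """Collects the `name` attribute of each <entry> element as it streams past."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Called once per opening tag; no document tree is ever built.
        if name == "entry":
            self.names.append(attrs.get("name"))


def stream_registry(chunks):
    """Feed an iterable of byte chunks to an incremental SAX parser."""
    parser = xml.sax.make_parser()
    handler = EntryCollector()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # partial input is fine; parsing resumes on the next chunk
    parser.close()
    return handler.names


# Hypothetical registry data, deliberately split mid-element to mimic network chunks.
chunks = [b"<registry><entry na", b'me="alpha"/><entry name="beta"/>', b"</registry>"]
names = stream_registry(chunks)
```

A consumer that only needs one entry could stop feeding chunks as soon as the handler has seen it, which is the partial-parsing behaviour asked about above.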
"Would Fetch friendliness be an extremely important factor?"
It's not really about friendliness: it's about integration with the web platform. RDF is over-engineered, developer hostile, incompatible with browsers, and really just needs to be finally marked as Obsolete or outright Rescinded by the W3C.
Recommending that people use it is harmful to society (particularly when governments publish it) because it tremendously limits who can use the data (i.e., us, the citizens). It sets a high barrier to entry for users and makes it hard to model data: that's fine if you are a PhD at MIT, or whatever (or you want to impress your friends by pretending to be smart)... but it's not good for anyone else who actually needs to get work done.
"Live fetch"
I don't know what that is.
Fixed some typos above.
w3c/process#83 closed and merged into this one:
Originally from @triblondon: We have been asked by the AB (on the TAG member ML) to look at establishing best practice for registries:
The AB also encourages the TAG to write a "design patterns for registries" document to aid Working Groups who may be considering a new registry to accompany their specification(s). The TAG could also provide guidance on which registries, tables, or other material currently embedded within a Recommendation should be candidates for extraction (as an Amended Recommendation) and made into registries.
Changed name to reflect expanded scope of this issue.
Re data formats: let's stick to the general recommendation (which aligns with the data on the web best practices) and also include the "prefer formats that browsers support" wording as @marcoscaceres suggested.
Discussed this issue in our virtual f2f and agreed to put it on the back burner for now.
Cases to consider if someone were to pick this up again:
I think in practice I would argue that most registries are best maintained by the Living Standard that maintains the concept being extended. It's not clear to me that most of them need to be machine readable in the sense that I think OP means. (They are machine readable today as a machine could read the standard in theory. It will need context, but that's no different from any kind of other registry.)
We've discussed … and the idea came up to have a special microformat to indicate the presence of a registry (or to indicate the URL where a registry exists) in order to aid discovery of registries.
@torgo the URL you gave for the minutes where things were discussed (https://github.com/w3ctag/meetings/tree/gh-pages/2020/09-europe) is a 404.
Sorry, I renamed the meeting page because it... wasn't actually in Europe. Edited above link.
Note that https://github.com/w3c/w3process/pull/335 was merged into the Process Document, establishing a process for Registries, and an issue about guidance for registry creation, https://github.com/w3c/w3process/issues/329, has been created. That issue is probably the best place to send comments, so I propose closing this one.
As discussed at the London f2f under the registries topic.