Best Practices for Registries (including that they should be machine readable)

torgo commented 7 years ago

As discussed at the London f2f under the registries topic.

cynthia commented 7 years ago

...and also recommended to be used when authoring a spec. (instead of doing it ad-hoc)

torgo commented 7 years ago

"registries should not be used at runtime"

torgo commented 7 years ago

Travis to write something out of our discussion yesterday.

travisleithead commented 7 years ago

Registries are a thing. If you make use of a registry, please follow these guidelines...

hadleybeeman commented 7 years ago

Somewhat relevant language in Data on the Web Best Practices...

Best Practice 12: Use machine-readable standardized data formats

Make data available in a machine-readable, standardized data format that is well suited to its intended or potential use.

Why

As data becomes more ubiquitous, and datasets become larger and more complex, processing by computers becomes ever more crucial. Posting data in a format that is not machine-readable places severe limitations on the continuing usefulness of the data. Data becomes useful when it has been processed and transformed into information. Note that there is an important distinction between formats that can be read and edited by humans using a computer and formats that are machine-readable. The latter term implies that the data is readily extracted, transformed and processed by a computer. Using non-standard data formats is costly and inefficient, and the data may lose meaning as it is transformed. By contrast, standardized data formats enable interoperability as well as future uses, such as remixing or visualization, many of which cannot be anticipated when the data is first published. It is also important to note that most machine-readable standardized formats are also locale-neutral.

Intended Outcome

Machines will easily be able to read and process data published on the Web and humans will be able to use computational tools typically available in the relevant domain to work with the data.

Possible Approach to Implementation

Make data available in a machine-readable standardized data format that is easily parseable including but not limited to CSV, XML, HDF5, JSON and RDF serialization syntaxes like RDF/XML, JSON-LD, or Turtle.

marcoscaceres commented 7 years ago

No RDF and friends, please.

marcoscaceres commented 7 years ago

To clarify: prefer formats that browsers support (i.e., compatible with Fetch). Those other formats are hard to use and add needless complexity.

cynthia commented 7 years ago

Logically, JSON seems like the obvious choice - but it struck me that JSON would be problematic with large registries, since you have to have the entire dump for parsing, or at least parse the whole thing - not quite friendly to partial parsing.

Any XML based format is probably not a great idea for the same reason. (Setting aside the point that it is XML.)

cynthia commented 7 years ago

@marcosc Would Fetch friendliness be an extremely important factor? I would imagine the registry sync should be a build-time fetch, not a live fetch. A live fetch is going to bring the registry server to its knees; apparently this was a problem with DTDs and whatnot in the past.
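To make the build-time vs. live distinction concrete, here is a minimal sketch. It assumes the registry is published as a single JSON object; `sync_registry`, `load_registry`, `fake_fetch`, and the sample entries are hypothetical names made up for illustration, not part of any existing registry or tool.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def sync_registry(fetch: Callable[[], str], snapshot: Path) -> int:
    """Build step: download the registry once and store a local snapshot."""
    # Parse first so a malformed download never overwrites a good snapshot.
    entries = json.loads(fetch())
    snapshot.write_text(json.dumps(entries, indent=2, sort_keys=True), encoding="utf-8")
    return len(entries)


def load_registry(snapshot: Path) -> dict:
    """Runtime: read the local snapshot only; never contact the registry server."""
    return json.loads(snapshot.read_text(encoding="utf-8"))


def fake_fetch() -> str:
    """Hypothetical stand-in for an HTTP GET against a registry URL."""
    return json.dumps({
        "audio": {"status": "registered"},
        "video": {"status": "registered"},
    })


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "registry.json"
        print(sync_registry(fake_fetch, snapshot), "entries cached")
        print(sorted(load_registry(snapshot)))
```

With this split, the registry server sees one request per build rather than one per user, which is the load problem a live fetch would cause.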

marcoscaceres commented 7 years ago

Any XML based format is probably not a great idea for the same reason. (Setting aside the point that it is XML.)

This is not true: you can use a SAX parser and stream the data. You can do this for anything, really, including JSON.
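A rough sketch of what that streaming looks like, using Python's standard `xml.sax` module. The `<registry>`/`<entry>` element names and the sample data are assumptions invented for this example; no existing registry is known to use this layout.

```python
import xml.sax
from typing import Callable, Iterable, Iterator


class EntryHandler(xml.sax.ContentHandler):
    """Calls on_entry for each <entry> element as soon as the parser sees it."""

    def __init__(self, on_entry: Callable[[dict], None]) -> None:
        super().__init__()
        self.on_entry = on_entry

    def startElement(self, name, attrs):
        if name == "entry":
            self.on_entry(dict(attrs.items()))


def stream_entries(chunks: Iterable[bytes], on_entry: Callable[[dict], None]) -> None:
    """Feed the document piece by piece; the whole file is never held in memory."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(EntryHandler(on_entry))
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()


def chunked(data: bytes, size: int) -> Iterator[bytes]:
    """Simulate a network stream by slicing the payload into small pieces."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


# Hypothetical registry payload, for illustration only.
SAMPLE = (
    b'<registry>'
    b'<entry name="audio" status="registered"/>'
    b'<entry name="video" status="registered"/>'
    b'</registry>'
)

if __name__ == "__main__":
    stream_entries(chunked(SAMPLE, 16), print)
```

A consumer that only needs one entry can stop reading once the handler has seen it, so a large registry does not have to be downloaded or parsed in full.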

Would Fetch friendliness be an extremely important factor?

It's not really about friendliness: it's about integration with the web platform. RDF is over-engineered, developer hostile, incompatible with browsers, and really just needs to be finally marked as Obsolete or outright Rescinded by the W3C.

Recommending people use it is harmful to society (particularly when governments publish it) because it limits who can use the data tremendously (i.e., us, the citizens). It sets a high level of entry for users, and makes it hard to model data: that's fine if you are a PhD at MIT, or whatever (or you want to impress your friends by pretending to be smart)... but it's not good for anyone else that actually needs to get work done.

Live fetch

I don't know what that is.

marcoscaceres commented 7 years ago

Fixed some typos above.

torgo commented 6 years ago

w3c/process#83 closed and merged into this one:

Originally from @triblondon: We have been asked by the AB (on the TAG member ML) to look at establishing best practice for registries:

The AB also encourages the TAG to write a "design patterns for registries" document to aid Working Groups who may be considering a new registry to accompany their specification(s). The TAG could also provide guidance on which registries, tables, or other material currently embedded within a Recommendation should be candidates for extraction (as an Amended Recommendation) and made into registries.

torgo commented 6 years ago

Changed name to reflect expanded scope of this issue.

torgo commented 6 years ago

Re data formats: let's stick to the general recommendation (which aligns with the Data on the Web Best Practices) and also include the "prefer formats that browsers support" wording as @marcoscaceres suggested.

torgo commented 4 years ago

Discussed this issue in our virtual f2f and agreed to put it on the back burner for now.

annevk commented 4 years ago

Cases to consider if someone were to pick this up again:

Encoding Standard's data tables, e.g., https://encoding.spec.whatwg.org/index-windows-1252.txt (there's also a non-normative JSON format, perhaps that ought to be normative instead)
Storage Standard's registered storage endpoints: https://storage.spec.whatwg.org/#registered-storage-endpoints
Fetch Standard's request's destination: https://fetch.spec.whatwg.org/#concept-request-destination

I think in practice I would argue that most registries are best maintained by the Living Standard that maintains the concept being extended. It's not clear to me that most of them need to be machine readable in the sense that I think OP means. (They are machine readable today as a machine could read the standard in theory. It will need context, but that's no different from any kind of other registry.)
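As an illustration of how little tooling a plain-text table like the Encoding Standard's index files needs, here is a small parsing sketch. It assumes each data line is a pointer, a tab, a `0x`-prefixed code point, a tab, and a human-readable description, with `#` marking comment lines; the sample rows are illustrative and should be checked against the real file.

```python
def parse_index(text: str) -> dict:
    """Map each pointer to its code point, skipping comments and blank lines."""
    table = {}
    # Split on "\n" only: str.splitlines() would also break on other
    # characters (such as U+0085) that could appear in a description column.
    for line in text.split("\n"):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pointer, code_point, *_description = line.split("\t")
        table[int(pointer)] = int(code_point, 16)  # base 16 accepts the 0x prefix
    return table


# Illustrative rows in the assumed layout (not copied from the real file).
SAMPLE_INDEX = (
    "# sample comment line\n"
    "\n"
    "  0\t0x20AC\t\u20ac (EURO SIGN)\n"
    "  2\t0x201A\t\u201a (SINGLE LOW-9 QUOTATION MARK)\n"
    "  3\t0x0192\t\u0192 (LATIN SMALL LETTER F WITH HOOK)\n"
)

if __name__ == "__main__":
    for pointer, code_point in parse_index(SAMPLE_INDEX).items():
        print(pointer, f"U+{code_point:04X}")
```

The same table embedded in spec prose would need an HTML parser plus knowledge of which table to look at, which is the "context" cost mentioned above.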

torgo commented 4 years ago

We've discussed … and the idea came up to have a special micro format to indicate the presence of a registry (or to indicate the URL where a registry exists) in order to aid discovery of registries.

frivoal commented 4 years ago

@torgo the URL you gave for the minutes where things were discussed (https://github.com/w3ctag/meetings/tree/gh-pages/2020/09-europe) is a 404.

dbaron commented 4 years ago

Sorry, I renamed the meeting page because it... wasn't actually in Europe. Edited above link.

ylafon commented 3 years ago

Note that https://github.com/w3c/w3process/pull/335 was merged into the Process Document, establishing a process for Registries, and an issue about guidance for registry creation, https://github.com/w3c/w3process/issues/329, has been created. That issue is probably the best place to send comments, so I propose closing this one.

w3ctag / design-principles

Best Practices for Registries (including that they should be machine readable) #68
