This PR is a response to #43 and to a thoughtful Google doc from @RieksJ. It attempts to address several next steps proposed by @vinomaster. Specifically:
It proposes a detailed and concrete data model for each phase of the data lifecycle. The ingestible data model is similar to the previously agreed-upon issue template for adding new terms, with only a few refinements. The internal data model is new. It shows how normalized data will work, how terms and concepts relate to each other, and how they can be integrated with a comprehensive hyperlink strategy. The exported data model describes how data can be emitted from internal tools in a format that standard publication tools like MkDocs and SpecUp, or custom publication tools like Docusaurus, can transform into polished artifacts. It also explains what extra metadata will be available after export, and where it comes from. This includes a resolution to the hovertext question, as well as to Rieks's desire for human-friendly versioning.
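To give a feel for the distinction between the ingestible and internal models, here is a minimal, purely hypothetical sketch of an ingested term entry and its normalized internal counterpart; every field name below is an illustrative assumption, not the schema the PR actually defines.

```python
# Hypothetical illustration only -- field names are assumptions, not the PR's schema.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class IngestedTerm:
    """Raw submission, roughly mirroring the new-term issue template."""
    term: str                                   # the phrase being defined
    definition: str                             # free-form markdown from the submitter
    scope: str                                  # e.g. a working group or spec name
    tags: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)   # informal cross-references

@dataclass
class NormalizedConcept:
    """Curated internal record that one or more terms resolve to."""
    concept_id: str                             # stable identifier used by hyperlinks
    preferred_term: str
    definition_md: str
    scope: str
    synonyms: List[str] = field(default_factory=list)
    related_ids: List[str] = field(default_factory=list)  # ids of related concepts
    status: str = "proposed"                    # lifecycle status of this concept
    version: str = "1"                          # human-friendly version label
    hovertext: Optional[str] = None             # short gloss for publication tools
```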
It proposes a comprehensive hyperlink strategy, as noted above. This strategy explains the format of hyperlinks of various types, at each stage in the data lifecycle.
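As a purely illustrative sketch of what stage-specific link formats might look like (the actual formats are spelled out in the PR; these URL patterns, and the example.org domain, are assumptions):

```python
# Hypothetical illustration of stage-specific link formats; the real formats are
# defined in the PR, and these URL patterns (including example.org) are assumptions.
def link_for(concept_id: str, stage: str) -> str:
    """Return a hyperlink to a concept at a given lifecycle stage."""
    if stage == "internal":
        # curation tooling links by stable id, independent of any published page
        return f"concept:{concept_id}"
    if stage == "export":
        # exported markdown uses relative links that publication tools can rewrite
        return f"./terms/{concept_id}.md"
    if stage == "published":
        # a publication tool resolves relative links to site URLs like this one
        return f"https://example.org/glossary/{concept_id}"
    raise ValueError(f"unknown stage: {stage}")

print(link_for("verifiable-credential", "export"))   # ./terms/verifiable-credential.md
```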
It contains a rough spec for the internal tooling that will facilitate key behind-the-scenes workflows and enforce the business rules we care about. This is an important evolution in our thinking; until now, almost all of our energy has gone into the late stages of the lifecycle, where tools like MkDocs, SpecUp, and Docusaurus transform data into publishable artifacts. I have come to feel that publication is the easier part of the problem, and that internal curation is harder and less well specified. The spec is an attempt to plug that gap.
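For flavor, here is a hypothetical example of the kind of business rule such tooling could enforce during curation. The specific rules, and the concept fields they check, are the illustrative assumptions from the sketch above, not requirements stated in the PR.

```python
# Hypothetical sketch of business rules the internal tooling could enforce during
# curation; the specific rules here are illustrative assumptions, not the PR's list.
def validate_corpus(concepts):
    """Return human-readable violations of a few example curation rules."""
    errors = []
    known_ids = {c.concept_id for c in concepts}
    seen = set()
    for c in concepts:
        if c.concept_id in seen:                      # ids must be unique
            errors.append(f"duplicate concept id: {c.concept_id}")
        seen.add(c.concept_id)
        if not c.definition_md.strip():               # every concept needs a definition
            errors.append(f"{c.concept_id}: definition is empty")
        for ref in c.related_ids:                     # no dangling cross-references
            if ref not in known_ids:
                errors.append(f"{c.concept_id}: dangling reference to '{ref}'")
    return errors
```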
It explains some aspects of the curation process and the duties of curators.
It updates the issue templates. It also simplifies README.md, adding direct links that will start common workflows instead of just describing those workflows in the abstract.
It describes tagging conventions, linking to the officially approved TOIP tags and explaining how and why we extend that set.
It describes how the status of individual terms and concepts is managed.
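As a hypothetical illustration only (the actual status values and allowed transitions are defined in the PR, not here), status management might look something like a small state machine:

```python
# Hypothetical illustration of per-concept status management; the actual status
# values and allowed transitions are defined in the PR, not here.
ALLOWED_TRANSITIONS = {
    "proposed": {"accepted", "rejected"},
    "accepted": {"deprecated"},
    "deprecated": set(),
    "rejected": set(),
}

def change_status(concept, new_status: str) -> None:
    """Move a concept to a new status, refusing transitions the rules don't allow."""
    if new_status not in ALLOWED_TRANSITIONS.get(concept.status, set()):
        raise ValueError(
            f"cannot move {concept.concept_id} from '{concept.status}' to '{new_status}'"
        )
    concept.status = new_status
```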
It describes how we will accomplish the v2 feature of having multiple glossaries in a single scope, and multiple scopes in the overall corpus.
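A purely illustrative sketch of how multiple glossaries per scope, and multiple scopes per corpus, could nest; every name below is made up:

```python
# Hypothetical sketch of multiple glossaries per scope and multiple scopes per
# corpus; every name below is made up for illustration.
corpus = {
    "scope-a": {
        "glossaries": {
            "core": ["term-one", "term-two"],
            "technical": ["term-three"],
        }
    },
    "scope-b": {
        "glossaries": {
            "spec-terms": ["term-four"],
        }
    },
}
```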
It retains Rieks's concept of arbitrary concept types, although it doesn't elaborate it. (It leaves a place for it in the design.)
It is in harmony with Dan's ask that we stay more tightly aligned with MkDocs and SpecUp. However, it does this by making the distinction between glossary publication and data export crisper: MkDocs and SpecUp are publication tools that transform markdown, whereas the internal terminology tooling is more like a database that emits markdown for publication tools to consume. This gives the CTWG a looser relationship to publication in general, which in turn means that anyone who wants to publish the data with additional tools like Docusaurus can easily do so.
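To illustrate that division of labor, here is a hypothetical sketch of the export step, where the internal tooling emits plain markdown that a publication tool then renders; the front-matter keys are assumptions, not a defined format.

```python
# Hypothetical sketch of the export step: internal tooling writes plain markdown
# (with a little front matter) that MkDocs, SpecUp, or Docusaurus can then render.
# The front-matter keys are assumptions, not a defined format.
from pathlib import Path

def export_concept(concept, out_dir: Path) -> Path:
    """Emit one concept as a standalone markdown file and return its path."""
    path = out_dir / f"{concept.concept_id}.md"
    body = (
        "---\n"
        f"term: {concept.preferred_term}\n"
        f"status: {concept.status}\n"
        f"version: {concept.version}\n"
        f"hovertext: {concept.hovertext or ''}\n"
        "---\n\n"
        f"# {concept.preferred_term}\n\n"
        f"{concept.definition_md}\n"
    )
    path.write_text(body, encoding="utf-8")
    return path
```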
I believe that accepting this PR would take a number of the next steps Dan outlined in #43 and would represent substantial forward progress. However, it is undoubtedly imperfect. For example, it doesn't explicitly address the unresolved tension over whether curation in our system should be centralized or decentralized. Because this PR has so many moving parts, I suggest we discuss it interactively in a CTWG meeting.