Addressing possible design flaws in maintaining and referencing assembly-specific context IDs

redhat-documentation / modular-docs

Modular Documentation Project provides guidelines and examples for writing technical documentation using a modular framework.

Creative Commons Attribution Share Alike 4.0 International

Addressing possible design flaws in maintaining and referencing assembly-specific context IDs #220

Open IanFrangs opened 10 months ago

IanFrangs commented 10 months ago

From what I have seen, the current system of maintaining assembly-specific context IDs, and then using the context ID in the ID of each module and assembly, means that these IDs are not absolute but relative to the assembly in which they are referenced. Therefore, you cannot create cross-references to modules or nested assemblies that live in a different assembly than the one the xref resides in. In other words, an xref builds only when the referenced module or assembly is in the same assembly as the xref itself. This explains why some xrefs build but others do not.

The other problem is that, when nested assemblies are used, the saved PARENT CONTEXT variable is overwritten. One effect of this is that the URLs of subsequent chapters (chapters after the chapter referencing a nested assembly) in a document are inconsistent. This could affect other functionality too.
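The overwrite described above can be sketched in Python. This is a minimal model, not the actual AsciiDoc processing: it assumes each assembly saves the current context into a single shared `parent-context` attribute in its preamble and restores from it in its footer. The names `process`, `book`, `chapter-one`, `nested-assembly`, `module-a`, and `chapter-two` are hypothetical.

```python
def process(node, attrs, ids):
    """Walk one include, mimicking a save/restore of context via a single attribute."""
    name, children = node
    if children is None:
        # Module: its ID takes whatever context is in force at include time.
        ids.append(f"{name}_{attrs['context']}")
        return
    # Assembly preamble: save the current context, emit the ID, set the new context.
    attrs["parent-context"] = attrs["context"]
    ids.append(f"{name}_{attrs['context']}")
    attrs["context"] = name
    for child in children:
        process(child, attrs, ids)
    # Assembly footer: restore context from the single saved slot.
    # A nested assembly has already overwritten that slot by this point.
    attrs["context"] = attrs["parent-context"]


# Hypothetical document: chapter-one contains a nested assembly, chapter-two does not.
book = [
    ("chapter-one", [("nested-assembly", [("module-a", None)])]),
    ("chapter-two", []),
]
attrs = {"context": "book"}  # assumed to be set in master.adoc
ids = []
for chapter in book:
    process(chapter, attrs, ids)
print(ids)
```

In this model, `chapter-two` ends up with the ID `chapter-two_chapter-one` instead of `chapter-two_book`, because the footer of `chapter-one` restores the value that the nested assembly saved, not the value `chapter-one` saved itself.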

A possible solution to both of these problems is to have one context ID per document, specified in the master.adoc file, which ensures that all the IDs are both unique and absolute for each document. In this case, you can create xrefs between and to every module and assembly without any problems, and the URLs of the chapters are unchanging. This also means that the required construction of the assemblies is simplified, because there is no need to try to preserve the parent context.

I understand that the reason for creating variable IDs was to support the inclusion of identically named modules, by creating contextually unique IDs for them. For instance, to refer to the same module in multiple locations within the same document. How often does this actually happen? If this is the exception to the rule, then perhaps workarounds should be used in these instances. For instance, you could create a snippet for the module content and then create duplicate modules with unique topic IDs that reference this snippet.

The following GitLab merge request provides two test documents that have the same document structure, with multiple levels of nested assemblies, each containing at least one module: https://gitlab.cee.redhat.com/rhci-documentation/docs-Red_Hat_Enterprise_Linux_OpenStack_Platform/-/merge_requests/12164

titles/mod-docs-assembly-context-test_current-paradigm document: This test document demonstrates the shortcomings of maintaining and referencing assembly-specific context IDs. For instance, con-module-1.adoc indicates the following:

* The expected ID for every module and assembly, by specifying the required or destination context ID for this module or assembly, as determined by the document structure.
* The actual ID that is created when the document is published, which uses the current context ID for the topic that specified this xref.

It is only when the expected ID and actual ID match that the xref will build. This document also records the PARENT CONTEXT variable before and after a nested assembly is added to an assembly to show how and where the saved PARENT CONTEXT variable is overwritten.
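The expected-versus-actual comparison can be illustrated with a small Python sketch. It assumes that an xref written as `target_{context}` expands with the context in force where the xref is written, while the target is published with the context of the assembly that includes it. The module and assembly names other than `con-module-1` are hypothetical.

```python
# Hypothetical layout: module name -> context of the assembly that includes it.
MODULE_CONTEXT = {
    "con-module-1": "assembly-a",
    "proc-module-2": "assembly-a",
    "ref-module-3": "assembly-b",
}


def expected_id(target):
    """ID under which the target is published (its own assembly's context)."""
    return f"{target}_{MODULE_CONTEXT[target]}"


def actual_id(source, target):
    """ID that an xref written as target_{context} expands to inside the source module."""
    return f"{target}_{MODULE_CONTEXT[source]}"


def xref_builds(source, target):
    """The xref resolves only when the expanded ID matches the published ID."""
    return actual_id(source, target) == expected_id(target)


same_assembly = xref_builds("con-module-1", "proc-module-2")
other_assembly = xref_builds("con-module-1", "ref-module-3")
print(same_assembly, other_assembly)
```

Under these assumptions, the xref within `assembly-a` resolves, while the xref from `assembly-a` into `assembly-b` expands to `ref-module-3_assembly-a`, which matches no published ID.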

titles/mod-docs-assembly-context-test_only_one_context document: This test document demonstrates how having only one context ID per published document makes linking to a module or assembly easy and eliminates the problem with the PARENT CONTEXT variable leakage.

jherrman commented 10 months ago

@IanFrangs I may not have correctly understood the issue you described, but you should be able to xref to modules outside of the assembly where the xref "resides". To do this, use the context value of the parent assembly (of the module being xref'd) with the ID of the module. For instance: xref:sample-module-id_sample-assembly-context-value[xref caption]

As for your suggested solution of a document-wide context variable, as far as I can tell, it would not really solve much. It would mean that a module that's included more than once anywhere within the same docs title ("book") would still have duplicated IDs, and thus prevent the book from building. In practical terms, having the same context variable for the entire title works the same as having no context variable at all.
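The duplicate-ID concern can be checked with a short Python sketch. It assumes every module ID has the form `name_context`; the function `duplicate_ids` and the module and assembly names are hypothetical.

```python
from collections import Counter


def duplicate_ids(inclusions, single_context=None):
    """Return IDs generated more than once.

    inclusions: (module_name, assembly_context) pairs, one per include.
    single_context: if given, it replaces every per-assembly context.
    """
    ids = [f"{module}_{single_context or context}" for module, context in inclusions]
    return sorted(i for i, count in Counter(ids).items() if count > 1)


# Hypothetical title that includes the same module from two different assemblies.
inclusions = [
    ("con-shared", "assembly-a"),
    ("con-shared", "assembly-b"),
    ("proc-unique", "assembly-a"),
]

per_assembly = duplicate_ids(inclusions)
document_wide = duplicate_ids(inclusions, single_context="book")
print(per_assembly, document_wide)
```

With per-assembly contexts the two inclusions of `con-shared` get distinct IDs. With one document-wide context they collide as `con-shared_book`, which is the same outcome as appending no context at all.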

Please do correct me if I've missed or misunderstood something, though.

Going back to the larger question of whether to use context variables at all, I think it's worth exploring different mechanisms for single-sourcing content in the future, but I would steer clear of removing contexts from established doc sets that already use them.

To elaborate a bit:

There have already been discussions in the RHEL docs team about the usefulness of the context variable, with the following arguments:

* The primary use case of contexts is reusing the same module multiple times within the same documentation title. However, in practice, this actually happens quite rarely - at least in the RHEL docs project. Also, reusing a module (once) in a different documentation title should work even without contexts.
* The current implementation of using contexts is rather complex, difficult to learn, and painful to debug.
* The generated doc URLs are very long and rather "ugly".

Each of these arguments is rather weighty, so I'd say searching for a more "lightweight" solution for future doc sets is warranted. That being said, as far as I know, the direction of the "new docs experience" is to have smaller doc titles, where module reuse in the same title is even less likely - which could make contexts more or less obsolete in and of itself.

On the other hand, removing or unifying contexts in existing doc sets would mean having to change all the URLs in links and internal tooling, and would inevitably also break links to RH docs from external sites, so I would say it is not worth the trouble.

For the time being, if someone is facing significant issues with including the same module multiple times in the same title, just linking it instead should be a reasonable workaround (even though it kinda defeats the purpose of modular documentation).

Hopefully this helps clarify a bit...

IanFrangs commented 10 months ago

@jherrman if your document only uses chapter assemblies then it is possible to predict and hardcode the destination context variable. But this becomes very difficult when nested assemblies are used. This also requires a lot of additional effort for the writer to determine. And when nested assemblies are used the parent context variable is no longer static, which messes up the chapter URLs. This can potentially become confusing when a guide has chapters for more than one component.

The advantage of the one context per published document is:

* It requires minimal changes to the existing mod docs format, all you need to do is to comment out the context variables in all the assemblies and ensure that all of your xrefs use the {context} variable.
* You can also keep the ifdef statement for parent context in your assemblies even though these are not needed since the parent context never changes but for ease of use these could remain unchanged.
* Writers can create xrefs with confidence using the {context} variable, even before the referenced topic exists, knowing that they will be successfully built.
* This fixes the problem with the parent context variable changing when using nested assemblies.

I am currently treating the issue of having multiple topics with the same ID as a corner case, which can be easily worked around. I do not see why everyone needs to suffer to support this functionality.

jherrman commented 10 months ago

> @jherrman if your document only uses chapter assemblies then it is possible to predict and hardcode the destination context variable. But this becomes very difficult when nested assemblies are used. This also requires a lot of additional effort for the writer to determine. And when nested assemblies are used the parent context variable is no longer static, which messes up the chapter URLs. This can potentially become confusing when a guide has chapters for more than one component.

We may be getting our wires crossed here, but the expressed IDs for nested assemblies (and the modules in them) work pretty analogously to non-nested ones. For instance, if you have a doc title named "Book" (with context Book defined in master.adoc), which contains AssemblyA (with ID AssemblyA_{context} and context AssemblyA), which contains AssemblyB (with ID AssemblyB_{context} and context AssemblyB), which contains Module1 (with ID Module1_{context}), then you get the following IDs for xrefs:

AssemblyA: AssemblyA_Book
AssemblyB: AssemblyB_AssemblyA
Module1: Module1_AssemblyB
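The ID derivation in this example can be reproduced with a short Python sketch. It assumes every item's ID is its own name plus the context of its direct parent, and that each assembly sets the context to its own name. The function name `collect_ids` and the nested-dict representation are hypothetical, and names are assumed to be unique within the tree.

```python
def collect_ids(tree, context):
    """Map each assembly or module name to name_context for its direct parent's context."""
    ids = {}
    for name, children in tree.items():
        ids[name] = f"{name}_{context}"
        if children is not None:
            # An assembly sets the context to its own name for everything it includes.
            ids.update(collect_ids(children, context=name))
    return ids


# Book -> AssemblyA -> AssemblyB -> Module1 (None marks a module).
book = {"AssemblyA": {"AssemblyB": {"Module1": None}}}
ids = collect_ids(book, context="Book")
for name, xref_id in ids.items():
    print(f"{name}: {xref_id}")
```

The sketch only models how the IDs are composed. It does not model the restoration of the context after an assembly ends.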

Admittedly, gleaning the correct IDs and context for xrefs is indeed not easy that way. In my experience, usually the fastest and most convenient way to get an ID for xref is to render the doc in a browser (for example using bccutil), click the ToC link to the section you want to xref, and then copy-paste the end bit of the URL you get (which is the ID of the section).

As a sidenote, the issue you might be facing with nested assemblies is that reusing one multiple times in a doc inevitably leads to duplicated IDs for its modules.

> The advantage of the one context per published document is:
>
> * It requires minimal changes to the existing mod docs format, all you need to do is to comment out the context variables in all the assemblies and ensure that all of your xrefs use the {context} variable.
> * You can also keep the ifdef statement for parent context in your assemblies even though these are not needed since the parent context never changes but for ease of use these could remain unchanged.
> * Writers can create xrefs with confidence using the {context} variable, even before the referenced topic exists, knowing that they will be successfully built.
> * This fixes the problem with the parent context variable changing when using nested assemblies.

Again, as far as I can tell, all these would apply even if we just deleted {context} variables altogether. You're right that using {context} can be useful for ifdef statements (for instance to render conditional text), but at least on RHEL, we already deal with that use case by defining the doc name as a :parameter: in the master.adoc or local attributes of the title.

IanFrangs commented 10 months ago

@jherrman thank you for your detailed responses, I agree that we are indeed getting our wires crossed :smile: I think that the one thing that we can agree on is that the current system is far from ideal. My intent is for us to devise a solution that requires minimal changes to the existing system of using CONTEXT and PARENT CONTEXT variables, and that fixes these problems in such a way that every Red Hat documentation team can benefit from these changes. That is why I have suggested the one CONTEXT variable per published document as a possible solution. With it, every xref builds without the writer needing to publish the document first to find out the target, as is the general case when creating links, and the PARENT CONTEXT specified in the URL of each chapter remains constant no matter how many nested assemblies a chapter uses (although I personally would only use one level of nested assemblies in a document to ensure that the topic hierarchy does not become too deep).

Your responses also highlight the need for Red Hat documentation teams to meet and share the knowledge and usage tips that they have tried and tested in making the documentation process easier. For instance, the Openstack documentation team, of which I am a member, have only just started creating global attributes for our published document titles, so that we can ensure that we always use the updated title names when creating links to them.

asteflova commented 10 months ago

> the one CONTEXT variable per published document ... Your responses also highlight the need for Red Hat documentation teams to meet and share the knowledge

Hi all, I'll just drop in here with an example of how Foreman docs (upstream of RH Satellite) use one context per document, in case anyone finds it useful. If you don't, feel free to ignore :upside_down_face:

* We use {context} defined per guide to be able to tweak modules between different guides if required: ifeval::[{context} == "XYZ"], ifdef::XYZ[], or ifndef::XYZ[].
* We use ifdef::XYZ[] or ifndef::XYZ[] when we need to include or exclude a particular piece of content (assembly, module, paragraph, whatever) for a specific build target (RH downstream Satellite, ATIX downstream orcharhino, upstream Foreman, upstream Katello, etc.).
* We append {context} to every ID, but that doesn't seem to serve any purpose actually.

Modules that are used in different contexts tend to get complex. But most modules are straightforward to write as well as to xref.

IanFrangs commented 10 months ago

Thank you @asteflova for sharing the use of {context} variables for tagging conditional content. I used this functionality extensively in the past when I was single-sourcing guides that shared similar content in Madcap Flare, and I was unaware that AsciiDoc provided this functionality!

rolfedh commented 10 months ago

Reposted here from https://gitlab.cee.redhat.com/rhci-documentation/docs-Red_Hat_Enterprise_Linux_OpenStack_Platform/-/merge_requests/12164#note_8855781:

"I wanted to discuss a small clarification. It appears there might be a slight misunderstanding regarding the use of the {context} custom attribute in xrefs. According to the mod docs standard, the {context} attribute is typically used in module IDs and ifdef statements, but not in xrefs, assembly IDs, or nested assembly IDs. The examples for xrefs usually reference ID values only. I've noticed that different product documentation sets have varied implementations of the {context} custom attribute, IDs, and xrefs. These variations don't always align with the mod docs standard. As we work through the issues you've insightfully outlined, it's important for us to distinguish between challenges stemming from specific implementations and those from the standard itself. This way, we can apply solutions to the areas that need them. Warm regards, Rolfe"

emmurphy1 commented 10 months ago

Members of the Modular Documentation Steering Committee and several folks from the RHEL team, including some who were involved in the original design of the modular templates, met to discuss this issue. The RHEL folks provided insight into the purpose of the context variable. The context variable makes it possible to reuse a module within a single title, but in different assembly files.

The group suggested that reuse might not be very common. The RHEL team will investigate methods to quantify reuse within a single title. We will make a list of possible alternative solutions.

emmurphy1 commented 10 months ago

@asteflova here are some examples of reuse in Installing and configuring Red Hat Process Automation Manager, https://github.com/kiegroup/kie-docs/tree/main/titles-enterprise/installing-and-configuring:

Red Hat Process Automation Manager versioning (2 times) https://github.com/kiegroup/kie-docs/blob/main/doc-content/enterprise-only/installation/about-ba-con.adoc

About Red Hat Process Automation Manager (2 times) https://github.com/kiegroup/kie-docs/blob/main/doc-content/enterprise-only/installation/installing-con.adoc

Using the installer in interactive mode (2 times - using conditional statements) https://raw.githubusercontent.com/kiegroup/kie-docs/main/doc-content/enterprise-only/installation/installer-run-proc.adoc

Just a sample. There are probably more even in this title.

maximiliankolb commented 10 months ago

> We append {context} to every ID, but that doesn't seem to serve any purpose actually.

This allows you to include files multiple times, e.g. "creating a compute profile" for VMware, Amazon EC2, etc. @asteflova

I would love it if there is an alternative solution because it's actually a big headache for me downstream.

emmurphy1 commented 9 months ago

In our last meeting, we talked about module reuse and whether or not it is common. We need to distinguish between two module reuse scenarios:

1. Using a module once in an assembly where that assembly is included in a title (master.adoc) that includes other assemblies that include the same module.
2. Reusing a module multiple times in a single assembly.

Scenario 1 is expected and supported where appropriate, and in my experience widely used. We want to encourage writers to reuse modules where useful and appropriate. Scenario 2 is in my experience rarely used.

If the context variable makes scenario 1 possible and there is no other solution, then I do not think we should remove the context variable from sub-assemblies.

mjahoda commented 9 months ago

> In our last meeting, we talked about module reuse and whether or not it is common. We need to distinguish between two module reuse scenarios:
>
> 1. Using a module once in an assembly where that assembly is included in a title (master.adoc) that includes other assemblies that include the same module.
> 2. Reusing a module multiple times in a single assembly.

I believe that there are three module reuse scenarios. The first, the most common, and the most useful one is a reuse of a module in different build trees (assembly of assemblies, in our case typically starting with a master.adoc file). This module reuse cannot cause a non-unique ID conflict. For reference, I'll call this Scenario 0.

> Scenario 1 is expected and supported where appropriate, and in my experience widely used. We want to encourage writers to reuse modules where useful and appropriate. Scenario 2 is in my experience rarely used.

My experience is:

Scenario 0 - commonly used
Scenario 1 - rarely used (might be "commonly" on certain projects)
Scenario 2 - does not make sense IMHO

> If the context variable makes scenario 1 possible and there is no other solution, then I do not think we should remove the context variable from sub-assemblies.

Because of the serious disadvantages with the current _{context} implementation in IDs (and the default templates), namely:

* long and repetitive URLs
* complicated referencing
* steep learning curve for new contributors
* harder-to-read code

I vote for removing it from the default templates and guidelines. Instead, I would move this option to commented-out lines (or a secondary set of templates, as proposed by @asteflova at the last RHEL focus group mtg) and dedicate a new section in the mod-docs guide to this advanced topic.

maximiliankolb commented 9 months ago

I would appreciate it if we clearly collected our usage and different scenarios. Some examples off the top of my head:

https://docs.theforeman.org/nightly/Managing_Content/index-katello.html#Importing_Kickstart_Repositories_content-management (affects katello, Satellite, and orcharhino): https://github.com/theforeman/foreman-documentation/blob/master/guides/common/assembly_importing-kickstart-repositories.adoc?plain=1

Overwriting context assemblies to reuse procedures (affects katello, Satellite, orcharhino):

guides/common/assembly_provisioning-virtual-machines-kubevirt.adoc
2::context: kubevirt-provisioning
12::context: {parent-context}

guides/common/assembly_provisioning-cloud-instances-ec2.adoc
2::context: ec2-provisioning
38::context: {parent-context}

IMHO an (almost ready to use) alternative solution: Overwriting "ProductName" to create similar guides for Installation/upgrading in different scenarios:

$ rg ProductName
guides/doc-Installing_Proxy/master.adoc
6::ProductName: {SmartProxyServer}

guides/doc-Quickstart/master.adoc
6::ProductName: {ProjectServer}

guides/common/attributes-satellite.adoc
101::ProductName: {ProjectName}

guides/doc-Installing_Server/master.adoc
6::ProductName: {ProjectServer}

guides/doc-Installing_Server_Disconnected/master.adoc
6::ProductName: {ProjectServer}

with different advantages and disadvantages

In general, I would be happy to discuss and eventually move away from "guide-specific" context attributes that are static within a guide.
Also, the "parent-context"-construct is sometimes misused in foreman-documentation and sometimes not even necessary.
Related to the title: assembly-specific context attributes would possibly make my life easier in contrast to guide-specific context attributes. I could create a small upstream/downstream PoC for this.

Two more PRs I'd like to share to highlight ease of review with our GH bot/GHA and modularizing content & reworking content in different PRs: