Splitting the API ref into several Markdown pages

feloy commented 3 years ago

As part of my Google Season of Docs project (https://developers.google.com/season-of-docs/docs/participants/project-cncf-feloy), I would like to work on the gen-apidocs tool, so it can output several Markdown files, in a format supported by Hugo/Docsy for k/website.

I propose to add a second implementation of the DocWriter interface, implementing the Markdown for Docsy output.
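As a rough sketch of what a second, Markdown-producing writer could look like: the `DocWriter` interface, the `Resource` type, and every method name below are simplified, hypothetical stand-ins for illustration. The actual interface in gen-apidocs is not reproduced here and may differ substantially.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Resource is a hypothetical, simplified stand-in for the generator's
// resource model; the real type in gen-apidocs carries far more data.
type Resource struct {
	Group, Version, Kind string
	Description          string
}

// DocWriter is a simplified, hypothetical interface used for illustration.
// It is NOT the actual DocWriter interface from gen-apidocs.
type DocWriter interface {
	WriteResource(r Resource)
	Finalize() map[string]string // file name -> file content
}

// markdownWriter emits one Markdown page per resource and keeps the
// pages in memory instead of writing them to disk.
type markdownWriter struct {
	pages map[string]string
}

func newMarkdownWriter() *markdownWriter {
	return &markdownWriter{pages: map[string]string{}}
}

// apiVersion returns "group/version", or just "version" for the core group.
func apiVersion(r Resource) string {
	if r.Group == "" {
		return r.Version
	}
	return r.Group + "/" + r.Version
}

// WriteResource renders a minimal page: front matter, apiVersion, description.
func (w *markdownWriter) WriteResource(r Resource) {
	var b strings.Builder
	fmt.Fprintf(&b, "---\ntitle: %q\n---\n\n", r.Kind)
	fmt.Fprintf(&b, "`apiVersion: %s`\n\n", apiVersion(r))
	b.WriteString(r.Description + "\n")
	name := strings.ToLower(r.Kind) + "-" + r.Version + ".md"
	w.pages[name] = b.String()
}

func (w *markdownWriter) Finalize() map[string]string { return w.pages }

func main() {
	var w DocWriter = newMarkdownWriter()
	w.WriteResource(Resource{Group: "coordination.k8s.io", Version: "v1", Kind: "Lease", Description: "Lease defines a lease concept."})
	w.WriteResource(Resource{Version: "v1", Kind: "Pod", Description: "Pod is a collection of containers."})

	pages := w.Finalize()
	names := make([]string, 0, len(pages))
	for n := range pages {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Println(n)
	}
}
```

Keeping the writer behind an interface means the existing HTML output and a Markdown output could coexist, with the choice made when the generator is invoked.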

tengqm commented 3 years ago

Looking forward to your code.

tengqm commented 3 years ago

A long time ago, we had something similar: we generated Markdown files and then converted them into HTML using downstream renderers/engines. That Markdown writer was abandoned because we didn't think another conversion step was necessary.

feloy commented 3 years ago

> A long time ago, we had something similar in generating markdown files then convert the markdown files into HTML using downstream renders/engines. That markdown writer was abandoned because we don't think another conversion is necessary.

I can see an advantage of creating Markdown at this step, so that this Markdown can be processed by Hugo/Docsy at the same time as the k/website, so the final HTML is homogeneous with the whole k/website.

kbhawkey commented 3 years ago

Do you plan to continue to filter the Open API spec into spec snippets and use these files to generate the individual pages (as an intermediate publishing step)? I'm not sure it makes sense to create Markdown instead of HTML. What do you think about adding a Hugo submodule to this repo (#155)?

feloy commented 3 years ago

I don't think it will be necessary to pre-process the OpenAPI spec; it seems possible to generate individual pages directly from the gen-apidocs program.

I'm not very familiar with what Hugo can do: would you prefer to still generate HTML with gen-apidocs, containing only the spec content (no header, no footer, no sidebar, etc.), and then integrate these HTML snippets into pages (with some shortcode or something else)?

My idea was to create Markdown pages, one per resource, including metadata, so these Markdown pages can be included in k/website (with the help of a submodule), and the pages would finally be rendered by Hugo to create HTML.

kbhawkey commented 3 years ago

> I don't think it will be necessary to pre-process the openAPI spec, it seems possible to generate individual pages directly from the gen-apidocs program.
>
> I'm not very aware of the possibilities of Hugo: would you prefer to still generate HTML with gen-apidocs, containing only the spec content (no header, no footer, no sidebar, etc), then integrate these HTML snippets into pages (with some shortcode or something else?)
>
> My idea was to create Markdown pages, one per resource, including metadatas, so these Markdown pages can be included in k/website (with the help of a submodule), and the pages would be finally rendered by Hugo to create HTML.

-- If you write out the front matter (and declare the layout type) and place this page in the docs file system, Hugo should build as expected. Or could write a page with the front matter and include an HTML snippet.

-- Could write a md page from a template with the addition of custom shortcodes (and js).

-- What about using a [modified] swagger-ui stylesheet and try to replicate the swagger-ui page components + presentation? Each page is somewhat contained and displays the nested objects within the root object (linked up) versus displaying referenced objects after the root object.

-- You might want to look at the glossary layout, shortcodes, and corresponding js and css.
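The first option above (write out the front matter, then the page body) could be sketched roughly as follows. The front matter keys and values chosen here (`title`, `content_type`, `weight`) are illustrative assumptions, not a confirmed layout contract for k/website, and `renderPage` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// renderPage returns a Markdown page that starts with YAML front matter.
// Which keys a layout actually needs is an assumption in this sketch.
// %q produces a double-quoted string, which is also valid YAML for
// plain ASCII titles like the ones used here.
func renderPage(title, contentType string, weight int, body string) string {
	var b strings.Builder
	b.WriteString("---\n")
	fmt.Fprintf(&b, "title: %q\n", title)
	fmt.Fprintf(&b, "content_type: %q\n", contentType)
	fmt.Fprintf(&b, "weight: %d\n", weight)
	b.WriteString("---\n\n")
	b.WriteString(strings.TrimSpace(body))
	b.WriteString("\n")
	return b.String()
}

func main() {
	fmt.Print(renderPage("Lease", "api_reference", 1, "Lease defines a lease concept.\n"))
}
```

The body could equally be an HTML snippet instead of Markdown, which corresponds to the second variant mentioned in the same bullet.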

tengqm commented 3 years ago

> -- What about using a [modified] swagger-ui stylesheet and try to replicate the swagger-ui page components + presentation?

My understanding is that @feloy prefers the same look and feel of the rest of the website. Given that we have about 90 resources in 1.19, that means we will need about 90 markdowns, divided into different groups.

Note: When I say 90, I am treating Deployment for creation and Deployment for update as two different things because their schemas differ slightly. For example, when creating a Deployment, metadata.name is a required field, but it is optional for an update operation. If you don't want to do this level of differentiation, you still get about 60 resources.

My points are as follows:

* If we decide to go down this direction, we will still need some kind of navigation scheme (i.e. subdirectories) and asset (i.e. the _index.md files). If we can improve the navigation bar on the left for our current page, this can be easily solved.
* You may have noticed that there are about 800 definitions in 1.19. If we want to generate markdowns only for resources, we will need to make each of those pages self-contained. For example, you will need to replicate "meta.v1.ObjectMeta" to many of these pages.
* Converging to the same look & feel is nice to have. I'm a little bit confused about the actual gain we will get.
* Consolidating all definitions referenced by one resource into a single self-contained schema is attractive. If crafted carefully, we will save user's time on traversing the definition hierarchy.

sftim commented 3 years ago

How about producing headless bundles for each API element (versioned resource)?

How this could work:

- The reference docs generator fetches the API definitions as OpenAPI.
- The reference docs generator writes a headless bundle for each resource (example follows). The path to each (headless) leaf bundle is based on the URI path in the API.
- A custom Hugo layout references all the API bundles for an API group and renders it to HTML.
- Additionally, concept pages can include the same headless bundle using a new shortcode. There are several rendering options; for example:
  - a page per API group
  - use data from the extras file (see suggestion below) to identify how to group these data
- Things that are similar to HTTP-accessible APIs can also have their own headless bundle and share common rendering code. See https://github.com/kubernetes/website/issues/23889
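To illustrate the step where the bundle path follows the API's URI path, here is a minimal sketch. The `bundlePath` helper, the content root, and the choice of a `core` directory for the legacy core group are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// bundlePath derives the location of a headless leaf bundle from the
// API group, version, and kind. Using "core" for the empty group name
// is an assumption of this sketch, not an agreed convention.
func bundlePath(root, group, version, kind string) string {
	if group == "" {
		group = "core"
	}
	return path.Join(root, group, version, strings.ToLower(kind)+".md")
}

func main() {
	fmt.Println(bundlePath("/content/en/docs/apis", "coordination.k8s.io", "v1", "Lease"))
	fmt.Println(bundlePath("/content/en/docs/apis", "", "v1", "Pod"))
}
```

Deriving the path mechanically keeps the generator and the Hugo layout in agreement about where each bundle lives without needing a separate index file.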

Here's an example of how the headless bundle could look for Lease, as a file named (eg) /content/en/docs/apis/coordination.k8s.io/v1/lease.md

---
content_type: api-reference
title: Lease
sitemap_exclude: true
headless: true
api:
  - namespaced: true # used to adjust the API path automatically
    fields:
    - name: apiVersion
      type: string
      useDefaultText: true
    - name: kind
      type: string
      useDefaultText: true
    - name: spec 
      # The "text" field contains generated Markdown.
      text: >
        Specification of the Lease. More info:
        https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status
      fields:
        - name: acquireTime
          type: MicroTime
          typeApiGroup: meta/v1
          text: >
            `acquireTime` is a time when the current lease was acquired.
        - name: renewTime
          type: MicroTime
          typeApiGroup: meta/v1
          text: >
            `renewTime` is a time when the current holder of a lease has last updated the lease.
        - name: leaseDurationSeconds
          type: integer
          text: >
            `leaseDurationSeconds` is a duration that candidates for a lease need to wait to
            force acquire  it. This is measure against time of last observed `RenewTime`.
        - name: leaseTransitions
          type: integer
          text: >
            `leaseTransitions` is the number of transitions of a lease between holders.
        - name: holderIdentity
          type: string
          text: >
            `holderIdentity` contains the identity of the holder of a current lease.
        …
    operations:
       - name: Read
         operationRepresentsWrite: false
         HTTPVerb: GET
         # no backticks for pathTemplateMarkDown
         # the (imaginary) API reference layout automatically wraps this in a <tt> element
         pathTemplateMarkDown: "/apis/coordination.k8s.io/v1/namespaces/_{namespace}_/leases/_{name}_"
         queryParameters:
           - name: export
             deprecated: true
             text: >
               …
           - name: pretty
             deprecated: false
             text: >
               …
---

and a related file named eg /content/en/docs/api-extras/coordination.k8s.io/v1/lease.md:

---
content_type: api-reference-extras
title: Lease
sitemap_exclude: true
headless: true
version_support:
  # rough sketch!
  - beta:
      start: v1.14
      # could have end: v1.xx when this is deprecated, not sure the beta Lease API _is_ deprecated yet
  - stable:
      start: v1.17
extraData:
   feature_state: stable
---

A _Lease_ allows a Kubernetes principal to request and then hold a time-limited lock or reservation
against a related object.

## Uses

Each Node has an associated Lease object in the `kube-node-lease`
{{< glossary_tooltip term_id="namespace" text="namespace">}}.
Lease is a lightweight resource, which improves the performance
of the node heartbeats as the cluster scales.

&hellip;

## {{% heading "whatsnext" %}}
<!-- more stuff -->

Note that if the extras file isn't there, the API documentation still renders. In this sketch, the optional extras file provides a way for humans to add details that can't be generated and to provide relevant outbound hyperlinks. In the future, some of the things I've shown in the extras file could be moved into k/kubernetes and then become part of the generated file instead.

And here's an example of using the shortcode:

For more information on the resources you can use to expose your application, see:
- {{% api-reference target="v1/Service" length="short" %}}
- {{% api-reference target="v1/Ingress" api_group="networking.k8s.io" length="short" %}}

Another option is to use data templates. I picked headless bundles because data templates aren't easy to localize.

kbhawkey commented 3 years ago

> My points are as follows:
>
> * If we decide to go down this direction, we will still need some kind of navigation scheme (i.e. subdirectories) and asset (i.e. the _index.md files). If we can improve the navigation bar on the left for our current page, this can be easily solved.
>
> * You may have noticed that there are about 800 definitions in 1.19. If we want to generate markdowns only for resources, we will need to make each of those pages self-contained. For example, you will need to replicate "`meta.v1.ObjectMeta`" to many of these pages.
>
> * Converging to the same look & feel is nice to have. I'm a little bit confused about the actual gain we will get.
>
> * Consolidating all definitions referenced by one resource into a single self-contained schema is attractive. If crafted carefully, we will save user's time on traversing the definition hierarchy.

Yes, without evaluating the complexity, I think it makes sense to in-line or keep a resource and its dependencies together.
If readers want a list of all definitions, you could imagine creating a list of the definitions on a separate page (exclusive of navigating the resources).
To make the pieces of the API resource page reusable, if possible, create a number of shortcodes to generate the specific tables on the resource API page from JSON spec snippets (may be nested). Investigate the Hugo openapi3.Unmarshal function. Create shortcodes for generating the other tables (operations, status codes, examples, request, response). I don't think it makes sense to convert the original JSON spec snippets into another data format such as YAML unless there is a good reason to do so.

tengqm commented 3 years ago

Providing a catalog of all definitions is a good idea! There is only one place where automatically nesting the JSON spec might be problematic. In the apiextensions.k8s.io/v1 group, there is a definition named CustomResourceValidation that references JSONSchemaProps. JSONSchemaProps references itself in its definition. We will be trapped in an infinite loop there when parsing the schema. This is one of the places where "careful crafting" should be applied.
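One common way to avoid such a loop is to track which definitions are currently being expanded and stop when one reappears. The sketch below assumes a heavily simplified schema model (`Definition`, `Field`, and `flatten` are hypothetical names); it is not how gen-apidocs is implemented.

```go
package main

import "fmt"

// Field is a simplified schema field. Ref names the definition the field
// points to, or is empty for primitive types.
type Field struct {
	Name string
	Ref  string
}

// Definition is a simplified stand-in for an OpenAPI definition.
type Definition struct {
	Fields []Field
}

// flatten walks the definition tree rooted at name and returns dotted
// field paths. onPath holds the definitions currently being expanded;
// a definition already on the path is reported instead of expanded,
// which breaks self-references such as JSONSchemaProps.
func flatten(defs map[string]Definition, name, prefix string, onPath map[string]bool) []string {
	if onPath[name] {
		return []string{prefix + " (recursive reference to " + name + ")"}
	}
	onPath[name] = true
	defer delete(onPath, name)

	var out []string
	for _, f := range defs[name].Fields {
		p := f.Name
		if prefix != "" {
			p = prefix + "." + f.Name
		}
		if f.Ref == "" {
			out = append(out, p)
			continue
		}
		out = append(out, flatten(defs, f.Ref, p, onPath)...)
	}
	return out
}

func main() {
	defs := map[string]Definition{
		"CustomResourceValidation": {Fields: []Field{{Name: "openAPIV3Schema", Ref: "JSONSchemaProps"}}},
		"JSONSchemaProps": {Fields: []Field{
			{Name: "type"},
			{Name: "not", Ref: "JSONSchemaProps"},
		}},
	}
	for _, p := range flatten(defs, "CustomResourceValidation", "", map[string]bool{}) {
		fmt.Println(p)
	}
}
```

Because the set is cleared on the way back up, a definition that is referenced from two sibling fields is still expanded in both places; only true cycles are cut.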

Converting this JSON data to another format just for the sake of making it conform to Hugo/Docsy style markup doesn't make a lot of sense to me. We have quite a few items that are more urgent, e.g. highlighting deprecated resources and fields, better navigation among resources and definitions, supporting unpublished APIs, documenting non-resource API paths, etc.

feloy commented 3 years ago

I would prefer not to tie the generation of the content too closely to Hugo and/or Docsy. I would prefer Markdown, which is standard nowadays for displaying documentation, or HTML.

I've worked earlier this year on a tool that generates self-contained Markdown for Resources. The sources of the tools are at https://github.com/feloy/kubernetes-resources-reference and the final result is visible at https://k8sref.io.

The intermediate files in Markdown can be viewed at https://github.com/feloy/kubernetes-resources-reference/tree/master/website/content/en/docs.

Here is the list of features I implemented:

- the resources are categorized, here in the categories Workloads, Services, Config&Storage, Authn, Authz, Policies, Extend, Cluster. This is configurable with a simple toc.yaml (https://github.com/feloy/kubernetes-resources-reference/blob/master/config/v1.19/toc.yaml)
- each page displays at the first level the associated resources, e.g. Pod, PodSpec, PodStatus, PodList
- each resource inlines its definitions (breaking the recursion for JSONSchemaProps)
- some widely used definitions are referenced from another page (e.g. ObjectMeta)
- required fields are indicated, and placed first
- fields of a resource can be categorized and ordered, with the help of a fields.yaml file (https://github.com/feloy/kubernetes-resources-reference/blob/master/config/v1.19/fields.yaml)
- map fields are indicated (e.g. pod.spec.nodeSelector is map[string]string, instead of object) using the value of x-kubernetes-list-type
- patch strategies are indicated
- apiVersion and kind display the value, not the string type
- at the top of the page, the Go import necessary to use these resources from a Go program is displayed
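As an illustration of the "required fields are placed first" item, a stable sort is one simple way to do it. This is a sketch under assumptions: the `Field` type and `requiredFirst` helper are hypothetical, and the linked tool may order fields differently (for example, according to its fields.yaml).

```go
package main

import (
	"fmt"
	"sort"
)

// Field is a hypothetical, minimal view of a documented field.
type Field struct {
	Name     string
	Required bool
}

// requiredFirst returns a copy of fields with required fields moved to
// the front. The sort is stable, so the original relative order is kept
// within the required group and within the optional group.
func requiredFirst(fields []Field) []Field {
	out := append([]Field(nil), fields...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Required && !out[j].Required
	})
	return out
}

func main() {
	container := []Field{
		{Name: "image"},
		{Name: "name", Required: true},
		{Name: "ports"},
	}
	for _, f := range requiredFirst(container) {
		fmt.Println(f.Name, f.Required)
	}
}
```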

Do you think it would be useful to start from this tool, and adapt it to make Markdown compatible with Hugo/Docsy? Or adapt it to output HTML?

tengqm commented 3 years ago

@feloy At least to me, all items in your list of features are valuable. When comparing what you have in your tool project to what we get from the gen-apidocs output, the outstanding differences are:

- What we have is one big HTML page; you broke it into smaller pieces, one per resource. Note that before we publish the big HTML page, we have small pieces as well. They are intentionally merged into one big HTML page for ease of publication. If we agree on a scheme to publish a collection of smaller pages, we can do it with trivial effort: just skipping the final merge stage in the generator would do it.
- What we have today is HTML; you are instead generating MD files. Well, this is debatable. One benefit of having Markdown is that it can be further processed into different outputs, e.g. HTML, PDF, RTF, etc. We chose to generate HTML directly because the only downstream consumer for the API generator today is the Kubernetes website, which serves HTML only. The website is now generating HTML using Hugo, which emerged as a popular generator not long ago.

I know the look and feel of the API ref page can be improved; the navigation and organization can be tuned; ... I'd like to see more improvements to the contents when compared to presentation. That is the reason I like your improvements regarding "required fields", "patch strategy", etc.

Am I missing any advantages of having markdowns over HTMLs, supposing that we will break the big HTML to smaller ones?

sftim commented 3 years ago

Taking a big steer from the approach in https://github.com/feloy/kubernetes-resources-reference sounds good to me.

I hope that we'll still use Hugo whether the tool generates Markdown pages or HTML fragments - because I'd like to let the reference documentation build upon the navigation, sitemap, etc that Docsy provides. I'm happy it's Markdown. If we want (eg) a PDF, you can render the Markdown to HTML and then render the HTML to PDF. In fact, that would be a nice improvement to track on the backlog.

The different scheme I considered, using Hugo features a lot, offers a few smaller benefits for the longer term. The good news is that I think the two approaches are compatible: nothing about https://github.com/kubernetes-sigs/reference-docs/issues/173#issuecomment-707504010 has any blocker for adding extra metadata, incrementally, to a different set of files. (for example: we could eventually move the informal API group description text into k/website, so that it can be localized).

feloy commented 3 years ago

One small advantage I can see in Markdown is that the documentation text extracted from the OpenAPI spec can be inserted as is, and Hugo will treat it as Markdown. It seems that a lot of the documentation uses Markdown markup (back-quotes, etc.), links are generated for URLs, etc.

sftim commented 3 years ago

> It seems that lots of documentation are using Markdown markup (back-quotes, etc), links are generated for URLs, etc.

We should be a bit wary of that. Golang doesn't have any language-level convention for which bits of documentation can be treated as Markdown (it's different in, eg, Rust, where doc comments are defined to be CommonMark). Unless we're linting upstream code PRs for validity of the Markdown, I would prefer not to assume that every bit of the API code is getting that right.

What we can do is use a heuristic initially and then maybe add in some CI into https://github.com/kubernetes/kubernetes that helps reviewers there make sure no regressions creep in. Maybe track some tech debt around adding that CI, too.
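A heuristic of that kind could start very small. The sketch below is only an assumption about what such a check might look for (unbalanced backticks and a raw `<` outside code spans); `lintDescription` is a hypothetical helper, not an existing linter, and it would miss many cases, including double-backtick code spans.

```go
package main

import (
	"fmt"
	"strings"
)

// lintDescription applies two crude checks to an API field description
// and returns human-readable findings. It is deliberately simplistic.
func lintDescription(desc string) []string {
	var problems []string

	// n backticks split the text into n+1 parts, so an even number of
	// parts means an odd (unbalanced) number of backticks.
	parts := strings.Split(desc, "`")
	if len(parts)%2 == 0 {
		problems = append(problems, "unbalanced backticks")
	}

	// Even-indexed parts are treated as text outside code spans. A raw
	// '<' there may be interpreted as the start of an HTML tag.
	for i := 0; i < len(parts); i += 2 {
		if strings.Contains(parts[i], "<") {
			problems = append(problems, "raw '<' outside a code span")
			break
		}
	}
	return problems
}

func main() {
	for _, d := range []string{
		"`renewTime` is a time when the current holder of a lease has last updated the lease.",
		"use `foo to set the value",
		"value must be <= 10",
	} {
		fmt.Printf("%q -> %v\n", d, lintDescription(d))
	}
}
```

A check like this could run as a warning first, so reviewers see findings without existing descriptions blocking merges.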

kbhawkey commented 3 years ago

> I would like not to stick too much to Hugo and/or Docsy for the generation of the contents. I would prefer Markdown which is standard nowadays for displaying documentation, or HTML.
>
> I've worked earlier this year on a tool that generates self-contained Markdown for Resources. The sources of the tools are at https://github.com/feloy/kubernetes-resources-reference and the final result is visible at https://k8sref.io.

It looks like you already have Markdown files generating -- which is nice to build off of. I think it makes sense to leave the files in Markdown and hand off the pages to other tools to publish to different formats (if that is a goal). Presumably, you will want to continue to publish your site from this code or fork off this code.

I am more concerned about the granularity of the pages. Some pages are quite large and flat.
Does it make sense to break up a large page into a navigable sub-tree or continue to rely upon the built-in right-hand sidebar TOC?
Do you want the user to navigate out of the page to the common definitions/common parameters? This display requires a lot of navigation for the reader.
I don't think you can rely upon valid Markdown from the resource type's comments/descriptions.
What could you do differently or what could be improved?

feloy commented 3 years ago

> I think it makes sense to leave the files in Markdown and hand off the pages to other tools to publish to different formats (if that is a goal). Presumably, you will want to continue to publish your site from this code or fork off this code.

The tool supports several backends. I should be able to duplicate the Markdown one and adapt it for this situation.

tengqm commented 3 years ago

In case we want to go back to markdown, https://github.com/kubernetes-sigs/reference-docs/pull/102 was the PR that can be reverted.

feloy commented 3 years ago

I duplicated the Hugo backend of the kubernetes-resources-reference tool to a KWebsite backend and have made small changes to make the links compatible with the k/website.

You can view the result at: https://deploy-preview-23294--kubernetes-io-master-staging.netlify.app/docs/reference/kubernetes-api/ref/

tengqm commented 3 years ago

@feloy Thanks for sharing this. Just spent a few minutes reading the generated docs. I'm a little bit confused by the sections of individual definitions. Take the port specification for a container for example (https://deploy-preview-23294--kubernetes-io-master-staging.netlify.app/docs/reference/kubernetes-api/ref/workloads/container-/#ports). I'm not clear what the structure of the ports property will be. It tells me it is an array of ContainerPort with no link. A reader may get confused regarding what ContainerPort represents.

Then there is a list of ports.containerPort, ports.hostIP ... causing further confusion. Do you mean ports[*].containerPort and ports[*].hostIP ...? Examples of this include env.valueFrom.configMapKeyRef.optional on the same page.

I know we are striving to flatten the hierarchy of the data structure into a single page, but the current way of documenting it needs some tweaking. No?

feloy commented 3 years ago

@tengqm Yes, I understand the confusion.

Perhaps ports[*].containerPort and ports[*].hostIP would be clearer, or ContainerPort.containerPort and ContainerPort.hostIP.
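To make the first notation concrete, a path renderer only needs to know which segments are arrays. The `Segment` type and `fieldPath` helper below are hypothetical and only illustrate the `[*]` convention being discussed.

```go
package main

import (
	"fmt"
	"strings"
)

// Segment is one step in a field path. IsArray marks fields whose type
// is a list, so the rendered path can show that explicitly.
type Segment struct {
	Name    string
	IsArray bool
}

// fieldPath joins segments with dots and appends "[*]" to list fields.
func fieldPath(segs []Segment) string {
	var b strings.Builder
	for i, s := range segs {
		if i > 0 {
			b.WriteByte('.')
		}
		b.WriteString(s.Name)
		if s.IsArray {
			b.WriteString("[*]")
		}
	}
	return b.String()
}

func main() {
	fmt.Println(fieldPath([]Segment{{Name: "ports", IsArray: true}, {Name: "containerPort"}}))
	fmt.Println(fieldPath([]Segment{
		{Name: "env", IsArray: true},
		{Name: "valueFrom"},
		{Name: "configMapKeyRef"},
		{Name: "optional"},
	}))
}
```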

I've made a new version with some newlines that should make the structure easier to understand. The documentation of the inlined definition is displayed in italics.

tengqm commented 3 years ago

[Screenshot attached: Screen Shot 2020-10-16 at 10 29 58 AM]

kbhawkey commented 3 years ago

I see your point. Some feedback:

New: resources (ResourceRequirements) is the field with a nested list of the definition. Seems more difficult to read. May be less difficult if the display of the ResourceRequirements definition could be toggled on or off? Or, could link to the definition. Add the definition at the bottom of the page (list of definitions). In page linking instead of linking to a completely new section.

Current: resources is the field and ResourceRequirements is the type (link to that definition). The definition is sometimes listed immediately after the resource or found in a list of definitions in the reference.

feloy commented 3 years ago

This is confusing if we remove the full path of the field.

But if we keep the full path (env.valueFrom.fieldRef and not only fieldRef), I find it easy to know where to place this fieldRef in the tree.

I agree that if structures are too deeply nested, they are difficult to read. For this reason, I extracted some very deeply nested parts, like pod affinity.

The extracted definitions are placed in the "common definitions" section. It could be possible to place some of these extracted definitions in the same page if they are specific to a resource.

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

kubernetes-sigs / reference-docs

Splitting the API ref into several Markdown pages #173