Open ovflowd opened 1 year ago
cc @mhdawson @nodejs/next-10 @Trott
@nodejs/tsc @nodejs/documentation
We'll need to make sure this process doesn't add any work for releasers. (I don't think it would, but writing it here just in case.)
This will also be a good opportunity hopefully to fix our version picker quirks, at least for future versions of Node.js.
I like this a lot, although of course we'll see what kinds of unforeseen practical problems (if any) arise in the course of implementation.
I wonder if 20.x and forward is more realistic than 18.x and forward. I wouldn't complain if we got this working sooner than 20.x though.
Can we try to determine which parts of this can be done incrementally and which need to happen all-at-once? I'm trying to understand how many steps are involved here. (And if it's one big step, that's OK, but of course we'll want to automate everything because keeping the docs in synch with the current version will be an annoying problem otherwise.)
Is the idea that this would work on the current nodejs.org as well as on nodejs.dev or is the vision here that the nodejs.dev tech/build stack replaces what's on nodejs.org and that's a prerequisite for this to work?
Is the idea that this would work on the current nodejs.org as well as on nodejs.dev or is the vision here that the nodejs.dev tech/build stack replaces what's on nodejs.org and that's a prerequisite for this to work?
In theory, it could also work on nodejs.org if we outsource the tooling created on the nodejs.dev repo (which should be pretty much independent of whatever static-framework stuff you use); this touches on the topic of "The build process". Yes, a few tweaks would be needed, but in the end, we could reuse the HTML generation part of the existing nodejs/node/tool/doc.
For nodejs.dev no extra steps are needed, yet I would like to outsource the tooling.
Can we try to determine which parts of this can be done incrementally and which need to happen all-at-once? I'm trying to understand how many steps are involved here.
I foresee 4 major steps:
That's it. Basically the migration itself can be mass done safely.
I wonder if 20.x and forward is more realistic than 18.x and forward. I wouldn't complain if we got this working sooner than 20.x though.
Indeed, I was trying to think about retroactively updating till v18, as v18 is the first version of the API docs that are the most Markdown conforming. (I'm referring to the v18 git tree, also on that tree seems like all the doc pages follow the current doc specs, at least for the metadata, hence why migrating at once would be seamless).
I'm going to update the main proposal adding the following missing sections
Really great proposal! A lot of topics are covered, which is really great as this gives a good overview of everything that will require some work. Good choice not to address all the subjects here as it would be too long, but good thinking to mention them (tooling, i18n...), which will make it easy to link the PRs.
Just a few questions:
Following @Trott's comments, I agree that v20 would be the best time to have it; the timing will be too short for the versions before that. But do we want to provide retroactive docs for versions before v20? If yes, which versions? Should we have all the LTS versions covered?
A lot of question from my side :)
Build process: include a way to generate the doc from source? To generate part of doc ? Generate whole doc ? Pdf too ? (Maybe better to discuss about that when we will talk about the tooling ?)
As I mentioned before, the building tools will allow you to build just a subset of files if you want. I don't think HTML, PDF and JSON generation should be part of the core of the tooling, but could be added on top of it such as:
import docTooling ....
const result = docTooling.generateDocs();
return myPdfLibrary...
We could add all kinds of output generation on top, but the core tooling is responsible for creating a JavaScript object tree with the "metadata" and content aggregated. Initially, the idea is to be a JSX Buffer (MDX), but we could also just return the result into a JavaScript object with the metadata and content. And then have a plugin that generates to MDX, as, for example, we would have for HTML, PDF, JSON...
E.g. (of the object) for the promises module:
{
"promises": {
... all the metadata fields,
details: "the content from the Markdown file",
}
}
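The layering described above (a core step that only aggregates metadata and content, with output formats added on top) could be sketched roughly as follows. This is only an illustration: `generateDocs`, `toJson` and `toHtml` are hypothetical names, and the input shape is an assumption, not the actual tooling API.

```javascript
// Hypothetical core step: aggregate metadata and Markdown content per module.
// The input shape ({ name, metadata, markdown }) is an assumption of this sketch.
function generateDocs(sources) {
  const docTree = {};
  for (const { name, metadata, markdown } of sources) {
    docTree[name] = { ...metadata, details: markdown };
  }
  return docTree;
}

// Hypothetical output plugin layered on top of the core: JSON.
function toJson(docTree) {
  return JSON.stringify(docTree, null, 2);
}

// Another hypothetical plugin: a deliberately tiny HTML renderer.
function toHtml(docTree) {
  return Object.entries(docTree)
    .map(([name, doc]) => `<section id="${name}"><h1>${name}</h1><p>${doc.details}</p></section>`)
    .join('\n');
}

const exampleTree = generateDocs([
  {
    name: 'promises',
    metadata: { stability: 'stable' }, // illustrative value only
    markdown: 'the content from the Markdown file',
  },
]);

console.log(toJson(exampleTree));
console.log(toHtml(exampleTree));
```

Keeping the plugins as plain functions over the object tree means a PDF or MDX generator could be added later without touching the core step.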
Versioning doc: keep all the versions accessible on the website ? How to easily update across multiple versions ? Doc on odd version or just even ? or all ?
This is not a responsibility for this proposal.
I am not against yaml but why not have directly the json and not the yaml ? is there some technical stuff blocking us from that ? or is it DX related ?
YAML is more accessible to write than JSON and easier to read. Also less overhead on the transition period. JSON is just a JavaScript object, is not really human friendly (to a certain point) (IMHO)
maybe on the tooling part, we should add / ensure full compliance of the doc ? Way to tests if the heading Id exists for example
If it is not compliant, it wouldn't even build (give an error), but this should not be a responsibility of the tooling; it could be part of the build process by using tools such as Remark, and ESLint, for example.
YAML is more accessible to write than JSON and easier to read
I think that's debatable, YAML can be very hard for humans as well (e.g. multiline strings is non-intuitive, the type guessing makes it that sometimes one mistakes a string for a number, etc.). Other markup languages, such as e.g. TOML or JSON, do not have those problems. I'm not saying those are deal breakers for using YAML, or that we should not consider YAML for this use-case, but I think we should not disregard the problems of that syntax.
(e.g. multiline strings is non-intuitive, the type guessing makes it that sometimes one mistakes a string for a number, etc.).
Gladly that none of those apply to our schema 😛
Other markup languages, such as e.g. TOML or JSON, do not have those problems. I'm not saying those are deal breakers for using YAML, or that we should not consider YAML for this use-case, but I think we should not disregard the syntax problems.
Every markup language has its pros and cons. I just personally (please take it with a grain of salt) believe that, in this case, the pros of using YAML are better.
Thanks for comprehensive proposal !
I think this
Example of a folder structure with all files
and will definitely help me understand/consume what you are suggesting.
@mhdawson @Trott I just updated it :)
Friendly bump for @mhdawson @Trott so we can proceed with the next steps of this proposal :D
It seems like the "move the YAML to a separate file" part can happen pretty much at any time as long as someone is willing to update the relevant tooling. Would it be beneficial to do this right away so that there's one less structural change to make the rest of this proposal happen?
It seems like the "move the YAML to a separate file"
Hmm, the way how the YAML is structured right now in the Markdown, it would possibly have no benefits in extracting it. At least to a certain degree the proposed YAML structure needs to be implemented.
I also think I got tasked in making a demo repository with example contents 🤔
@ovflowd we had discussed an example of what the directory would look like for a single API, is that what you meant about a demo repository with example contents ?
Yup, pretty much!
I had a meeting with @mhdawson, and here's the execution plan for this proposal:
doc/api
) to the proposal format here. This can pretty much be reused from here
Original source: https://docs.google.com/document/d/1pRa7mqfW79Hc_gDoQCmjjVZ_q9dyc2i7spUzpZ1CW5k
@mhdawson I'm going to proceed with the demo (example) (mentioned here https://github.com/nodejs/next-10/issues/166#issuecomment-1322363051) very possibly during December.
Following the discussion during the last next-10 meeting, it could be great to create another meeting / discussion channel and only keep the updates for the next-10 meeting. This topic is really complex and has a lot of impact, so it will take time and "block" other global topics. What do you think @ovflowd ? Also, because you are leading this initiative, when would be the best time for you ? (we can discuss it on Slack, it could be easier)
Once the demo is in place, I'll get a presentation to the TSC onto the TSC agenda, likely at a meeting in Jan.
@ovflowd ? Also because you are leading this initiative when would be the best time for you ? (we can discuss it on slack it could be easier)
Hmm, let's talk about this on the next Next-10 meeting so we can get in sync about this! :D
Ok great, but I don't see what it brings compared to the docs on nodejs.dev? except more files to manage
Ok great, but I don't see what it brings compared to the docs on nodejs.dev? except more files to manage
I don't want to sound rude, but I think you lack the context behind this proposal 🤔
The API Docs you see on https://nodejs.dev are generated through a script that processes the source API Documentation files. This proposal aims to address several long-standing issues from those files that are the source of the documentation.
And to answer your question, yes, there are more files to manage. The pros-cons are all outlined on the proposal.
What I meant was that if we wrote (on nodejs/node) like on nodejs.dev wouldn't it be easier?
And you're not rude at all
What I meant was that if we wrote (on nodejs/node) like on nodejs.dev wouldn't it be easier?
Nope, it wouldn't be easier at all. The current files on Nodejs.dev are "generated" ones. The meaning of generated being, that they're generated to be compatible with a technology we use called MDX. Think about them as "output of a build system". They're no improvement at all for the Developer Experience of the average contributor of Node.js
I didn't see it as an mdx file. So I validate your idea!
Ok great, but I don't see what it brings compared to the docs on nodejs.dev?
Anything that requires core developers to have to go to a different repo to see what doc changes will look like is a dealbreaker. Anything that requires more work for core developers to validate documentation changes than they do right now is a dealbreaker.
So, if you're suggesting "move the nodejs.dev documentation generation process to core and then core devs can run make doc-only like they do now and see what the website will look like", then sure, that's a possibility.
But if you're suggesting that the website have a different process to generate docs than core, and that the docs on the website look different from core unless core devs take an additional step, that's not going to work.
@sheplu @mhdawson here's the repository containing an "example" of how the metadata proposal would look like https://github.com/ovflowd/node-doc-proposal
@nodejs/crowdin-managers What do you think of this change, how will it impact crowdin?
@AugustinMauroy this has nothing to do with Crowdin...
@ovflowd The question was to know if the structure modification will work with the Crowdin tool
I repeat myself, this has nothing to do with Crowdin.
Crowdin is not even used for Node.js API docs. And I don't see an easy way of implementing it, nor whether we should for the time being. Also, the Crowdin managers (the people you pinged) only manage the instance.
For your information nodejs have an Crowdin for Api docs but the GitHub integration was broken.
For your information nodejs have an Crowdin for Api docs but the GitHub integration was broken.
We might have a "group" inside Crowdin, but API Docs were never integrated with Crowdin. I'm quite sure about that, but of course, I could be wrong. Still, this is off-topic @AugustinMauroy, pretty please, let's stay on-topic here.
Has any thought been given as to how we handle the switchover/migration? In particular how this will affect porting stuff between main and any versions of Node.js on the new system and LTS/older versions of Node.js on the old one? For example, presently when we merge something into LTS the release commit from the LTS release is cherry-picked to main and that (generally) takes care of updating the "added in" metadata.
@richardlau it was written in one of the comments: https://github.com/nodejs/next-10/issues/166#issuecomment-1327867224
In particular how this will affect porting stuff between main and any versions of Node.js on the new system and LTS/older versions of Node.js on the old one?
As we spoke about, including on Next-10 meetings, the metadata proposal applies only for new versions of Node.js, not going to be ported to old versions of the docs (as this is pointless).
For example, presently when we merge something into LTS the release commit from the LTS release is cherry-picked to main and that (generally) takes care of updating the "added in" metadata.
The idea is to release this proposal on the next LTS version. I'm not sure I got exactly what you're asking here, so it would be nice if you could explain it better :)
@richardlau it was written in one of the comments: #166 (comment)
As we spoke about, including on Next-10 meetings, the metadata proposal applies only for new versions of Node.js, not going to be ported to old versions of the docs (as this is pointless).
For example, presently when we merge something into LTS the release commit from the LTS release is cherry-picked to main and that (generally) takes care of updating the "added in" metadata.
The idea is to release this proposal on the next LTS version. I'm not sure I got exactly what you're asking here, so it would be nice if you could explain it better :)
@ovflowd I mean that we frequently port things between releases and the main branch. Maybe examples will make this clearer: e.g.
- Backports, e.g. https://github.com/nodejs/node/pull/44976. This is taking commits from main and backporting them to older versions.
- Release commits, which are cherry-picked to main to add the changelogs and update the doc metadata. I really want to minimise any additional work releasers have to do.
If the metadata is now in different formats between the branches being picked from and to, that's extra work to convert between the formats.
Backports, e.g. https://github.com/nodejs/node/pull/44976. This is taking commits from main and backporting them to older versions.
Well, in this case, the docs of the change on main, when backported through cherry-pick, will of course need a during-cherry-pick edit (as when you do an interactive rebase).
It is the pain of transitioning from one standard to another and due to the docs being coupled to the commit of the change itself. I can imagine this will not be often, and as we move forward all the "backported" and "forward-ported" versions will use the new metadata proposal.
This is another reason why we want to release this together with a major semver, like v20. Yes if we need to backport or forwardport things to/from v18 we will need to edit the cherry-pick in-time, or possibly have a separate commit for the docs.
If the metadata is now in different formats between the branches being picked from and to, that's extra work to convert between the formats.
I agree, but this is a short term issue as far as I can see.
@ovflowd I'm sorry but I'm afraid this might be a major blocker to the proposal (I apologize but I should have caught it during your presentation on Wednesday).
I agree, but this is a short term issue as far as I can see.
v18 would be the last LTS version containing the previous docs and it goes end-of-life in April 2025, granted that this proposal lands in time for v20. Unfortunately I don't believe this to be a short term issue.
I'm happy to take some time to chat about it, show how fundamental the backports are to the LTS release lines in our current release model and brainstorm ways to improve that migration story.
Hey @ruyadorno I don't think this will be an issue at all. The way I see it, is that to make backports easy and feasible the tooling that generates the docs from the old (current) API doc format to the new format (from this proposal) should be able to generate it back to the old format.
I'm thinking of something like this:
# generates from the old format to the new one making all the generated files to the out directory
node-api-tool -c node/doc/api/buffers.md -o out/
# generates from the new format back to the old format
node-api-tool -b -c out/module/buffers -o out_old_format/
It's just an example, but this could at least automate the backporting of the old doc format. Note that forward-porting is not an issue because the proposal initially already aims to have tooling for transforming the old format into the new one.
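To make the round-trip idea concrete, here is a minimal sketch of a lossless split-and-restore step. It relies on the fact that the current API docs embed metadata in `<!-- YAML ... -->` comment blocks; everything else (the function names, the `<!-- meta:N -->` placeholder) is hypothetical and is not how `node-api-tool` is known to work.

```javascript
// Matches the "<!-- YAML ... -->" comment blocks used in the current doc format.
const YAML_BLOCK = /<!-- YAML\n([\s\S]*?)-->\n?/g;

// Old format -> split form: pull each YAML block out of the Markdown and
// leave a numbered placeholder behind (the placeholder syntax is made up here).
function extractMetadata(markdown) {
  const blocks = [];
  const content = markdown.replace(YAML_BLOCK, (_match, yaml) => {
    blocks.push(yaml);
    return `<!-- meta:${blocks.length - 1} -->\n`;
  });
  return { content, blocks };
}

// Split form -> old format: put every YAML block back where it came from.
function restoreMetadata({ content, blocks }) {
  return content.replace(
    /<!-- meta:(\d+) -->\n/g,
    (_match, index) => `<!-- YAML\n${blocks[Number(index)]}-->\n`,
  );
}

const oldFormat = '## buf.fill()\n<!-- YAML\nadded: v0.5.0\n-->\n\nFills the buffer.\n';
const splitForm = extractMetadata(oldFormat);

console.log(splitForm.blocks);
console.log(restoreMetadata(splitForm) === oldFormat);
```

If both directions are exact inverses like this, a backport script could convert a commit's doc changes automatically instead of asking releasers to resolve them by hand.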
What do you think?
I believe the workflow we need to preserve for backporting is the ability to git cherry-pick any commit that touches documentation on main back to prior release lines. We rely on automation and scripts that do this for us. If the proposal results in us hitting a conflict each time we backport a documentation change from main, and have to manually apply the diff to a different file in the tree, that would be a significant amount of added work for releasers. With potentially ~150 commits per current release, many would touch documentation (particularly the semver minors), so it's a lot of effort to manually apply those changes. And as @RuyAdorno mentions, that divergence would need to be handled until the EOL of Node.js 18.
(Sorry, my understanding is limited, but I believe changing the directory structure would impact our ability to git cherry-pick back from main cleanly.)
From the little I've dug in the proposal brings some great benefits (appreciate your efforts @ovflowd!).
Perhaps there's some Git magic/mapping or automation we can create to mitigate that in our tooling, but we'd need to prove it out and have it ready to go. An alternative may(?) be to manually backport/land the proposed new structure to all active release lines at the same time... but that would involve a lot of additional effort and coordination.
Perhaps there's some Git magic/mapping or automation we can create to mitigate that in our tooling, but we'd need to prove it out and have it ready to go. An alternative may(?) be to manually backport/land the proposed new structure to all active release lines at the same time... but that would involve a lot of additional effort and coordination.
Well, thanks for your insights! Really Appreciate it. Here are some ideas we can try to plan out:
Let me know what you think :)
- I believe that migrating previous versions of Node.js API docs to the new format can be done, but it depends on how much back we want to go with backporting the changes. Afaik, v16 is the minimum version where all the API Markdown files consistently follow the current (old) format. v14 already has files not following the format, and things get messier the further we go back.
I was just thinking about the timelines, this may be a reasonable option. At the point when Node.js 20 is released the release lines may be in a state where it's manageable to only backport the proposal to Node.js 18:
Maintenance releases are typically very small (10-20 commits), so it might be a manageable amount of work to handle the divergence for Node.js 16 for the 5 months until it's EOL in September 2023. Perhaps backporting this proposal only as far back as Node.js 18 is a feasible option.
- If we don't want to migrate older versions, we can still add the "cli" tool I mentioned to the backporting workflow. If we have a workflow that backports docs files, we can make it (bash script? js script?) execute the CLI while doing an interactive cherry-pick. This means, of course, that the original commit hash for the cherry-pick will differ, but it would require 0 manual work.
I think I'd need to think about this in more detail and maybe trial it out, but yeah, something like this may work so long as we can keep a handle on the individual/logical commits.
(cc: @nodejs/releasers, perhaps @targos has thoughts)
I agree that we'll need a solution for Node.js 18, and could possibly manage without one for the remainder of Node.js 16 (assuming the change lands for Node.js 20).
I think the best would be to backport the refactor to Node.js 18 (not necessarily at the same time as v20, but we should schedule a release for it). I agree that we don't have to care too much about v14 and v16.
FYI: This Description is Outdated! (Need update)
At the 2022 edition of our Collaborator Summit, we discussed a series of proposals for changing the way we structure the metadata of our API docs. This proposal will eventually deprecate specific proposed changes here.
Within this issue, we will adhere to naming the proposal as an "API metadata proposal" for all further references.
The API Metadata Proposal
Proposal Demo: https://github.com/ovflowd/node-doc-proposal
Introduction
What is this proposal about? Our API docs currently face a few issues varying from maintainability and organization to the extra tooling required to make it work. These are namely the following:
unified. Making it harder to debug, update or change how things are done
There are many other issues within the current API docs, from non-standard conventions to ensure that rules are appropriately made, from maintaining those files to creating sustainable docs that are inclusive for newcomers and well detailed.
The Proposal
This proposal, at its core, boils down to 4 simple changes:
doc/api/modules/fs/promises.metadata.yml has doc/api/modules/fs/promises.en.content.md
Re-structuring the existing file directory
In this proposal, the tree of files gets updated by adopting a node approach (pun intended) for how we structure the files of our API docs and how we name them.
Notably, these are the significant changes:
- modules. Globals will, for example, reside within globals
- misc folder, but this is open for debate as this is not a crucial point.
- modules, is the name of the module (top-level) import. For example, "File Systems" would be "fs", resulting in doc/api/modules/fs
- node:fs/promises would be doc/api/modules/node/fs/promises.
- e.g., doc/api/modules/node/fs/promises/file-handle.yaml, whereas for the promises import itself, it would be doc/api/modules/node/fs/promises.yaml
- promises is a folder and in the second a YAML file; that's because we're following a Node approach, just like a Binary-Tree.
Accomplishing this change
This can be quickly done by an automated script that will break down files and generate files. Using a script for tree shaking and creating this node approach would, in the best scenarios, work for all the current files existing on our doc/api and, worst case scenario, 98% of the files, based on the consistency of adoption and how modules are following these patterns.
Extracting the metadata
As mentioned before, the Markdown files should be clean from the actual Metadata, only containing the Description, Introduction (when needed), Examples (both for CJS and MJS) and more in-depth details of when this class/method should be used, and external references that might be useful.
Extracting the metadata allows our contributors and maintainers to focus on writing quality documentation and not get lost in the specificities of the metadata.
What happens with the extracted metadata?
It will be added to a dedicated YAML file containing all the metadata of a particular class, for example. (We created a new tooling infrastructure that would facilitate this being done here.)
The metadata structure will be explained in another section below.
The extraction and categorization process can be automated for all modules and classes, reducing (and erasing) the manual work needed to adopt this proposal.
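As an illustration of where an automated script might write the extracted files, the sketch below maps an import specifier to a metadata file and a content file. It assumes the `doc/api/modules/...` layout and the `.yaml` / `.en.md` naming used in parts of this proposal (other parts use different suffixes), and `docPathsForImport` is a hypothetical helper, not part of any existing tool.

```javascript
// Map an import specifier such as "node:fs/promises" to the files that would
// hold its metadata and its Markdown content.
// Assumptions: the root folder and the file suffixes follow one of the naming
// schemes mentioned in the proposal; both are easy to change.
function docPathsForImport(specifier, lang = 'en', root = 'doc/api/modules') {
  // "node:fs/promises" -> ["node", "fs", "promises"]
  const segments = specifier.replace(':', '/').split('/');
  const base = [root, ...segments].join('/');
  return {
    metadata: `${base}.yaml`,
    content: `${base}.${lang}.md`,
  };
}

const promisesPaths = docPathsForImport('node:fs/promises');
console.log(promisesPaths.metadata);
console.log(promisesPaths.content);
```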
Enforcing the Adoption of best practices
The actual content of the Markdown files will be "enforced" for Documentation reviewers and WGs for specific Node.js parts, possibly by the adoption of this PR.
The Metadata (YAML) schema
Similarly to the existing YAML schema, it would namely be structured as this:
The structure above makes it easy to structure and organise the metadata of each method available within a Class and to quickly describe the types, return types, parameters and history of a method, Class, or anything related.
I18n and ICU on YAML files
The structure is also I18N friendly, as precise text details that should not be defined within the Markdown file can be easily referenced using the ICU format. These details can be accessed on files that match the same level of a specific module. For the example above, doc/api/modules/node/fs/promises.en.i18n.json contains entries that follow the ICU format such as:
Specification Table
The table below demonstrates the entire length of the proposed YAML schema.
Note.: All the properties of type Enum will have their possible values discussed in the future, as this is just a high-level specification proposal.

Top Level Properties

| Field | Type | Description |
| --- | --- | --- |
| name | String | … doc folder. |
| import | String | |
| stability | Enum | |
| tags | Lang ID | |
| history | Array<History> | |
| methods | Array<Method> | |
| constants | Array<Constant> | |
| source | String | |

History

| Field | Type |
| --- | --- |
| type | Enum |
| pullRequest | String |
| issue | String |
| details | Lang ID |
| versions | Array<String> |
| when | String |

Method

| Field | Type |
| --- | --- |
| name | String |
| stability | Enum |
| tags | Lang ID |
| history | Array<History> |
| returns | Array<ReturnType\|Enum> |
| params | Array<MethodParam> |

MethodParam

| Field | Type |
| --- | --- |
| name | String |
| optional | Boolean |
| defaults | Array<ParameterDefault> |
| types | Array<ParameterType\|Enum> |

ReturnType, ParameterType, ParameterDefault

| Field | Type |
| --- | --- |
| details | Lang ID |
| type | Enum |
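To show how tooling might check a metadata entry against the top-level properties listed above, here is a small, hedged sketch. It models Enum values as plain strings and treats every field as optional, both of which are assumptions of this example; the sample entry and its values are invented for illustration only.

```javascript
// Expected kinds for a subset of the top-level properties in the table.
// Assumption: Enum values are represented as strings.
const topLevelShape = {
  name: 'string',
  import: 'string',
  stability: 'string',
  history: 'array',
  methods: 'array',
  constants: 'array',
  source: 'string',
};

function kindOf(value) {
  return Array.isArray(value) ? 'array' : typeof value;
}

// Returns a list of human-readable problems; an empty list means the entry
// matches the expected shape. Missing fields are ignored in this sketch.
function checkShape(entry, shape) {
  const problems = [];
  for (const [field, expected] of Object.entries(shape)) {
    if (!(field in entry)) continue;
    const actual = kindOf(entry[field]);
    if (actual !== expected) {
      problems.push(`${field}: expected ${expected}, got ${actual}`);
    }
  }
  return problems;
}

// Invented sample entry; none of these values come from the proposal.
const sampleEntry = {
  name: 'promises',
  import: 'node:fs/promises',
  stability: 'stable',
  history: [],
  methods: [{ name: 'readFile', params: [{ name: 'path', optional: false }] }],
};

console.log(checkShape(sampleEntry, topLevelShape));
console.log(checkShape({ name: 42, methods: 'readFile' }, topLevelShape));
```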
Incorporating the Metadata within the Markdown files
As each Class has numerous methods (possibly constants) and more, the parser needs to know where to attach the data within the final generated result when, for example, building for the web.
This would be quickly done by using Markdown compatible Heading IDs
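A rough sketch of how a parser could pair headings with metadata entries is shown below. It assumes the `## Title {#id}` heading ID syntax found in several Markdown extensions; the proposal only says "Markdown compatible Heading IDs", so the exact syntax, the regular expression and the `attachHeadings` helper are all assumptions of this example.

```javascript
// Matches headings of the form "## Any title {#some-id}".
const HEADING_WITH_ID = /^(#{1,6})\s+(.*?)\s*\{#([\w.-]+)\}\s*$/;

// Pair each heading that carries an ID with the metadata entry whose `name`
// equals that ID. Headings without a matching entry are skipped.
function attachHeadings(markdown, entries) {
  const byName = new Map(entries.map((entry) => [entry.name, entry]));
  const attached = [];
  for (const line of markdown.split('\n')) {
    const match = HEADING_WITH_ID.exec(line);
    if (!match) continue;
    const [, , title, id] = match;
    const entry = byName.get(id);
    if (entry) attached.push({ id, title, entry });
  }
  return attached;
}

const sampleMarkdown = [
  '## Reading a whole file {#readFile}',
  '',
  'Some prose about reading files.',
  '## Unrelated heading {#nope}',
].join('\n');

const attachedHeadings = attachHeadings(sampleMarkdown, [{ name: 'readFile' }]);
console.log(attachedHeadings);
```

Because only the ID is used for matching, the visible heading text can be reworded freely without breaking the link to the metadata.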
The parser would map each YAML entry's name field to the associated Heading ID, allowing you to write the Heading as you wish while still keeping the Heading ID intact.
Naming for Markdown files
To ensure that we have a 1:1 mapping between YAML and Markdown, the Markdown files should reside in the same folder as the YAML ones and have the same name, the only difference being the Markdown files have the .md extension in lowercase. They're suffixed by their languages, e.g. .en.md.
Note.: By default, the Markdown files will default to the .en.md extension.
The Build Process
Generating the final result in a tangible, readable format for humans and IDEs is no easy feat.
The new tooling build process would consist of two different outputs:
Example of the file structure
An essential factor in easing the visualization of how this proposal would change the current folder structure is to show an example of how it would look with all the changes applied. The snippet below is an illustration of how it would look.
Note.: The root directory below would be doc/api.
The Navigation (Markdown) Schema
Navigating through the API docs is as essential as displaying the content correctly. The idea here is to allow each module to define its Navigation entries and then generate the whole Navigation by aggregating all the navigation files.
Book of Rules for the Navigation System
- navigation.md)
- build-docs --navigation-entry=doc/api/v18/navigation.md)
Note.: The Navigation source would be in Markdown, using a Markdown List format with a maximum of X-indentation levels.
The Schema of Navigation
The code snippet below shows all examples of the Schema and how it would be generated in the end.
File: doc/api/v18/en.navigation.md
File: doc/api/v18/modules/en.navigation.md
File: doc/api/v18/modules/fs/en.navigation.md
Example output in Markdown
It is essential to mention that the final output of the Navigation would be Markdown and can be used by the build tools to either generate an output on MDX or plain HTML or JSON.
Conclusion
As explained before, the proposal has several benefits and would significantly add to our Codebase. Knowing that the benefits vary from tooling, build process, maintainability, adoption, ease of documentation, translations, and even more, this proposal is fated to succeed! Also, all the items explained here can be automated, ensuring a smooth transition process.