split out docgen specific code from compiler

timotheecour commented 3 years ago

proposal

nim jsondoc -o:foo.json [options] main runs runnableExamples and produces foo.json
nimdoc --mode:html|pdf|latex|tex --outdir:htmldocs foo.json produces actual docs
nim doc [options] main calls nim jsondoc and then nimdoc with appropriate options by shelling out, to reproduce pre-existing behavior

=> no code breaks, it'd be a transparent change for users
=> compiler logic is cleansed from RST parsing etc
=> no code duplication either
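A minimal sketch of the wrapper step, assuming the flags are spelled exactly as in the proposal. `nimdoc` does not exist yet, and `docCommands` and `DocMode` are hypothetical names used only for illustration. The sketch only builds the two command lines that `nim doc` would shell out to, in order, so it can be read and run without either tool being installed.

```nim
import std/[os, strutils]

type DocMode = enum
  ## Output modes listed in the proposal.
  dmHtml = "html", dmPdf = "pdf", dmLatex = "latex", dmTex = "tex"

proc docCommands(main: string; mode = dmHtml; outdir = "htmldocs";
                 extraOpts: seq[string] = @[]): seq[string] =
  ## Returns the two command lines `nim doc` would run, in order.
  ## Step 1 produces the json file, step 2 renders it with the
  ## (hypothetical) `nimdoc` tool.
  let jsonFile = changeFileExt(main, "json")
  var first = @["nim", "jsondoc", "-o:" & jsonFile]
  first.add extraOpts
  first.add main
  let second = @["nimdoc", "--mode:" & $mode, "--outdir:" & outdir, jsonFile]
  result = @[first.join(" "), second.join(" ")]

when isMainModule:
  for cmd in docCommands("main"):
    echo cmd
    # A real wrapper would run each step and stop at the first failure,
    # e.g. `if execShellCmd(cmd) != 0: quit(1)`.
```

Keeping the command construction separate from execution is a design choice of this sketch, not part of the proposal; it makes the mapping from `nim doc` options to the two underlying commands easy to test.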

benefits

separation of concerns
gives more freedom on what nim doc can use, eg karax to add more dynamic functionality to docs

links

https://github.com/nim-lang/Nim/pull/15999

earlier versions of this proposal in comments:

Araq commented 3 years ago

We need to decide how JSON can encode the "click on type definition" links and RST markup. I'd still prefer a semi-structured lossless XML format instead of JSON, simply because JSON is inadequate but I've said it before and nobody listens.

timotheecour commented 3 years ago

> We need to decide how JSON can encode the "click on type definition" links and RST markup

not a complete answer but this is a start:

map each PSym visited by nim doc into a compact json representation
the json output is DRY, references to (overloaded or not) symbols (eg fn) are done exclusively via a unique id (eg fn@1), never by name (eg fn)
unique id can simply reuse the unique id (or mangled name) that nim generates for each symbol (id is a unique but transient implementation detail)
click on type definition link is encoded once per symbol even if symbol is reused in several places (eg type Foo reappears in multiple contexts)
this allows docgen to generate clickable links everywhere
RST markup is a separate tool's responsibility; json only encodes the raw doc string 1:1
nim doc output is a flat list of symbols (+ maybe other things); instead of symbols from a module being nested inside modules, we lay out all symbols (modules, types, etc) into 1 large flat list

# in main.nim
import mymod
proc fn(a: Foo): Bar
proc fn(a: Foo, b: int) # overload

{
  "symbols": {
    "main@1": {
      "name": "main",
      "kind": "module",
      "imports": ["mymod@1"],
      "exports": null,
      "symbols": ["fn@1", "fn@2"]
    },
    "Foo@1": {
      "name": "Foo",
      "kind": "type",
      "module": "mymod@1",
      "loc": {"file": "/pathto/myincludefile.nim", "line": 10, "col": 11}
    },
    "fn@1": {
      "name": "fn",
      "kind": "proc",
      "args": [{"name": "a", "type": "Foo@1"}],
      "returnType": "Bar@1"
    },
    "fn@2": {
      "name": "fn",
      "kind": "proc",
      "args": [{"name": "a", "type": "Foo@1"}, {"name": "b", "type": "int@1"}],
      "returnType": null
    }
  }
}
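To illustrate how a separate docgen could turn those ids into clickable links, here is a rough sketch using `std/json`. It embeds a trimmed copy of the example; the field names follow the layout sketched in this comment, and `linkFor`, `describeArgs` and `ArgDoc` are hypothetical helpers, not an existing API. Ids with no entry in the table (here `int@1`) simply get no link.

```nim
import std/json

# Trimmed copy of the example layout; only the fields used below.
const docJson = """
{
  "symbols": {
    "Foo@1": {"name": "Foo", "kind": "type", "module": "mymod@1",
              "loc": {"file": "/pathto/myincludefile.nim", "line": 10, "col": 11}},
    "fn@2": {"name": "fn", "kind": "proc",
             "args": [{"name": "a", "type": "Foo@1"},
                      {"name": "b", "type": "int@1"}],
             "returnType": null}
  }
}
"""

type ArgDoc = tuple[name, typeName, link: string]

proc linkFor(symbols: JsonNode; id: string): string =
  ## Returns "file:line:col" for a symbol id, or "" when the id has no
  ## entry or no location.
  if symbols.hasKey(id) and symbols[id].hasKey("loc"):
    let loc = symbols[id]["loc"]
    result = loc["file"].getStr & ":" & $loc["line"].getInt & ":" & $loc["col"].getInt

proc describeArgs(symbols: JsonNode; procId: string): seq[ArgDoc] =
  ## Resolves each argument's type id to a display name and a link target.
  for arg in symbols[procId]["args"]:
    let typeId = arg["type"].getStr
    let typeName =
      if symbols.hasKey(typeId): symbols[typeId]["name"].getStr
      else: typeId  # unknown id: fall back to the raw id
    result.add((name: arg["name"].getStr, typeName: typeName,
                link: linkFor(symbols, typeId)))

when isMainModule:
  let symbols = parseJson(docJson)["symbols"]
  for a in describeArgs(symbols, "fn@2"):
    echo a.name, ": ", a.typeName, " ", a.link
```

Because the location is stored once under `Foo@1`, every signature that mentions `Foo@1` resolves to the same link without repeating it.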

> I'd still prefer a semi-structured lossless XML format instead of JSON

you'd need to explain how json is inadequate in this context. json allows unique ids as references (aka pointers), so you can encode arbitrary cyclic data structures, such as the PSym symbols known to the compiler.
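As a small illustration of that point, the sketch below stores two mutually referencing types in a flat id table and walks the cycle safely. The data and the `fields` key are made up for the example, and `reachable` is a hypothetical helper.

```nim
import std/json

# Hypothetical data: Node refers to Tree and Tree refers back to Node,
# which is a cycle in the symbol graph. The flat table stays acyclic
# because references are plain id strings.
const cyclicJson = """
{
  "Node@1": {"name": "Node", "kind": "type",
             "fields": [{"name": "owner", "type": "Tree@1"}]},
  "Tree@1": {"name": "Tree", "kind": "type",
             "fields": [{"name": "root", "type": "Node@1"}]}
}
"""

proc reachable(symbols: JsonNode; start: string): seq[string] =
  ## Collects every known id reachable from `start`. Each id is visited
  ## at most once, so the traversal terminates even on cyclic data.
  var stack = @[start]
  while stack.len > 0:
    let id = stack.pop()
    if id in result: continue            # already visited
    if not symbols.hasKey(id): continue  # dangling reference
    result.add id
    if symbols[id].hasKey("fields"):
      for field in symbols[id]["fields"]:
        stack.add field["type"].getStr

when isMainModule:
  echo reachable(parseJson(cyclicJson), "Node@1")
```

The cycle lives only in the id references, so any JSON library can load the file and the consumer decides when and how to follow a reference.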

minor notes

"module" field is needed because loc.file could point to an include file whose filename differs from the owner module.
I've only shown module+loc for Foo@1, but each symbol would have those fields
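A short sketch of why the owner has to be recorded separately: grouping by `module` keeps `Foo@1` under `mymod@1` even though its `loc.file` is an include file. The `helper@1` entry and the `groupByModule` name are invented for the example.

```nim
import std/[json, tables]

# Two symbols owned by the same module but located in different files.
const symbolsJson = """
{
  "Foo@1": {"name": "Foo", "kind": "type", "module": "mymod@1",
            "loc": {"file": "/pathto/myincludefile.nim", "line": 10, "col": 11}},
  "helper@1": {"name": "helper", "kind": "proc", "module": "mymod@1",
               "loc": {"file": "/pathto/mymod.nim", "line": 3, "col": 5}}
}
"""

proc groupByModule(symbols: JsonNode): OrderedTable[string, seq[string]] =
  ## Maps each owner module id to the ids of the symbols it owns,
  ## ignoring which file each symbol was physically declared in.
  for id, sym in symbols:
    result.mgetOrPut(sym["module"].getStr, @[]).add id

when isMainModule:
  for moduleId, ids in groupByModule(parseJson(symbolsJson)):
    echo moduleId, ": ", ids
```

Grouping by `loc.file` instead would split these two symbols across two pages, which is not how the module is seen by its importers.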

Araq commented 3 years ago

So ... instead of importing the Nim compiler and accessing the AST after sem'check, you serialize the AST into a JSON format. And with IC we have the same thing but as a good binary format. Either way you need a library to de-serialize; I'm not sure I like the overlap.

Araq commented 3 years ago

What we should really do is to make nim doc delegate to a new tool called nimdoc and this tool imports the compiler API much like nimsuggest does it. Otherwise it's an invitation to inferior documentation tools ("Yes, it doesn't link types to their definitions and still ignores parts of the exposed JSON but it uses MegaMarkdown which is superior (and only works on Unix)") and we get annoying ecosystem splits with markdown dialects.

nimdoc can have more dependencies like Karax etc. Best of both worlds.

juancarlospaco commented 3 years ago

Split the Documentation itself from the main repo into a tiny doc-only repo

https://github.com/nim-lang/Nim/tree/devel/doc

More simple and faster CI, can delegate more to the community, no code changes, etc. :)

timotheecour commented 3 years ago

> More simple and faster CI, can delegate more to the community, no code changes, etc.

I don't think splitting into a different repo is a good idea. It causes complications when a PR needs to update both the nim repo and the doc repo (ditto when it gets reverted), not to mention versioning: would the two stay in sync or not?

I also don't think faster CI is a valid argument: CI for that repo would need to run anyway (and nim docs CI already runs in a separate github action, which is fast and not a bottleneck compared to the main CI in azure pipelines).

juancarlospaco commented 3 years ago

I don't think splitting docgen into a different repo is a good idea; it is already hard to have everything documented.

If we had a big excess of documentation it would be different, but I don't think that's the case...
