Retrieving the parsed structures for external processing by third-party tooling

rdw-software commented 2 years ago

Hi,

Is there any way to get the parsed structures in a format that other applications can process?

I'd like to use a JSON or Lua table representation of the API to generate content dynamically so that it can be embedded in a static documentation website (created by Docusaurus). While serviceable, the website that luadox generates by default doesn't lend itself well to customization, and I can think of other ways to process the API structures that would be useful to me as well.

I see they're stored in the Parser class, but I'm not sure whether the internal layout is suitable for dumping as-is. Ideally, I would have something along the lines of Blizzard's WoW API documentation that could easily be used to populate React components in Docusaurus, but any standardized format would likely work.

Do you think this would be possible, and if yes what approach would you suggest?

jtackaberry commented 2 years ago

It's certainly technically possible to dump the pre-render stage data structures to JSON. One concern is that this creates an API contract that has an expectation of stability, while for rendered content there's much more latitude for change. So that's a new thing that would require some thought.

My other main question is how markdown and references should be handled. Leave them entirely unparsed/raw in the JSON?

In LuaDox, the renderer takes care of parsing and converting markdown to HTML, and at that time all references (both @{foo} and `foo`) are resolved to hyperlinks. This works because at that stage we know which things go in which files, and in which sections within those files, so references can be resolved to specific anchor tags in specific files.

How do you think this should be handled with JSON renders? Leave them raw/unresolved?

rdw-software commented 2 years ago

Thanks for the quick response! I haven't thought about the design much, but here are a few ideas:

- I would simply add a schemaVersion (number) so that consumers can check whether they support it
- Markdown should not be modified in any way, since rendering it would be the responsibility of the consumer
- References could be transformed into a unique ID/URL-style string, e.g. MyApp.MyModule.HelloWorldFunction, or even table<someID> in typical Lua style, depending on the type of the serialized values
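To make the schemaVersion idea concrete, here is a minimal consumer-side sketch. The `schemaVersion` key, the `SUPPORTED_SCHEMA_VERSIONS` set and the `load_api_doc` helper are all hypothetical; nothing in this thread establishes them as part of LuaDox's actual output.

```python
import json

# Hypothetical: the schema versions this consumer knows how to read.
SUPPORTED_SCHEMA_VERSIONS = {1, 2}


class UnsupportedSchemaError(ValueError):
    """Raised when the document's schema version is not one we understand."""


def load_api_doc(text: str) -> dict:
    """Parse a JSON API dump and refuse schema versions we don't support."""
    doc = json.loads(text)
    # Assumption: the producer writes an integer "schemaVersion" at the top level.
    version = doc.get("schemaVersion")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise UnsupportedSchemaError(f"unsupported schemaVersion: {version!r}")
    return doc
```

Failing loudly on an unknown version is the point of the field: a consumer built against version 1 stops instead of silently misreading a reorganized document.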

If I had a JSON of all the functions, modules, etc. organized by file, I would probably construct URLs based on them so that they fit into the existing website. For example, the MyApp.MyModule.HelloWorldFunction structure could be used to create an entry at https://mydocs.github.io/api/MyModule#HelloWorldFunction or similar (1:1 mapping).
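A sketch of that 1:1 mapping, assuming the first dotted component is an application namespace that doesn't appear in URLs, the last component becomes the anchor, and anything in between names the page. `id_to_url` is a hypothetical helper and the base URL is just the example above.

```python
from urllib.parse import quote

BASE_URL = "https://mydocs.github.io/api"  # example site from the comment above


def id_to_url(symbol_id: str, base_url: str = BASE_URL) -> str:
    """Map a dotted ID like 'MyApp.MyModule.HelloWorldFunction' to a page URL.

    Assumed layout: the first component is the application namespace (dropped),
    the last component is the symbol (becomes the #anchor), and everything in
    between names the page.
    """
    parts = symbol_id.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least 'App.Module', got {symbol_id!r}")
    if len(parts) == 2:
        # A bare module ID links to the top of its page, with no anchor.
        return f"{base_url}/{quote(parts[1])}"
    page = ".".join(parts[1:-1])
    return f"{base_url}/{quote(page)}#{quote(parts[-1])}"
```

For example, `id_to_url("MyApp.MyModule.HelloWorldFunction")` yields `https://mydocs.github.io/api/MyModule#HelloWorldFunction`.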

I don't really know how luadox handles this internally so I can't comment on what would work best. But I guess if you consider this an experimental feature you would have plenty of opportunity to iterate after seeing how it turns out in practice :)

pakeke-constructor commented 2 years ago

Has anyone done any more thinking about this? I like this feature, and I'd be willing to implement it.

~~I agree with the whole namespaced id thing, with the MyApp.MyModule.Func stuff.~~ Whoops, I misunderstood the issue. Yeah, that is a bit annoying... perhaps we could keep a file value in each JSON entry that tracks the file each Reference was defined in? That way everything could be namespaced correctly and we wouldn't get collisions. On each pass, we could also assign each Reference object a unique integer id and use that for referencing within the JSON object.
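A rough sketch of that suggestion: symbols are keyed by (file, name), so two `get` functions in different files receive different integer ids. `RefKey` and `RefRegistry` are hypothetical names for illustration, not LuaDox classes.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class RefKey:
    """Identifies a documented symbol by the file it was defined in plus its name."""
    file: str
    name: str


@dataclass
class RefRegistry:
    """Hands out a unique integer id per (file, name) pair during one pass."""
    _ids: dict = field(default_factory=dict)
    _counter: count = field(default_factory=count)

    def id_for(self, file: str, name: str) -> int:
        key = RefKey(file, name)
        if key not in self._ids:
            self._ids[key] = next(self._counter)
        return self._ids[key]

    def to_json(self) -> list:
        # A lookup table the JSON document could carry, ordered by id.
        return [
            {"id": ref_id, "file": key.file, "name": key.name}
            for key, ref_id in sorted(self._ids.items(), key=lambda kv: kv[1])
        ]
```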

jtackaberry commented 1 year ago

I'm planning to implement this for the next release (LuaDox 2.0). I've begun some refactoring work to enable this (among other things, such as support for other annotation conventions), and in the process have been thinking about how best to approach it.

First, the basic idea is that the JSON structure will reflect a hierarchical layout:

- Top-level elements (@module, @class, and manual pages)
  - Collections within the top-level element (@section and @table)
    - Functions and fields within the collection

Each element will contain an id field that uniquely identifies the element within the project. References within markdown (`symbol` and @{symbol}) will be converted to markdown links where the hyperlink is in the form luadox:<id>. (Thanks @Duckwhale for inspiring that idea.)
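Roughly, the reference conversion described above could look like the sketch below. It assumes a reference's text is already its full id and that a backticked name which isn't a known id is ordinary inline code to be left alone; `link_references` and `known_ids` are hypothetical, and LuaDox's own resolver may handle cases this ignores.

```python
import re

# Matches @{symbol} or `symbol`, where symbol is a dotted/colon Lua-style name.
REF_RE = re.compile(r"@\{([\w.:]+)\}|`([\w.:]+)`")


def link_references(markdown: str, known_ids: set) -> str:
    """Rewrite resolvable references into [symbol](luadox:<id>) links."""

    def replace(m: re.Match) -> str:
        explicit, backticked = m.group(1), m.group(2)
        name = explicit or backticked
        if name not in known_ids:
            return m.group(0)  # unresolved: keep the original text untouched
        # Keep code formatting for backticked references.
        label = f"`{name}`" if backticked else name
        return f"[{label}](luadox:{name})"

    return REF_RE.sub(replace, markdown)
```

For example, `link_references("See @{bar.two}.", {"bar.two"})` returns `See [bar.two](luadox:bar.two).`, while inline code such as `` `x = 1` `` passes through unchanged.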

In terms of markdown, there are a couple of wrinkles that seem to necessitate a bit of extra complexity. LuaDox has some tags that need to be parsed but don't map directly to any markdown. Currently these are @see and the two admonition tags @note and @info.

So I'm thinking about handling this by representing markdown content fields in the JSON as an array instead of a string, where the array would contain a list of objects that represent either a markdown string, or some more complex parsed field such as an admonition.

For example:

{
  "id": "foo.baz",
  "type": "class",
  "content": [
    {
      "markdown": "### Some heading\n\nSome text goes here."
    },
    {
      "type": "admonition",
      "level": "warning",
      "title": "Beware!",
      "content": [
        {
          "markdown": "Markdown within the admonition that has a @see tag"
        },
        {
          "type": "see",
          "ids": [
            "bar.one",
            "bar.two"
          ]
        }
      ]
    },
    {
      "markdown": "More markdown after the admonition [with a link](luadox:bar.two)"
    }
  ],
  "functions": [
    "stuff goes here"
  ],
  "fields": [
    "stuff goes here"
  ]
}

Or, as YAML, because it'll be trivial to support:

id: foo.baz
type: class
content:
  - markdown: |-
      ### Some heading

      Some text goes here.
  - type: admonition
    level: warning
    title: Beware!
    content:
      - markdown: Markdown within the admonition that has a @see tag
      - type: see
        ids:
          - bar.one
          - bar.two
  - markdown: More markdown after the admonition [with a link](luadox:bar.two)
functions:
  - stuff goes here
fields:
  - stuff goes here

That isn't a fully baked document; it just depicts how a single collection might be represented within the larger document, and how markdown content is split up into an array like that.
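On the consumer side, a content array like this can be flattened back into one markdown string. The sketch assumes the three item shapes from the example (objects with a `markdown` key, `admonition` objects, and `see` objects) and emits admonitions in Docusaurus's `:::type[Title]` syntax, which is an assumption worth checking against the Docusaurus version in use. `render_content` is a hypothetical helper; the `luadox:` links it leaves behind would still need resolving to real URLs.

```python
def render_content(items: list) -> str:
    """Flatten a content array (markdown / admonition / see) into markdown."""
    parts = []
    for item in items:
        if "markdown" in item:
            parts.append(item["markdown"])
        elif item.get("type") == "admonition":
            # Admonitions nest their own content array, so recurse.
            body = render_content(item.get("content", []))
            header = f":::{item.get('level', 'note')}"
            if item.get("title"):
                header += f"[{item['title']}]"
            parts.append(f"{header}\n\n{body}\n\n:::")
        elif item.get("type") == "see":
            links = ", ".join(f"[{i}](luadox:{i})" for i in item.get("ids", []))
            parts.append(f"See also: {links}")
        else:
            raise ValueError(f"unknown content item: {item!r}")
    return "\n\n".join(parts)
```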

Let me know what you think.

rdw-software commented 1 year ago

I can't really comment on the design, but if you have a working prototype I'm happy to give it a spin to get you some feedback :)

Since this would be the input for scripts and other tools, it's probably not too important how the structures are laid out exactly.

jtackaberry commented 1 year ago

This is implemented in master now, if anyone's interested in trying it out.

You can install and run out of master using a pipx editable install:

git clone https://github.com/jtackaberry/luadox.git
pipx install -e luadox/
luadox [...your usual arguments...] -r json

-r (or --renderer) controls how the output is rendered. The yaml renderer is also available (which produces smaller and more readable files but is slower).

The structure isn't documented yet, but hopefully it's obvious enough to figure out. Probably the most counterintuitive thing is that for classes and modules, the sections array includes the class or module itself (as evidenced by the id key). This is intentional because top-level classes/modules have all the same semantics as sections (they can contain documented content, fields, and functions).
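A consumer that wants to treat the class/module body separately from its nested sections could split them by comparing ids. This sketch assumes each top-level entry has its own `id` and a `sections` list in which one element repeats that id, as described above; the exact field layout of the real output may differ, and `split_sections` is hypothetical.

```python
def split_sections(entry: dict) -> tuple:
    """Separate a class/module's own section from its nested sections.

    Assumption: the entry's "sections" array contains one element whose "id"
    equals the entry's own "id" (the class or module itself).
    """
    own = None
    nested = []
    for section in entry.get("sections", []):
        if section.get("id") == entry.get("id"):
            own = section
        else:
            nested.append(section)
    return own, nested
```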

Feedback welcome.

rdw-software commented 1 year ago

I tried to generate yaml and json files to test the new feature, but I'm always getting this error:

/home/rdw/.local/bin/luadox test.lua --renderer yaml
2023-09-22 13:12:39,507 [INFO] parsing /tmp/luadox-json-test/test.lua
2023-09-22 13:12:39,510 [INFO] prerendering 1 pages
2023-09-22 13:12:39,511 [ERROR] unhandled error rendering around /tmp/luadox-json-test/test.lua:-1: No option 'name' in section: 'project'
Traceback (most recent call last):
  File "/usr/lib/python3.11/configparser.py", line 805, in get
    value = d[option]
            ~^^^^^^^^
  File "/usr/lib/python3.11/collections/__init__.py", line 1004, in __getitem__
    return self.__missing__(key)            # support subclasses that define __missing__
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/collections/__init__.py", line 996, in __missing__
    raise KeyError(key)
KeyError: 'name'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/luadox-json-test/luadox/luadox/main.py", line 249, in main
    renderer.render(toprefs, out)
  File "/tmp/luadox-json-test/luadox/luadox/render/yaml.py", line 47, in render
    project = self._generate(toprefs)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/luadox-json-test/luadox/luadox/render/json.py", line 33, in _generate
    name = self.config.get('project', 'name')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/configparser.py", line 808, in get
    raise NoOptionError(option, section)
configparser.NoOptionError: No option 'name' in section: 'project'

A few observations:

- Using --renderer html works (but yaml and json cause the above error)
- I tried passing an empty file and the middleclass example, setting --name TEST, and adding the example luadox.conf

This was on WSL (Kali Linux). I could test on other systems as well, but it doesn't seem like a platform-specific issue.

jtackaberry commented 1 year ago

@Duckwhale silly oversight on my part, sorry about that. Just committed a fix.
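The fix itself isn't shown in this thread. For anyone hitting the same `NoOptionError` in their own tooling, one way to make such a lookup tolerant of a missing option is the standard library's keyword-only `fallback` argument to `ConfigParser.get()`, which also covers a missing section. `project_name` is a hypothetical helper, not LuaDox's code.

```python
import configparser


def project_name(config: configparser.ConfigParser, default: str = "") -> str:
    """Read [project] name without crashing when it isn't configured.

    Without fallback=, ConfigParser.get() raises NoSectionError or
    NoOptionError; with it, the default is returned instead.
    """
    return config.get("project", "name", fallback=default)
```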

jtackaberry commented 1 year ago

BTW @Duckwhale, a specific renderer for Docusaurus is theoretically possible now, and I'd like to have that capability natively in LuaDox. So I'm quite interested in your findings here, and really more generally any advice or thoughts you might have on the subject. I've not used Docusaurus yet (and it certainly generates significantly more polished output than LuaDox's current html renderer :)) so I don't yet have any intuitions on the ideal approach.

rdw-software commented 1 year ago

Just FYI, I've started building a prototype to see if I can use the JSON output to generate something remotely close to my manually-created docs. I've written down a bunch of feedback already, but it'll take some time to get more insights.

One thing that I can say already is that I wanted a way to find out which source file a top-level entry originates from. This is so I can add project-specific tags that likely wouldn't have to be added to the tool itself, such as "FFI/Unsafe API" or "External". These are useful things to list in documentation but needn't necessarily be custom tags. Or maybe it's already possible to get this info?

I guess it would be possible to chain find into luadox and then save the file path alongside the output, but that's pretty awkward.

jtackaberry commented 1 year ago

I can definitely add a source key or some such to the top-level entries. Good idea. Looking forward to learning more about your experience with the prototype!
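Once such a key exists, project-specific tags like the ones mentioned above could be derived from the path alone. This sketch assumes a per-entry `source` key holding the Lua file path; the key name, the `TAG_RULES` patterns and `tags_for` are hypothetical until the feature lands. `fnmatchcase` is used so matching behaves the same on every platform.

```python
from fnmatch import fnmatchcase

# Hypothetical project-specific rules: glob pattern on the source path -> tag.
TAG_RULES = [
    ("*/ffi/*", "FFI/Unsafe API"),
    ("*/vendor/*", "External"),
]


def tags_for(entry: dict, rules=TAG_RULES) -> list:
    """Derive display tags for a top-level entry from its (proposed) "source" key."""
    source = entry.get("source")
    if not source:
        return []
    # Note: in fnmatch patterns, "*" also matches "/" characters.
    return [tag for pattern, tag in rules if fnmatchcase(source, pattern)]
```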

LeighMcRae commented 8 months ago

> This is implemented in master now, if anyone's interested in trying it out.
>
> You can install and run out of master using a pipx editable install:
> git clone https://github.com/jtackaberry/luadox.git
> pipx install -e luadox/
> luadox [...your usual arguments...] -r json
> -r (or --renderer) controls how the output is rendered. The yaml renderer is also available (which produces smaller and more readable files but is slower).

I was having trouble getting this to run from source. I'm sure it was my lack of Python experience. Maybe add these instructions to the front page for other people; they were really useful for me.

jtackaberry/luadox #5