Introduce an intermediate representation

erikrose commented 5 years ago

Currently, we program everything directly against JSDoc's JSON output. Even the TypeScript stuff first converts to JSDoc. But JSDoc's JSON is unspecified, idiosyncratic (e.g. https://github.com/mozilla/sphinx-js/blob/2158457eeeafc0baa7aa86033dbe3d9e314223f6/sphinx_js/doclets.py#L31-L35), and not always the clearest thing in the world to read at call sites.

Let's instead write against an intermediate representation that we control.

Advantages:

We can fix any changes JSDoc makes in one place instead of all over the codebase.
We can support information that JSDoc doesn't, as for TypeScript or other frontend languages.
Our TypeScript module should become a lot more maintainable.
Maybe tests will become easier to write.

What bits of info will we need to support in the IR?

filename
file path
longname (to do: define what this is)
line number (for error reporting)
params, with their types, names, default values, descriptions, and whether they're variadic
exceptions a function can raise
function return type and description
examples for functions and classes
whether funcs or classes are deprecated
see-alsos
class comments and class-constructor comments (Should constructor comments be special? Probably; I think we merge them currently.)
members of a class
- what kind of thing they are (function, typedef (currently a hack, I believe, around not having a proper representation of typedefs; we just shove them through the function renderer))
- access (public, private, protected)
properties (Is this a TS thing?)

That's from my reading of the JS-related source along with a peek at the (handy) docs at the top of typedoc.py about what data are used. Anybody see anything else? The next step is to gin up a class-based API that supports all this.

graup commented 5 years ago

Sounds good so far. We should strive to make this as generic as possible so that we can add new features more easily in the future.

Types are kind of a pain. They need to be nestable and have properties. Perhaps something like

functionTypes = [
    {
        name: 'Promise',
        flags: {
            readonly: true,
        },
        children: [
            {
                name: 'TypeA',
            },
        ],
    },
    ...
]

There are also keywords like extends, and defaults for generic types, and probably more that I haven't encountered yet. I wonder what's the best way to maintain this, as TS will probably add additional syntax features in the future. TypeDoc has a big collection of tests with sources and output JSON; maybe we can use that.
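
For illustration only, here is one way the nested shape above could look in Python. The `TypeNode` class, its fields, and `render()` are hypothetical names chosen for this sketch; nothing here is existing sphinx-js or TypeDoc API, and keywords like extends are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TypeNode:
    """Hypothetical nestable type: a name, optional flags, and type arguments."""
    name: str
    flags: Dict[str, bool] = field(default_factory=dict)
    children: List['TypeNode'] = field(default_factory=list)

    def render(self) -> str:
        """Boil the tree down to a string such as ``Promise<TypeA>``."""
        if not self.children:
            return self.name
        args = ', '.join(child.render() for child in self.children)
        return f'{self.name}<{args}>'


# Mirrors the literal above: a readonly Promise wrapping TypeA.
promise = TypeNode('Promise', flags={'readonly': True},
                   children=[TypeNode('TypeA')])
rendered = promise.render()
```

Because each child is itself a `TypeNode`, arbitrarily deep generics come for free, and new flags can be added without changing the shape.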

> properties (Is this a TS thing?)

It is, so far, but it's also a stage 3 proposal for JS itself: https://github.com/tc39/proposal-class-fields

erikrose commented 5 years ago

> I wonder what's the best way to maintain this, as ts will probably add additional syntax features in the future

I had assumed we'd use a class-based object model, with methods behind which we could hide common bits of processing. Then we can always keep adding methods and fields as TS (for example) grows or we grow to support more of its features. I'd definitely represent something as hairy as a type as some kind of method-having object, perhaps containing even more thereof.

erikrose commented 4 years ago

Here's a first-draft sketch of a class-based API for the IR. Commentary welcome. The main new thing is the rough division into classes. Spelling doesn't count yet. These are the must-have properties. They may be augmented by or even hidden behind computed convenience properties later.

Node:
    .name (the short name of a member, regardless of whether it's a class or function or typedef or param)
    .longname (our namepath-like things)
    .is_documented
    .description
    .filename with extension
    .lineno
    .deprecated
    .examples
    .memberof  # Or maybe Class.members suffices.
    .see_also
    .properties

Anything that can be a class member (function, class, attr, anything else?):
    .access (private, public, protected)

Class or Function:  # Whether this ends up being a common superclass is an implementation detail, but the point is that both functions and classes have these fields.
    .params (classes in TS are type-parametrizable)

Function(Node):
    .exceptions
    .returns

Class(Node):
    (.classdesc can go into Node.description)
    Maybe _sphinxjs_doclets_by_class turns into .members.
    Maybe will want a .constructor.

Typedef(Node?):
    .type

Param(Node):  # Does this subclass Node? It has a name, description, filename, lineno. Probably.
    .defaultvalue
    .is_variadic
    .type # Could be an array of types. Types could even be parametrized by other types, but we probably don't care for IR purposes: let whichever analyzer boil them down to strings.

Unneeded in and of themselves:
    .meta.path (just to implement doclet_full_path)
    .kind (function vs. typedef. Can be represented by node class.)
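
As a rough illustration of how the outline above might translate into Python, here is a minimal sketch using dataclasses. It covers only some of the fields; the choice of dataclasses, the field types, the default values, and the definition of `is_documented` are all assumptions made for the example, and the sample names at the bottom are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Fields shared by everything documentable (names follow the outline)."""
    name: str
    longname: str
    filename: str
    lineno: int
    description: str = ''
    deprecated: bool = False
    examples: List[str] = field(default_factory=list)
    see_also: List[str] = field(default_factory=list)

    @property
    def is_documented(self) -> bool:
        # Assumption: "documented" simply means a non-empty description.
        return bool(self.description)


@dataclass
class Param(Node):
    type: Optional[str] = None          # analyzers boil types down to strings
    defaultvalue: Optional[str] = None
    is_variadic: bool = False


@dataclass
class Function(Node):
    access: str = 'public'              # 'private', 'public', or 'protected'
    params: List[Param] = field(default_factory=list)
    exceptions: List[str] = field(default_factory=list)
    returns: Optional[str] = None


@dataclass
class Class(Node):
    access: str = 'public'
    params: List[Param] = field(default_factory=list)
    members: List[Node] = field(default_factory=list)


# Made-up sample data, just to show the objects being built.
x = Param(name='x', longname='./math.add~x', filename='math.js', lineno=3,
          type='number')
add = Function(name='add', longname='./math.add', filename='math.js',
               lineno=3, description='Add one to x.', params=[x])
```

The kind of a thing is carried by its class here, so no separate `.kind` field is needed, which matches the note above.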

erikrose commented 4 years ago

Alright, I have the IR designed and running on a branch. Here it is: https://github.com/erikrose/sphinx-js/blob/ir/sphinx_js/ir.py. The IR is just a passive pile of structs, as you might expect from something being fed by multiple language Analyzers. I've ported the JS bits of sphinx-js to an Analyzer. The important part of the API is just the get_object() call, which returns an IR object based on a passed-in path.
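
To make the shape of that call concrete, here is a toy sketch. It is not the code on the branch: `ToyAnalyzer`, `AmbiguousPath`, the segment-list argument, and the suffix-matching behavior are all assumptions made for the example, and the IR objects are stood in for by plain strings.

```python
from typing import Dict, List, Tuple


class AmbiguousPath(Exception):
    """More than one object matches the given path suffix."""


class ToyAnalyzer:
    """Hypothetical analyzer mapping full path segments to built IR objects."""

    def __init__(self, objects_by_path: Dict[Tuple[str, ...], object]):
        self._objects_by_path = objects_by_path

    def get_object(self, path_suffix: List[str]) -> object:
        """Return the single object whose full path ends with ``path_suffix``."""
        suffix = tuple(path_suffix)
        matches = [obj for path, obj in self._objects_by_path.items()
                   if path[-len(suffix):] == suffix]
        if not matches:
            raise KeyError(f'No object matches {"".join(suffix)!r}')
        if len(matches) > 1:
            raise AmbiguousPath(
                f'{"".join(suffix)!r} matches {len(matches)} objects')
        return matches[0]


analyzer = ToyAnalyzer({
    ('./', 'shapes.', 'Circle#', 'area'): 'IR for Circle#area',
    ('./', 'shapes.', 'Square#', 'area'): 'IR for Square#area',
})
found = analyzer.get_object(['Circle#', 'area'])
```

Here `['Circle#', 'area']` resolves to one object, while the shorter `['area']` raises `AmbiguousPath` because both classes have a method of that name.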

Can I interest anyone in porting the TypeScript support to a similar Analyzer? @graup? @tavianator? All the tests are passing except the TypeScript ones.

graup commented 4 years ago

Cool! One question regarding types. If there's something like Promise<Foo | Bar>, it would be good if the renderer could link the Foo and Bar parts of the string. I guess that's something we can do in the renderer, and having just-a-string for the type is fine? Or is it worth representing the recursive and summative nature of types?

I'd have time in the second half of August to port the TypeScript part.

erikrose commented 4 years ago

If you want to hyperlink the parts of an algebraic type (which is a wonderful idea), it's worth representing the parts of it in the IR. TypeDoc's output links types to their declarations by ID, so we can get unambiguous links by not throwing away that information. However, the templates themselves (which we inherit from Sphinx) are not ready to receive such links; they assume plaintext in at least the formal parameter lists and also, I strongly suspect, the field lists (the lists of params with their types and descriptions). So, depending on difficulty, I might go either way, considering that substantial template work has to be done before algebraic representation will give any practical benefit. Either way, having a well-defined IR means we can change things later without as much head-scratching.
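
A minimal sketch of what "representing the parts" could mean, assuming a type is stored as a flat run of literal text and linkable references. `TypeRef`, `render_type()`, the ID numbers, and the `[#id]` link notation are all invented for the example; a real renderer would emit whatever markup the templates end up accepting.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class TypeRef:
    """Hypothetical link target: a type name plus the ID of its declaration."""
    name: str
    declaration_id: int


# A type is a flat run of literal text and linkable references.
TypeFragments = List[Union[str, TypeRef]]


def render_type(fragments: TypeFragments,
                link: Callable[[TypeRef], str]) -> str:
    """Join the fragments, passing each reference through ``link``."""
    return ''.join(f if isinstance(f, str) else link(f) for f in fragments)


# Promise<Foo | Bar>, with Foo and Bar resolvable by (made-up) IDs.
promise = ['Promise<', TypeRef('Foo', 12), ' | ', TypeRef('Bar', 34), '>']

# Plaintext rendering, as today's templates expect.
plain = render_type(promise, lambda ref: ref.name)
# Placeholder "linked" rendering, standing in for real hyperlink markup.
linked = render_type(promise, lambda ref: f'{ref.name}[#{ref.declaration_id}]')
```

The plaintext path keeps working unchanged, so the richer representation costs nothing until the templates can use it.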

erikrose commented 4 years ago

I'm 75% done with the TypeScript IR port: https://github.com/erikrose/sphinx-js/blob/ir/sphinx_js/typedoc.py. I'll be throttling back work on it now, since it's eaten more time than I anticipated, but I still plan to keep it moving slowly forward. Let me know if you do get time, and maybe we can team up on finishing it.

erikrose commented 4 years ago

@graup @tavianator Can I get an opinion from some of you TypeScript guys? I'm thinking of ditching the "external:" and "module:" strings that currently occur in the TS object pathnames and going with a purely file-based approach as in JS Land: ./some/file.someOldStyleNamespace.someClass#someInstanceMethod. Would you miss them? Maybe nobody even knew about them and everyone was just using shorter suffixes.
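
To show how such a pathname breaks down, here is a small sketch that splits it into segments, each keeping its trailing separator. The `split_pathname()` helper and its regex are hypothetical and deliberately naive: they ignore escaping, so a dot inside a file name would be treated as a separator.

```python
import re
from typing import List

# A segment is a leading "./" or "../", or a run of name characters plus the
# separator that follows it ("/", ".", "#", or "~").
_SEGMENT = re.compile(r'\.{1,2}/|[^./#~]+[./#~]?')


def split_pathname(pathname: str) -> List[str]:
    """Split a pathname into segments, keeping each trailing separator."""
    return _SEGMENT.findall(pathname)


pathname = './some/file.someOldStyleNamespace.someClass#someInstanceMethod'
segments = split_pathname(pathname)
# segments == ['./', 'some/', 'file.', 'someOldStyleNamespace.',
#              'someClass#', 'someInstanceMethod']
```

With segments like these, a shorter suffix such as `someClass#someInstanceMethod` is just the last two entries of the list.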

tavianator commented 4 years ago

@erikrose I'm not sure what those are exactly, I don't see them in my docs or some other TypeDoc projects. So I won't miss them at least.

erikrose commented 4 years ago

That's great news. They're prefixes that unambiguously say "The following is the name of an ES6 module" or "The following is the name of an 'internal module' (namespace)". It's hard to write an analyzer for a language you've never used, so I appreciate your read on what's normal. :-)

graup commented 4 years ago

I agree that that's a detail that most people won't care about in documentation.

I'm available to contribute from Aug 17; drop me a line if you want to coordinate. http://twitter.com/graycoding

tavianator commented 4 years ago

Ah okay. I don't really use namespaces, but either way I think as long as you can tell whether an item is in a namespace or a module on its own documentation, it doesn't really need to be part of the path.

erikrose commented 4 years ago

I'm about done with the IR branch, which amounts to a rewrite of the TypeScript support. Would you two mind trying it on your projects before I release it?

https://github.com/erikrose/sphinx-js/tree/ir/

erikrose commented 4 years ago

There are also a lot of changes, summarized under 3.1 at https://github.com/erikrose/sphinx-js/blob/ir/README.rst#version-history.
