microsoft / rushstack

Monorepo for tools developed by the Rush Stack community
https://rushstack.io/
Other
5.93k stars 598 forks source link

[api-extractor] Generate declaration files in-memory as part of extraction #1010

Open yacinehmito opened 5 years ago

yacinehmito commented 5 years ago

This is a feature request.

Proposed change

Currently api-extractor requires that we use the typescript compiler to generate our declarations first. Then, api-extractor reads those declarations and does its magic.

My proposal is instead to run tsc with emitDeclarationOnly as a preliminary step. The files emitted would be written to an in-memory filesystem plugged by api-extractor. Those files would then be read from that in-memory filesystem to follow the extraction process.

I have no opinion on how this behaviour is to be triggered. It can be through the configuration file, the command line or by noticing that the entry file is a .ts file (not a .d.ts). Obviously the current behaviour would stay the default.

Motivation

In the wild, a lot of projects don't use the typescript compiler directly. Some use webpack loaders, rollup plugins, babel or ts-node, but specific tools are irrelevant to the discussion.

In this situation there can be either of two cases:

  1. There is no generation of declaration files (the Applications use case), or
  2. Declaration files are generated with tsc out-of-band on a published directory (the Libraries use case)

Applications

Applications don't usually generate declarations files. However they may still want to use api-extractor as it provides the facilities to automatically generate an internal documentation.

This is how applications that use third-party loaders or transpilers such as the one listed above would do it currently:

  1. Call tsc (with for example emitDeclarationOnly) so that it generates declaration files on a temp directory
  2. Call api-extractor run with the entry in temp
  3. Delete the temp directory (and all the other unused artifacts of api-extractor such as dist/tsdoc-metadata.json)

This is quite involved. With the proposed change, this is how they would do it:

  1. Call api-extractor run with the entry as their main .ts file
  2. Delete unused artifacts of api-extractor

Much simpler.

Libraries

TypeScript libraries already generate declaration files. But they may not want to if the two following conditions are met:

Without api-extractor, tsc is called anyway (probably with emitDeclarationOnly) and the relevant .d.ts file is exposed in typings. However, with api-extractor and dts rollup, the process would look like this:

  1. Call tsc (with for example emitDeclarationOnly) so that it generates declaration files on a temp directory
  2. Call api-extractor run with the entry in temp (publishFolder is set to be the published directory and typings in package.json points to the generated .d.ts file)
  3. Delete the temp directory

This is the same burden as with the first use case. With the proposed change, this is how they would do it:

  1. Call api-extractor run with the entry as their main .ts file (publishFolder is set to be the published directory and typings in package.json points to the generated .d.ts file)

That's it! Just one step, like before.

Making a Pull Request

If you approve this proposal I'd like to try a PR.

The files emitted would be written to an in-memory filesystem plugged by api-extractor

This is what makes sense to me but I suppose that if typescript exposes the proper data structure and if we are able to resolve references through it an actual in-memory filesystem would be overkill.

octogonz commented 5 years ago

This is an interesting idea. Some thoughts:

Otherwise, this proposal seems useful and like a good idea.

yacinehmito commented 5 years ago

Your answer makes me thing that this feature does not really belong to api-extractor. Rather, we should expose a programmatic API so that one can write extensions to third-party tools that oversee compilations. For example a webpack plugin for ts-loader would directly hook itself to the declaration files generated in memory and feed to api-extractor.

I think this is the better approach as that would mean less code to maintain and would solve every single issue you addressed:

For that to work api-extractor must abstract away all filesystem access. It should also behave nicely when a single Extractor instance is used multiple times so that incremental builds don't get too costly (this is a nice-to-have though).

What do you think?

octogonz commented 5 years ago

Sure, I like this idea a lot. 👍

csr632 commented 4 years ago

Since tsc can already run in browser and emit files into memery, I think if we can make api-extractor resolve/load files from memory(depend on a in-memory filesystem), both this issue and https://github.com/microsoft/rushstack/issues/1828 would be resolved.

octogonz commented 4 years ago

The way tsc accomplishes that is by isolating all the filesystem APIs behind an interface, so that it can have a different implementation in a web browser. We also would need to abstract some higher level behaviors such as:

For the TSDoc project, all the config file parsing was moved into tsdoc-config to isolate tsdoc so it can run in a browser. It might make sense to do something similar with API Extractor. The api-extractor-model package (used for reading/writing our API JSON files when generating docs) is already well isolated and could run in a browser (if it doesn't already, I can't remember).

None of this is particularly difficult work, but it would touch a lot of files. If someone wants to work on this we would certainly support it and provide help.

csr632 commented 4 years ago

I would love to help. I think we should create a new package (probably named api-extractor-core?). api-extractor-corenever resolve package.json or api-extractor.json by itself. Instead, it get those from arguments.

Current api-extractor also use Sort, Path, InternalError from @rushstack/node-core-library. They should be rewritten in a browser-friendly way. Do we already have the browser-friendly version? Should we create a new package named @rushstack/js-core-library?

It definitely need a lot of file-moving and refactoring. What do you suggest to start with? I guess the entry point is these lines. If I get these 10 lines works, I can get the apiModel object in browser, which is the core demand for in-browser usage. The apiReport and dtsRollup feature will not be brought into api-extractor-core(for my first try). @octogonz

octogonz commented 4 years ago

@csr632 Before trying to get it exactly right, it might be easier to start with a prototype. Like make a branch that tries to carve out the core API Extractor engine, moving all the non-portable stuff behind an interface. Once you get it running, we can step back and look at what's in the interface, and what uses it, and where the boundary should be.

Then we can come back and make a PR that separates the code for real.

This page explains how to invoke the API Extractor engine from a script. And this script is a good example to study. If you can get those interfaces running in a web browser, you have pretty much the whole functionality of API Extractor.

octogonz commented 4 years ago

Also, it would be good to confirm that the api-extractor-model package runs in a browser. If not, fixing that should probably be the first work item.

csr632 commented 4 years ago

I have ported api-extractor-model into browser. It only needs a little copy&pasting: https://github.com/csr632/api-extractor-core/commit/7c25c367fc621ec2e455a00ed7033d0b11f2bc36. But now I am facing circular reference issue from api-extractor-model. You can checkout the fixture in my repo.

This is warning from rollup:

(!) Circular dependencies
../../packages/api-extractor-model/lib/items/ApiDocumentedItem.js -> ../../packages/api-extractor-model/lib/items/ApiItem.js -> ../../packages/api-extractor-model/lib/mixins/ApiParameterListMixin.js -> ../../packages/api-extractor-model/lib/model/Parameter.js -> ../../packages/api-extractor-model/lib/items/ApiDocumentedItem.js
../../packages/api-extractor-model/lib/items/ApiDeclaredItem.js -> ../../packages/api-extractor-model/lib/items/ApiDocumentedItem.js -> ../../packages/api-extractor-model/lib/items/ApiItem.js -> ../../packages/api-extractor-model/lib/mixins/ApiParameterListMixin.js -> ../../packages/api-extractor-model/lib/items/ApiDeclaredItem.js
../../packages/api-extractor-model/lib/items/ApiItem.js -> ../../packages/api-extractor-model/lib/mixins/ApiItemContainerMixin.js -> ../../packages/api-extractor-model/lib/items/ApiItem.js
...and 22 more

These circular references are causing Uncaught ReferenceError: Cannot access 'ApiDeclaredItem' before initialization error when run in browser.

The circular references lead to this line in bundle, which is an "access before initialization".

Probably because I changed this line into this line. But I don't want to solve circular dependency by using commonjs require, because that is not friendly to browser(and rollup).

@octogonz

octogonz commented 4 years ago

@csr632 I didn't have time to do a full analysis. But starting with ApiItem.ts, I see 3 imports that are clearly pulling in things that depend circularly back on ApiItem:

import { ApiPackage } from '../model/ApiPackage';
import { ApiParameterListMixin } from '../mixins/ApiParameterListMixin';
import { ApiItemContainerMixin } from '../mixins/ApiItemContainerMixin';

ApiPackage is a type-only reference, since it is only used like this:

  public getAssociatedPackage(): ApiPackage | undefined {
    for (let current: ApiItem | undefined = this; current !== undefined; current = current.parent) {
      if (current.kind === ApiItemKind.Package) {
        return current as ApiPackage;
      }
    }
    return undefined;
  }

In TypeScript 3.8 we could make this explicit using import type { ApiPackage } from '../model/ApiPackage';, but the compiler will probably figure it out anyway, so this one probably isn't a problem.

For both ApiParameterListMixin and ApiItemContainerMixin, they seem to be only used in a very narrow way:

            if (ApiParameterListMixin.isBaseClassOf(current)) {
              reversedParts.push('()');
            }

and

  public getMergedSiblings(): ReadonlyArray<ApiItem> {
    const parent: ApiItem | undefined = this._parent;
    if (parent && ApiItemContainerMixin.isBaseClassOf(parent)) {
      return parent._getMergedSiblingsForMember(this);
    }
    return [];
  }

So I think you could solve those by moving the isBaseClassOf() functions into an isolated file that doesn't depend on anything else, and then both ApiItemContainerMixin.ts and ApiItem.ts would import from that file.

Probably because I changed this line into this line. But I don't want to solve circular dependency by using commonjs require, because that is not friendly to browser(and rollup).

  public static deserialize(jsonObject: IApiItemJson, context: DeserializerContext): ApiItem {
    // The Deserializer class is coupled with a ton of other classes, so  we delay loading it
    // to avoid ES5 circular imports.
    // eslint-disable-next-line @typescript-eslint/no-var-requires
    const deserializerModule: typeof import('../model/Deserializer') = require('../model/Deserializer');
    return deserializerModule.Deserializer.deserialize(context, jsonObject);
  }

Hmm.... this problem is harder to solve. Webpack does support delayed imports, but they return promises. They are not synchronous like CommonJS require().

Maybe we could define empty stubs in ApiItem.ts and fix them up later.

octogonz commented 4 years ago

Take a look at this prototype:

https://github.com/microsoft/rushstack/commit/de58e9296ecbb5399c785f910fb5b8733603bf1b

I think a solution like this works reasonably well, and the code is not too crazy for people to understand.

csr632 commented 4 years ago

@octogonz I have found a great post to solve circular dependency: https://medium.com/visual-development/how-to-fix-nasty-circular-dependency-issues-once-and-for-all-in-javascript-typescript-a04c987cf0de

We could create can _internal.ts which re-export everything from relavant modules. If one module depend on another, import it from _internal.ts.

This pattern can solve the problem because we can control the module eval order in _internal.ts. For example, since ApiItem should be evaluate before Deserializer, we can create an _internal.ts like this:

// import order matters!
export * from './items/ApiItem.ts'
export * from './model/DeserializerContext'
// re-export other relavant modules......

If some module depend on ApiItem, import it from _internal.ts.

This way we can avoid large refactor. We only need to modify some import statement.

csr632 commented 4 years ago

@octogonz

https://github.com/csr632/api-extractor-core I have solved the "access 'xxx' before initialization" problem and make api-extractor-model works in browser. You can checkout the fixture.

vjpr commented 3 years ago

Has anyone made any progress on this?

From my research:

You can write the DTS files in-memory by following this: https://github.com/Microsoft/TypeScript/wiki/Using-the-Compiler-API#getting-the-dts-from-a-javascript-file. (note: change the .js to .ts)

Then all the file operations for api-extractor run through @rushstack/node-core-library#FileSystem. A simple fix would be exposing the FileSystem class used by api-extractor. Then it could be monkey-patched.

A proper solution would involve exposing the instance via dep injection, but would require a lot of plumbing work I suspect.

The proper proper solution would be to make a layer in front of the FileSystem api and expose that.

For now, I am going to try monkey-patch it, and if that fails, then I will use something like mock-fs.