[api-extractor] Generate declaration files in-memory as part of extraction

yacinehmito commented 5 years ago

This is a feature request.

Proposed change

Currently api-extractor requires that we use the typescript compiler to generate our declarations first. Then, api-extractor reads those declarations and does its magic.

My proposal is instead to run tsc with emitDeclarationOnly as a preliminary step. The files emitted would be written to an in-memory filesystem plugged by api-extractor. Those files would then be read from that in-memory filesystem to follow the extraction process.

I have no opinion on how this behaviour is to be triggered. It can be through the configuration file, the command line or by noticing that the entry file is a .ts file (not a .d.ts). Obviously the current behaviour would stay the default.

Motivation

In the wild, a lot of projects don't use the typescript compiler directly. Some use webpack loaders, rollup plugins, babel or ts-node, but specific tools are irrelevant to the discussion.

In this situation there can be either of two cases:

There is no generation of declaration files (the Applications use case), or
Declaration files are generated with tsc out-of-band on a published directory (the Libraries use case)

Applications

Applications don't usually generate declarations files. However they may still want to use api-extractor as it provides the facilities to automatically generate an internal documentation.

This is how applications that use third-party loaders or transpilers such as the one listed above would do it currently:

Call tsc (with for example emitDeclarationOnly) so that it generates declaration files on a temp directory
Call api-extractor run with the entry in temp
Delete the temp directory (and all the other unused artifacts of api-extractor such as dist/tsdoc-metadata.json)

This is quite involved. With the proposed change, this is how they would do it:

Call api-extractor run with the entry as their main .ts file
Delete unused artifacts of api-extractor

Much simpler.

Libraries

TypeScript libraries already generate declaration files. But they may not want to if the two following conditions are met:

TypeScript is not directly used to generate the distributed bundles
The project would benefit from dts rollup

Without api-extractor, tsc is called anyway (probably with emitDeclarationOnly) and the relevant .d.ts file is exposed in typings. However, with api-extractor and dts rollup, the process would look like this:

Call tsc (with for example emitDeclarationOnly) so that it generates declaration files on a temp directory
Call api-extractor run with the entry in temp (publishFolder is set to be the published directory and typings in package.json points to the generated .d.ts file)
Delete the temp directory

This is the same burden as with the first use case. With the proposed change, this is how they would do it:

Call api-extractor run with the entry as their main .ts file (publishFolder is set to be the published directory and typings in package.json points to the generated .d.ts file)

That's it! Just one step, like before.

Making a Pull Request

If you approve this proposal I'd like to try a PR.

The files emitted would be written to an in-memory filesystem plugged by api-extractor

This is what makes sense to me but I suppose that if typescript exposes the proper data structure and if we are able to resolve references through it an actual in-memory filesystem would be overkill.

octogonz commented 5 years ago

This is an interesting idea. Some thoughts:

Perf: It has the downside of invoking the entire tsc analysis a second time (on the full set of source files, not just .d.ts files), which could double the time spent in that build step. For larger projects (or monorepos containing lots of smaller projects), this might increase build times significantly. For smaller projects the effect will probably be negligible.
Compiler version compatibility: Another consideration is that API Extractor's analysis uses a specific version of the TypeScript compiler, which is typically different from the version used by build scripts to compile the source files. This generally doesn't cause problems because .d.ts inputs are already in a normalized form (intended for consumption by a foreign compiler version) whereas incompatibilities are much more likely to occur when compiling the full set of source files using a different version. So if we want API Extractor to invoke tsc itself, ideally this step should discover and use the project's local TypeScript package, not the one that is installed with API Extractor. For simple setups this can be found under my-project/node_modules/typescript, but e.g. that is not the case when using a rig like rush-stack which follows the tsconfig.json "extends" field to find the compiler. We could accommodate this by providing a config option for specifying the typescript package to use for this preprocessing step. (Note that API Extractor's analysis step should always use its specified compiler version, since the compiler API frequently has breaking changes.)
Debugging: When you can't see any of the intermediate files, it can be more difficult to diagnose problems. So we'd probably also want to provide a config option for diagnostic purposes, that writes the .d.ts files to disk instead of the in-memory file system.

Otherwise, this proposal seems useful and like a good idea.

yacinehmito commented 5 years ago

Your answer makes me thing that this feature does not really belong to api-extractor. Rather, we should expose a programmatic API so that one can write extensions to third-party tools that oversee compilations. For example a webpack plugin for ts-loader would directly hook itself to the declaration files generated in memory and feed to api-extractor.

I think this is the better approach as that would mean less code to maintain and would solve every single issue you addressed:

Perf: We don't have to compile the source file. We'll just access the declarations files already generated, but not yet written to the filesystem.
Compiler version compatibility: No issue on that front either. Works like before.
Debugging: The third-party plugins would have a flag that determines whether the original declaration files should be written to disc or discarded. By itself api-extractor wouldn't touch anything.

For that to work api-extractor must abstract away all filesystem access. It should also behave nicely when a single Extractor instance is used multiple times so that incremental builds don't get too costly (this is a nice-to-have though).

What do you think?

octogonz commented 5 years ago

Sure, I like this idea a lot. 👍

csr632 commented 4 years ago

Since tsc can already run in browser and emit files into memery, I think if we can make api-extractor resolve/load files from memory(depend on a in-memory filesystem), both this issue and https://github.com/microsoft/rushstack/issues/1828 would be resolved.

octogonz commented 4 years ago

The way tsc accomplishes that is by isolating all the filesystem APIs behind an interface, so that it can have a different implementation in a web browser. We also would need to abstract some higher level behaviors such as:

API Extractor walks upwards from a source file to discover the folder containing package.json, and then probes for config files around there.
API Extractor uses the resolve package to do node_modules resolution when following the extends field in api-extractor.json
API Extractor does CLI parsing using ts-command-line which relies on a lot of Node.js APIs

For the TSDoc project, all the config file parsing was moved into tsdoc-config to isolate tsdoc so it can run in a browser. It might make sense to do something similar with API Extractor. The api-extractor-model package (used for reading/writing our API JSON files when generating docs) is already well isolated and could run in a browser (if it doesn't already, I can't remember).

None of this is particularly difficult work, but it would touch a lot of files. If someone wants to work on this we would certainly support it and provide help.

csr632 commented 4 years ago

I would love to help. I think we should create a new package (probably named api-extractor-core?). api-extractor-corenever resolve package.json or api-extractor.json by itself. Instead, it get those from arguments.

Current api-extractor also use Sort, Path, InternalError from @rushstack/node-core-library. They should be rewritten in a browser-friendly way. Do we already have the browser-friendly version? Should we create a new package named @rushstack/js-core-library?

It definitely need a lot of file-moving and refactoring. What do you suggest to start with? I guess the entry point is these lines. If I get these 10 lines works, I can get the apiModel object in browser, which is the core demand for in-browser usage. The apiReport and dtsRollup feature will not be brought into api-extractor-core(for my first try). @octogonz

octogonz commented 4 years ago

@csr632 Before trying to get it exactly right, it might be easier to start with a prototype. Like make a branch that tries to carve out the core API Extractor engine, moving all the non-portable stuff behind an interface. Once you get it running, we can step back and look at what's in the interface, and what uses it, and where the boundary should be.

Then we can come back and make a PR that separates the code for real.

This page explains how to invoke the API Extractor engine from a script. And this script is a good example to study. If you can get those interfaces running in a web browser, you have pretty much the whole functionality of API Extractor.

octogonz commented 4 years ago

Also, it would be good to confirm that the api-extractor-model package runs in a browser. If not, fixing that should probably be the first work item.

csr632 commented 4 years ago

I have ported api-extractor-model into browser. It only needs a little copy&pasting: https://github.com/csr632/api-extractor-core/commit/7c25c367fc621ec2e455a00ed7033d0b11f2bc36. But now I am facing circular reference issue from api-extractor-model. You can checkout the fixture in my repo.

This is warning from rollup:

(!) Circular dependencies
../../packages/api-extractor-model/lib/items/ApiDocumentedItem.js -> ../../packages/api-extractor-model/lib/items/ApiItem.js -> ../../packages/api-extractor-model/lib/mixins/ApiParameterListMixin.js -> ../../packages/api-extractor-model/lib/model/Parameter.js -> ../../packages/api-extractor-model/lib/items/ApiDocumentedItem.js
../../packages/api-extractor-model/lib/items/ApiDeclaredItem.js -> ../../packages/api-extractor-model/lib/items/ApiDocumentedItem.js -> ../../packages/api-extractor-model/lib/items/ApiItem.js -> ../../packages/api-extractor-model/lib/mixins/ApiParameterListMixin.js -> ../../packages/api-extractor-model/lib/items/ApiDeclaredItem.js
../../packages/api-extractor-model/lib/items/ApiItem.js -> ../../packages/api-extractor-model/lib/mixins/ApiItemContainerMixin.js -> ../../packages/api-extractor-model/lib/items/ApiItem.js
...and 22 more

These circular references are causing Uncaught ReferenceError: Cannot access 'ApiDeclaredItem' before initialization error when run in browser.

The circular references lead to this line in bundle, which is an "access before initialization".

Probably because I changed this line into this line. But I don't want to solve circular dependency by using commonjs require, because that is not friendly to browser(and rollup).

@octogonz

octogonz commented 4 years ago

@csr632 I didn't have time to do a full analysis. But starting with ApiItem.ts, I see 3 imports that are clearly pulling in things that depend circularly back on ApiItem:

import { ApiPackage } from '../model/ApiPackage';
import { ApiParameterListMixin } from '../mixins/ApiParameterListMixin';
import { ApiItemContainerMixin } from '../mixins/ApiItemContainerMixin';

ApiPackage is a type-only reference, since it is only used like this:

  public getAssociatedPackage(): ApiPackage | undefined {
    for (let current: ApiItem | undefined = this; current !== undefined; current = current.parent) {
      if (current.kind === ApiItemKind.Package) {
        return current as ApiPackage;
      }
    }
    return undefined;
  }

In TypeScript 3.8 we could make this explicit using import type { ApiPackage } from '../model/ApiPackage';, but the compiler will probably figure it out anyway, so this one probably isn't a problem.

For both ApiParameterListMixin and ApiItemContainerMixin, they seem to be only used in a very narrow way:

            if (ApiParameterListMixin.isBaseClassOf(current)) {
              reversedParts.push('()');
            }

and

  public getMergedSiblings(): ReadonlyArray<ApiItem> {
    const parent: ApiItem | undefined = this._parent;
    if (parent && ApiItemContainerMixin.isBaseClassOf(parent)) {
      return parent._getMergedSiblingsForMember(this);
    }
    return [];
  }

So I think you could solve those by moving the isBaseClassOf() functions into an isolated file that doesn't depend on anything else, and then both ApiItemContainerMixin.ts and ApiItem.ts would import from that file.

Probably because I changed this line into this line. But I don't want to solve circular dependency by using commonjs require, because that is not friendly to browser(and rollup).

  public static deserialize(jsonObject: IApiItemJson, context: DeserializerContext): ApiItem {
    // The Deserializer class is coupled with a ton of other classes, so  we delay loading it
    // to avoid ES5 circular imports.
    // eslint-disable-next-line @typescript-eslint/no-var-requires
    const deserializerModule: typeof import('../model/Deserializer') = require('../model/Deserializer');
    return deserializerModule.Deserializer.deserialize(context, jsonObject);
  }

Hmm.... this problem is harder to solve. Webpack does support delayed imports, but they return promises. They are not synchronous like CommonJS require().

Maybe we could define empty stubs in ApiItem.ts and fix them up later.

octogonz commented 4 years ago

Take a look at this prototype:

https://github.com/microsoft/rushstack/commit/de58e9296ecbb5399c785f910fb5b8733603bf1b

I think a solution like this works reasonably well, and the code is not too crazy for people to understand.

csr632 commented 4 years ago

@octogonz I have found a great post to solve circular dependency: https://medium.com/visual-development/how-to-fix-nasty-circular-dependency-issues-once-and-for-all-in-javascript-typescript-a04c987cf0de

We could create can _internal.ts which re-export everything from relavant modules. If one module depend on another, import it from _internal.ts.

This pattern can solve the problem because we can control the module eval order in _internal.ts. For example, since ApiItem should be evaluate before Deserializer, we can create an _internal.ts like this:

// import order matters!
export * from './items/ApiItem.ts'
export * from './model/DeserializerContext'
// re-export other relavant modules......

If some module depend on ApiItem, import it from _internal.ts.

This way we can avoid large refactor. We only need to modify some import statement.

csr632 commented 4 years ago

@octogonz

https://github.com/csr632/api-extractor-core I have solved the "access 'xxx' before initialization" problem and make api-extractor-model works in browser. You can checkout the fixture.

vjpr commented 3 years ago

Has anyone made any progress on this?

From my research:

You can write the DTS files in-memory by following this: https://github.com/Microsoft/TypeScript/wiki/Using-the-Compiler-API#getting-the-dts-from-a-javascript-file. (note: change the .js to .ts)

Then all the file operations for api-extractor run through @rushstack/node-core-library#FileSystem. A simple fix would be exposing the FileSystem class used by api-extractor. Then it could be monkey-patched.

A proper solution would involve exposing the instance via dep injection, but would require a lot of plumbing work I suspect.

The proper proper solution would be to make a layer in front of the FileSystem api and expose that.

For now, I am going to try monkey-patch it, and if that fails, then I will use something like mock-fs.

microsoft / rushstack