kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

JavaScript: change generated API to support circular/out-of-order imports #1074

Closed generalmimon closed 8 months ago

generalmimon commented 12 months ago

tl;dr Exporting a plain object instead of a function (as it is now) from generated JS format modules enables circular imports and removes the need to load modules in a specific order within a browser global context (like in the Web IDE), but at the cost of breaking backward compatibility - everyone using KS-generated JS modules will have to adapt their code.

Currently, the JS code generated by KSC is wrapped in a UMD envelope borrowed from https://github.com/umdjs/umd/blob/36fd113/templates/returnExports.js#L17-L37 and overall looks like this:

(function (root, factory) {
  if (typeof define === 'function' && define.amd) {
    define(['kaitai-struct/KaitaiStream', './VlqBase128Le'], factory);
  } else if (typeof module === 'object' && module.exports) {
    module.exports = factory(require('kaitai-struct/KaitaiStream'), require('./VlqBase128Le'));
  } else {
    root.ImportsAbs = factory(root.KaitaiStream, root.VlqBase128Le);
  }
}(typeof self !== 'undefined' ? self : this, function (KaitaiStream, VlqBase128Le) {
var ImportsAbs = (function() {
  function ImportsAbs(_io, _parent, _root) {
    // ...
  }
  ImportsAbs.prototype._read = function() {
    this.len = new VlqBase128Le(this._io, this, null);
    // ...
  }

  return ImportsAbs;
})();
return ImportsAbs;
}));

When this code is executed, it first resolves the dependencies - the KS runtime library and the imported VlqBase128Le spec. In the "browser globals" else case, it fetches whatever is currently stored under the keys KaitaiStream and VlqBase128Le in the global object (yielding undefined if they haven't been set), and immediately calls the factory function (the function (KaitaiStream, VlqBase128Le) { ... } one), which receives the resolved dependencies as arguments. The factory function returns what we want to export from the module - in our case, the constructor function of the root "class", i.e. function ImportsAbs(_io, _parent, _root). In other words, typeof root.ImportsAbs will be 'function'.

However, this means that when ImportsAbs is loaded, its dependency root.VlqBase128Le must already be set on the global object to the correct constructor function that creates a new instance of the VlqBase128Le type. If it's not, the result of this access is the value undefined, and the factory captures that undefined value permanently. Once we try using ImportsAbs, it will fail when it attempts to use VlqBase128Le as if it had been correctly resolved.
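To make the failure mode concrete, here is a minimal sketch that simulates the browser globals branch with a plain object standing in for the global object. The loadImportsAbs and loadVlqBase128Le helpers and the simplified constructor bodies are made up for illustration; they are not what KSC actually emits.

```javascript
// Plain object standing in for the browser global object (`self`).
const root = {};

// Hypothetical stand-in for a generated module in the returnExports.js style
// (browser globals branch only).
function loadImportsAbs(root) {
  root.ImportsAbs = (function (KaitaiStream, VlqBase128Le) {
    function ImportsAbs(_io) {
      this._io = _io;
      this._read();
    }
    ImportsAbs.prototype._read = function () {
      // VlqBase128Le is whatever value was captured at load time
      this.len = new VlqBase128Le(this._io, this, null);
    };
    return ImportsAbs;
  })(root.KaitaiStream, root.VlqBase128Le);
}

// Hypothetical stand-in for the imported spec (parsing omitted).
function loadVlqBase128Le(root) {
  root.VlqBase128Le = (function (KaitaiStream) {
    function VlqBase128Le(_io, _parent, _root) {
      this.value = 0;
    }
    return VlqBase128Le;
  })(root.KaitaiStream);
}

// Wrong order: the importing module is loaded before its dependency.
loadImportsAbs(root);
loadVlqBase128Le(root);

let errorName = null;
try {
  new root.ImportsAbs(null);
} catch (e) {
  errorName = e.name;
}
console.log(typeof root.ImportsAbs, errorName); // function TypeError
```

Loading VlqBase128Le afterwards does not help, because ImportsAbs already holds the undefined it received when its factory ran.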

  1. This implies that in the browser globals context, we must take extra care to load the specs in the correct order. This is something the Web IDE doesn't do, which results in errors like TypeError: {ImportedSpec} is not a constructor (fortunately only temporary in the case of the Web IDE), see https://github.com/kaitai-io/kaitai_struct_webide/issues/159 and https://github.com/kaitai-io/kaitai_struct_webide/issues/59.

  2. Another limitation is that, with the current generated code, circular imports cannot work in any JavaScript environment.

    Node.js, for example, has support for circular require() calls, but the application must consider the implications of this to work correctly, see https://nodejs.org/docs/latest-v20.x/api/modules.html#cycles:

    When there are circular require() calls, a module might not have finished executing when it is returned.

    Careful planning is required to allow cyclic module dependencies to work correctly within an application.

    I understood from https://github.com/kaitai-io/kaitai_struct/issues/337 that we want circular imports to work in Kaitai Struct. Also, https://github.com/kaitai-io/kaitai_struct/issues/691 can help us realize that the module participating in the dependency cycle doesn't have to be another KS-generated format parser, it can also be a custom processor or an opaque type.

    Although it seems circular dependencies are somewhat frowned upon in the programming community and it's mostly advised to avoid them, I think they should work nonetheless. If they don't, it forces you to refactor something just to work around this arbitrary limitation of the system and therefore makes the system harder to work with.

Both of these limitations are due to the fact that the generated format modules export the constructor function directly. You can't have an "unfinished" constructor function that could be finished later - once you "distribute" a function, you cannot mutate its body. In contrast, you can distribute an empty object and only set additional properties on it later - this is the crucial principle that enables circular imports in JavaScript in general.

The fact that the variant of the UMD envelope we're currently using (https://github.com/umdjs/umd/blob/36fd113/templates/returnExports.js#L17-L37) doesn't support circular imports is mentioned in its header comment - see returnExports.js:1-4:

// Uses Node, AMD or browser globals to create a module.

// If you want something that will work in other stricter CommonJS environments,
// or if you need to create a circular dependency, see commonJsStrict.js

Therefore, to remove these limitations of the current UMD envelope, I suggest adapting the commonJsStrict.js template instead:

(function (root, factory) {
    if (typeof define === 'function' && define.amd) {
        // AMD. Register as an anonymous module.
        define(['exports', 'b'], factory);
    } else if (typeof exports === 'object' && typeof exports.nodeName !== 'string') {
        // CommonJS
        factory(exports, require('b'));
    } else {
        // Browser globals
        factory((root.commonJsStrict = {}), root.b);
    }
}(typeof self !== 'undefined' ? self : this, function (exports, b) {
    // Use b in some fashion.

    // attach properties to the exports object to define
    // the exported module properties.
    exports.action = function () {};
}));

The AMD and CommonJS cases will already work as expected (even in the case of circular imports), but unfortunately, the "Browser globals" case still won't work for circular dependencies or when the modules are loaded in an incorrect order. As in the returnExports.js template we've been using so far, the root.b access requires that the b dependency is already available in the global scope at the time of module inclusion. However, it's quite easy to avoid this drawback:

     } else {
         // Browser globals
-        factory((root.commonJsStrict = {}), root.b);
+        factory(root.commonJsStrict || (root.commonJsStrict = {}), root.b || (root.b = {}));
     }

This ultimately solves both the circular import problem and out-of-order loading problem. Provided that the module b also follows this pattern, you can load the modules commonJsStrict and b in any order relative to each other, and there's no problem even if the b module depends circularly on commonJsStrict.
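As a rough check of the patched browser globals branch, the sketch below runs two mutually dependent modules against a plain object standing in for the global object. The module names First and Second, the loadFirst and loadSecond wrappers, and the First_/Second_ parameter names are all hypothetical; only the `root.x || (root.x = {})` pattern comes from the diff above.

```javascript
// Plain object standing in for the browser global object (`self`).
const fakeGlobal = {};

// Hypothetical module "First", which imports "Second".
function loadFirst(root) {
  (function (root, factory) {
    factory(root.First || (root.First = {}), root.Second || (root.Second = {}));
  })(root, function (First_, Second_) {
    function First(n) {
      // Second_.Second is resolved when the constructor runs
      this.next = n > 0 ? new Second_.Second(n - 1) : null;
    }
    First_.First = First;
  });
}

// Hypothetical module "Second", which circularly imports "First".
function loadSecond(root) {
  (function (root, factory) {
    factory(root.Second || (root.Second = {}), root.First || (root.First = {}));
  })(root, function (Second_, First_) {
    function Second(n) {
      this.next = n > 0 ? new First_.First(n - 1) : null;
    }
    Second_.Second = Second;
  });
}

// "Wrong" order: First is loaded before its dependency Second.
loadFirst(fakeGlobal);
const placeholder = fakeGlobal.Second; // empty object created by First's envelope
loadSecond(fakeGlobal);

// Second filled in the existing placeholder instead of replacing it.
const samePlaceholder = placeholder === fakeGlobal.Second;
const obj = new fakeGlobal.First.First(1);
console.log(samePlaceholder, obj.next instanceof fakeGlobal.Second.Second); // true true
```

The important detail is that the second module reuses the placeholder object instead of overwriting it, so the reference the first module captured stays valid.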


The only problem is that this is a breaking change that affects all code that uses KS-generated JavaScript parsers. I don't think there is a sane way around this - the existing approach of directly exporting the constructor function of the root type simply does not allow for circular dependencies (or out-of-order loading in the browser globals context).

Compare the old and new usage:

Old

const HelloWorld = require('./HelloWorld');
// ...
const r = new HelloWorld(new KaitaiStream(buf));

New

const HelloWorld_ = require('./HelloWorld');
// ...
const r = new HelloWorld_.HelloWorld(new KaitaiStream(buf));

generalmimon commented 12 months ago

@GreyCat Any comments? Do you think this BC break is acceptable, and/or do you have any ideas to make it less "painful" for the users?

generalmimon commented 11 months ago

@GreyCat Thoughts?

generalmimon commented 8 months ago

This was implemented in https://github.com/kaitai-io/kaitai_struct_compiler/pull/264.

Basically the only disadvantage of this whole idea is the BC break, which is not mitigated in https://github.com/kaitai-io/kaitai_struct_compiler/pull/264 in any way. In theory, there are a few things we could do to lessen the impact (which doesn't necessarily mean we should; personally, I don't intend to do any of this unless someone expresses interest):