jasmine / jasmine-npm

A jasmine runner for node projects.
MIT License
376 stars 145 forks

Consider using child_process.fork() to parallelize Jasmine runs #195

Closed ajvincent closed 2 years ago

ajvincent commented 2 years ago

Proof of concept: https://github.com/ajvincent/composite-collection/tree/main/build/jasmine-parallel

Each jasmine-npm runner runs as a child process, so they don't have shared memory. Using this code, I found my Jasmine specs ran 80% faster, cutting about 4 minutes from my npm run bootstrap process. I'm currently running over 1000 specs in two passes each, so this is a significant win.

I'm willing to do the work to add this. I know some work would have to go into spec/support.json to define specs that can run parallel to each other. I also know that sometimes there are potentially resources that must be executed before all parallel specs and after all parallel specs, and that some specs may still want to run sequentially after the parallel run.

sgravrock commented 2 years ago

I'd welcome a PR that adds parallelization support.

That said: There have already been a couple of failed attempts at adding parallelization. I've learned two main lessons from those attempts. The first is that this area is fertile ground for bugs. The second is that a number of Jasmine's existing features interact with parallel execution in awkward ways. So I'm going to give any new PRs in this area some pretty close scrutiny. I don't get solid blocks of time to do focused work on Jasmine as often as I'd like, so unfortunately "close scrutiny" is another way of saying that review will probably be slow.

Once I get a bit more time to spend on this, I'll follow up with some design questions. You might also want to take a look at Mocha's documentation, which mentions a number of things that don't work in parallel mode. We don't have to make all the same decisions as Mocha, of course. But that's at least a good starting list of trouble spots.

ajvincent commented 2 years ago

Slow reviews will be fine, as I expect to design this slowly and carefully. Honestly, I want to do it right rather than rush this, and I have plenty to keep me busy. Writing the Jasmine specs for this will itself be very tricky.

ajvincent commented 2 years ago

I think the next step - critically, before writing any more code - is to determine a design for spec/support/jasmine.json and for the Configuration interface of jasmine-npm. Thus, everything below will be pseudo-code or API.

My thinking is, at least initially, that we define an optional "parallel" property. In TypeScript:

interface Configuration {
    spec_files: string[];
    // other properties as they already exist
};

interface TopConfiguration extends Configuration {
  parallel?: {
    sharedConfiguration?: Configuration;
    oneSpecPerRunner: boolean;

    // Exactly one of spec_configurations, spec_batches, or spec_files
    spec_configurations?: Configuration[];
    spec_batches?: string[][];
    spec_files?: string[];
  }
}
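
As a concrete illustration, a spec/support/jasmine.json using this proposed schema might look like the following. The `spec_dir`, `helpers`, and `spec_files` keys already exist in jasmine-npm; the `parallel` block and all file globs here are hypothetical:

```json
{
  "spec_dir": "spec",
  "helpers": ["helpers/**/*.js"],
  "spec_files": ["serialSpec.js"],
  "parallel": {
    "oneSpecPerRunner": false,
    "spec_batches": [
      ["fast/**/*Spec.js"],
      ["slow/databaseSpec.js", "slow/networkSpec.js"]
    ]
  }
}
```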

Here's how I envision this working:

  1. Pass spec_files through globFiles() to get an initial list of specifications.
  2. Pass each array of parallel.spec_batches through globFiles() to get a list of specification files the batches want to take away from spec_files.
    • This might be done with globFiles() returning a Set instead of an array.
    • Later converting to an ordered array is easy: const elements = Array.from(iterable); elements.sort();
  3. The topmost spec and helper files load. This installs global beforeAll(), beforeEach(), afterEach(), and afterAll().
  4. If topConfiguration.parallel is defined:
    a. Let sharedConfiguration be either topConfiguration.parallel.sharedConfiguration or topConfiguration.
    b. Define a set of ChildJasmine objects, each of which takes a configuration which is:
      • derived from sharedConfiguration
      • overridden by a member of spec_configurations, or by a spec_files list built from spec_batches
    c. Each ChildJasmine object also has a reference to an AggregateReporter, which will forward to the ConsoleReporter later.
  5. The global beforeAll() functions run.
  6. For each ChildJasmine object (via Promise.all):
    a. Parent beforeEach() and afterEach() callbacks are not supported.
    b. ChildJasmine calls fork("./child_process.js") and sends the child process a configuration to run.
  7. Inside each child process:
    a. There is a copy of Jasmine, which awaits a Promise for a child configuration.
    b. When the Promise resolves, we set up Jasmine with the child configuration and a special reporter which caches the events in order.
    c. await jasmine.execute();
    d. reporter.replay(), which sends the ordered set of events (jasmineStarted, suiteStarted, specStarted, specDone, suiteDone, jasmineDone) to the parent process.
    e. await processExit.promise, which waits for a command from the parent process to exit.
  8. The parent process waits for the results from Promise.all(childJasmines). As each child reports:
    a. A one-character message to the console reporter indicates the reporting (successful or otherwise) from a child process.
    b. The parent process sends the exit command to the child process, and the child process exits with code 0.
  9. The parent process totals the totalSpecsDefined values from each child, adds its own serial totalSpecsDefined, and calls reporter.jasmineStarted() with the combined value.
  10. The parent process feeds the suites and specs from each child Jasmine process into the reporter, but not the jasmineDone() data.
    • This generates the visual effect of the specs running.
  11. The parent process runs its own serial tests, feeding to the reporter as it does now.
    • beforeEach(), afterEach(), afterAll() callbacks run at the parent level.
  12. The parent process aggregates the JasmineDoneInfo data from each child Jasmine process, plus its own serial Jasmine run, into a combined JasmineDoneInfo which it then feeds to the reporter.
  13. The parent process exits.
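
The event-caching reporter in step 7 could be sketched as follows. ReplayReporter is a hypothetical name; the real event payloads would come from jasmine-core:

```javascript
// Sketch of the event-caching reporter described in step 7: it records
// reporter events in arrival order, then replays them to a downstream
// reporter (e.g. the parent process's aggregate reporter) on demand.
class ReplayReporter {
  constructor() {
    this.events = [];
  }

  // One recording method per reporter event.
  jasmineStarted(info) { this.events.push(["jasmineStarted", info]); }
  suiteStarted(result) { this.events.push(["suiteStarted", result]); }
  specStarted(result) { this.events.push(["specStarted", result]); }
  specDone(result) { this.events.push(["specDone", result]); }
  suiteDone(result) { this.events.push(["suiteDone", result]); }
  jasmineDone(info) { this.events.push(["jasmineDone", info]); }

  // Forward the cached events, in order, to another reporter. Methods the
  // target doesn't implement are skipped.
  replay(target) {
    for (const [name, payload] of this.events) {
      if (typeof target[name] === "function") {
        target[name](payload);
      }
    }
  }
}
```

In the child process this would be the "special reporter" of step 7b; in the parent, replay() would feed the aggregate reporter once the child finishes.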
sgravrock commented 2 years ago

Here are some open design questions. I'm working with the following assumptions when I think about these questions:

  1. If Jest and Mocha do something the same way, it's probably the right choice. We should do likewise unless we have a good reason to think Jest and Mocha got it wrong, or Jasmine is different in some important way that leads us to a different decision.
  2. We should avoid breaking changes to the reporter interface if at all possible. In the past, we've lost some third-party reporters and their users when we've made breaking changes.
  3. We shouldn't do things that will prevent us from supporting parallelization in browsers later on. We don't need to actually build any support for browser parallelization now, but let's not paint ourselves into a corner.

Opt-in or opt-out?

Parallelization is opt-out in Jest and opt-in in Mocha, previous Jasmine attempts, and your PoC. I think opt-in is the right choice. Not all suites are faster in parallel, and it might be very hard to ship good on-by-default parallel support without some significant breaking changes.

Do worker processes load all spec files or just assigned ones?

The obvious choice is to assign each spec file to a particular worker, and have each worker load only the assigned spec files. It looks like that's what your PoC does. I'm pretty sure it's what Mocha and Jest do as well. I'm assuming that this is the right choice.

Another option is to have all worker processes load all files, and assign top-level runables (specs or suites) to each worker. That's more complicated and has at least one ugly failure mode (multiple top-level runables with the same name). However, a number of the options discussed below are only workable if every process loads all files.

How is the number of worker processes determined?

Jest and Mocha both allow this to be specified on the command line. It looks like the optimal number of workers varies considerably based on the hardware, the details of the test suite, and what other load the machine is under, but is generally not more than the number of CPU cores. It's pretty obvious to me that we need to give users the ability to use different numbers of workers for the same test suite in different situations.

I think we should allow the number of workers to be set in the config file and overridden on the command line. The default is 1, which disables parallelization. This is more or less what Mocha does. Jest defaults to choosing a number of workers based on the number of CPU cores but allows it to be overridden. I think this isn't the best choice: too many people report that their Jest suites run faster if they reduce the number of workers from the default. The growing popularity of CPUs that have a mix of performance and efficiency cores is going to make that kind of guesswork even more questionable in the future.
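
The precedence described here (command line overrides config file, default 1) could be resolved with something like this sketch; "numWorkers" is a hypothetical option name:

```javascript
// Sketch: resolve the worker count with CLI > config file > default-of-1
// precedence. Both inputs may be undefined; the CLI value may arrive as a
// string. A count below 1 is rejected rather than silently clamped.
function resolveWorkerCount(cliValue, configValue) {
  for (const v of [cliValue, configValue, 1]) {
    if (v === undefined || v === null) continue;
    const n = Number(v);
    if (!Number.isInteger(n) || n < 1) {
      throw new Error(`numWorkers must be a positive integer, got: ${v}`);
    }
    return n;
  }
}
```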

How are files/runables allocated to worker processes?

There are at least three options:

  1. If I understand your PoC correctly, the user manually groups files into buckets and each bucket is allocated to a worker process. I'm not aware of any existing systems that do that. The major drawbacks I see are that users would have to manually rebalance the buckets to maintain good performance, and it wouldn't be easy for users to change the number of workers.

  2. Split files/runables into N buckets at startup time and give each bucket to a worker. This would be easy to implement, and it would allow us to reproduce the same split every time the same random seed is used (more on that below). But it doesn't give optimal run time unless files/runables have roughly uniform run time.

  3. Each worker process works on one file/runable at a time. When it finishes, the parent gives it another one. Jest, Mocha, and previous Jasmine attempts all do this. It's a bit more complicated to implement but it should generally be self-balancing, giving a near-optimal distribution of files/runables to workers even when some suites are much slower than others.
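
The difference between options 2 and 3 can be seen with a toy makespan calculation over hypothetical per-file run times:

```javascript
// Toy comparison of option 2 (static split) vs option 3 (dynamic queue).
// durations[i] is the hypothetical run time of spec file i.

// Option 2: deal files round-robin into N buckets up front; total run time
// is the slowest bucket.
function staticMakespan(durations, workers) {
  const buckets = Array.from({ length: workers }, () => 0);
  durations.forEach((d, i) => { buckets[i % workers] += d; });
  return Math.max(...buckets);
}

// Option 3: each idle worker takes the next file from a shared queue, so
// work naturally flows to whichever worker is free.
function queueMakespan(durations, workers) {
  const finishTimes = Array.from({ length: workers }, () => 0);
  for (const d of durations) {
    const idlest = finishTimes.indexOf(Math.min(...finishTimes));
    finishTimes[idlest] += d;
  }
  return Math.max(...finishTimes);
}
```

With durations [10, 1, 1, 1] and two workers, the static round-robin split finishes in 11 time units while the queue finishes in 10, and the gap grows as run times get more skewed.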

To what extent will randomization still be reproducible?

Currently, users can reproduce an execution order by specifying a random seed. This is really useful for debugging intermittent failures that depend on order. Obviously there's no practical way to guarantee that two specs in different workers run in a particular order. But ideally we'd ensure that Jasmine distributes files/runables to workers the same way, and runs them in the same order within each worker, every time a random seed is specified.

That said, as mentioned above we probably have to choose between reproducible ordering and optimal distribution of files/runables to workers.

Jasmine is the odd one out here. Mocha doesn't allow users to reproduce a random seed, and Jest doesn't provide randomization at all. (I was very surprised to learn this. I thought both those features were table stakes for any serious testing tool.)
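
For reference, a reproducible ordering given a seed is straightforward as long as the distribution step is deterministic. This sketch uses the well-known mulberry32 PRNG and a Fisher-Yates shuffle; neither is what jasmine-core actually uses for randomization:

```javascript
// mulberry32: a small deterministic PRNG seeded by a 32-bit integer.
function mulberry32(a) {
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded PRNG: the same seed always
// produces the same spec-file order.
function seededShuffle(items, seed) {
  const rand = mulberry32(seed);
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}
```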

What about fit and fdescribe?

If each worker loads all spec files, it should be easy to keep fit and fdescribe working. If each worker loads only its assigned files, it's harder. (Probably impossible unless the parent process loads all files before running the workers.)

Ideally we'd make fit and fdescribe work properly in parallel mode. But I think it would be fine for an initial release to make them throw errors in parallel mode, like Mocha does.

Top-level (i.e. not in a describe) beforeEach/afterEach

Top-level beforeEach/afterEach in helper files are extremely common and they still need to work. I'm less clear on what should happen in spec files. My first thought was that top-level beforeEach/afterEach in a spec file should be an error when parallel mode is enabled. But jasmine-core doesn't know about the spec/helper file distinction or even the existence of files.

I'm not sure what Mocha does. Its documentation implies that top-level beforeEach/afterEach works the same in parallel mode as in sequential mode, but some third-party docs imply that it doesn't work. Jest avoids the problem by not supporting top-level beforeEach/afterEach at all.

Top-level beforeAll/afterAll

These currently run once per run of the entire test suite. If we keep doing that in parallel mode, we'll break suites that use beforeAll/afterAll to manage in-memory state. If we run them once per worker process, we'll break suites that use them to work with external resources like databases.

Jest avoids this problem by not supporting top-level beforeAll/afterAll. Mocha runs those functions once total in sequential mode and once per file in parallel mode. I think that's a mistake. It probably works a lot of the time but I'd expect it to be an unwelcome surprise for some users.

Maybe the best thing is to throw an error if there's a top-level beforeAll or afterAll in parallel mode.

What about reporters?

Note: We need to think about users' custom reporters and third party reporters, not just the ones that come with Jasmine. CC: @UziTech, @bcaudan, and @dfederm.

Existing reporters might assume that reporter events for unrelated suites don't interleave, for example that this order doesn't happen:

  1. suiteStarted for suite1
  2. suiteStarted for suite2
  3. specStarted for suite1 spec1
  4. specStarted for suite2 spec1
  5. specDone for suite1 spec1
  6. suiteDone for suite1
  7. specDone for suite2 spec1
  8. suiteDone for suite2

That behavior isn't explicitly documented, but it's so obvious and long-established that deviating from it would have to be a semver-major change. I also think that it might be difficult or impossible to update some reporters to deal with interleaved specs/suites. Some of Mocha's built-in reporters don't work in parallel mode. That shouldn't be a problem for Jasmine's built-in reporters, but we would risk leaving third party reporters behind.

If I understand your PoC correctly, it solves this by batching up reporter events until the end and then emitting them in an order that's compatible with existing reporters. That solves the compatibility problem but introduces a couple of new ones. It breaks in environments like CI runners that expect to see output within a certain amount of time. And it makes it a lot harder to identify long-running specs with a reporter like this:

jasmine.addReporter({
    specStarted: event => {
        console.log(`${event.fullName} started`);
    },
    specDone: event => {
        console.log(`${event.fullName} finished in ${event.duration}ms`);
    }
});

That will still work, but you'd have to wait for the entire suite to finish instead of learning the slow spec's identity as soon as it runs.

I don't think batching everything up is workable, but I also don't think we can expect all existing reporters to work with interleaving. What about the following compromise? We allow reporters to declare that they support parallel mode. Jasmine would throw an error if run in parallel mode with a reporter that doesn't declare that support, and then reporter events could interleave.
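
That compromise could be enforced with a simple check before the run starts. The flag name supportsParallel is purely illustrative, not an existing jasmine-core property:

```javascript
// Sketch of the proposed compromise: refuse to run in parallel mode with
// any reporter that hasn't opted in. "supportsParallel" is a hypothetical
// flag name that reporters would set to declare compatibility.
function assertReportersSupportParallel(reporters) {
  const incompatible = reporters.filter(r => !r.supportsParallel);
  if (incompatible.length > 0) {
    throw new Error(
      "Parallel mode requires all reporters to declare parallel support; " +
      incompatible.length + " reporter(s) did not."
    );
  }
}
```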

Besides event ordering, the other thing we need to think about is Env#topSuite. This method returns metadata for the entire test suite. Will it still work in parallel mode? That's easy to do if every process loads all files but difficult or impossible if each worker only loads certain files. This also potentially affects non-reporter code. I wouldn't personally write a spec that looks at topSuite, but somebody else might. One option would be to make topSuite an error in parallel mode and delay releasing parallel support until the next major release. We might also consider having the parent process load all spec files so that topSuite can at least work there.

The stopOnSpecFailure config option

Is there a way to make this work in parallel mode? Is it worth the effort? Mocha tries to make it work, but doesn't guarantee anything. We might start by making that option an error in parallel mode, and add something like Mocha's behavior later if there's enough user interest.

I've probably forgotten some things, but that should be plenty to think about for now.

ajvincent commented 2 years ago

That is awesome feedback. I think you care about this more than I do, which is a very good sign, because I care a lot.

Snippets from the above, with my responses.

One key point I should make is that up until now, I hadn't planned on altering jasmine-core at all. Your points about top-level beforeAll(), beforeEach(), afterEach(), and afterAll() may force me to change my mind, but I don't want to do that if I can avoid it.

My proof of concept imports Jasmine as an ES6 module, so we might be able to signal to Jasmine that it should act as a child (disallowing parallel operation, maybe disabling these setup/teardown functions) via a Symbol:

Reflect.defineProperty(Jasmine, "DISABLE_PARALLEL", {
  value: Symbol("Disable parallel operations"),
  writable: false,
  enumerable: true,
  configurable: false
});

// ...

config[Jasmine.DISABLE_PARALLEL] = true;
jasmine.loadConfig(config);

Opt-in or opt-out?

Definitely opt-in. I consider opt-out to be a breaking change, especially at the point people upgrade from a forced-serial to forced-parallel Jasmine. The impact from that is absolutely unacceptable.

How is the number of worker processes determined?

I think we should allow the number of workers to be set in the config file and overridden on the command line. The default is 1, which disables parallelization. This is more or less what Mocha does. Jest defaults to choosing a number of workers based on the number of CPU cores but allows it to be overridden. I think this isn't the best choice: too many people report that their Jest suites run faster if they reduce the number of workers from the default. The growing popularity of CPUs that have a mix of performance and efficiency cores is going to make that kind of guesswork even more questionable in the future.

I honestly had not thought about that before, but now that you raise it, it's a very good point.

I'm comfortable enough with Promises to implement a limited Promise.all() capability - one that generates a queue of ChildJasmine objects, then starts x number of them at once and only fires off another when a presently running child completes. I've designed that sort of batching before, for a macOS problem with too many entries in Promise.all(), so I just need to factor that into my design.
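
Such a limited Promise.all might be sketched like this, assuming each child run is represented as a function returning a promise (e.g. a hypothetical childJasmine.run()):

```javascript
// Sketch: run at most `limit` tasks concurrently, starting a new one
// whenever a running task settles. `tasks` is an array of functions that
// each return a promise. Results are returned in input order.
async function runLimited(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each "worker loop" repeatedly claims the next unclaimed task index and
  // awaits it, so at most `limit` tasks are in flight at once.
  async function workerLoop() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const loops = [];
  for (let k = 0; k < Math.min(limit, tasks.length); k++) {
    loops.push(workerLoop());
  }
  await Promise.all(loops);
  return results;
}
```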

I would make this a command-line -j option, similar to Make's implementation, if that flag's not already reserved. Adding a "process_count" to the "parallel" dictionary in support.json shouldn't be too hard, and would probably solve a few other points you raise here.

How are files/runables allocated to worker processes?

1. If I understand your PoC correctly, the user manually groups files into buckets and each bucket is allocated to a worker process. 

Not exactly, although my design proposal for this allows this capability. I was planning on letting glob do the work for us, and then making some reasonable decisions from there.

If you want to reject that, that would be the spec_configurations and spec_batches options in my design proposal, and oneSpecPerRunner becomes redundant.

it wouldn't be easy for users to change the number of workers.

Unless I'm missing something, I've answered that in this response.

3. Each worker process works on one file/runable at a time. When it finishes, the parent gives it another one. Jest, Mocha, and previous Jasmine attempts all do this. It's a bit more complicated to implement but it should generally be self-balancing, giving a near-optimal distribution of files/runables to workers even when some suites are much slower than others.

This is what my proof-of-concept does, except that the subprocesses (what you're referring to as workers, though in Node that term usually means worker threads) exit automatically upon completion rather than accepting new specs.

To what extent will randomization still be reproducible?

My current proposal doesn't factor in randomization across spec files. Since I talk about replacing Promise.all(), which guarantees nothing about execution order, only the ordering of results, this is something to think about carefully in the design of subprocess batching.

What about fit and fdescribe?

I'm going to need help handling these cases.

Top-level (i.e. not in a describe) beforeEach/afterEach

Top-level beforeAll/afterAll

I really need to look at the implementation of these before I try to write tests or implementation.

Maybe the best thing is to throw an error if there's a top-level beforeAll or afterAll in parallel mode.

I suggest the Jasmine.DISABLE_PARALLEL symbol as a flag to check this if we need it. Personally, I think this is the safest option.

What about reporters?

We need to think about users' custom reporters and third party reporters, not just the ones that come with Jasmine.

That's a good point. This means we need to explicitly create - and document - some new reporter classes in jasmine-npm, instead of directly feeding ConsoleReporter as my PoC did. We'll need:

That behavior isn't explicitly documented

It's time to document this. :smile: We're talking about a few paragraphs at most, I presume.

If I understand your PoC correctly, it solves this by batching up reporter events until the end and then emitting them in an order that's compatible with existing reporters.

Yes.

It breaks in environments like CI runners that expect to see output within a certain amount of time.

Can you elaborate on this more, please?

And it makes it a lot harder to identify long-running specs with a reporter like this:

jasmine.addReporter({
  specStarted: event => {
      console.log(`${event.fullName} started`);
  },
  specDone: event => {
      console.log(`${event.fullName} finished in ${event.duration}ms`);
  }
});

I don't see how. My PoC only alters the jasmineStarted and jasmineDone events. Others replay exactly as they come in.

That will still work, but you'd have to wait for the entire suite to finish instead of learning the slow spec's identity as soon as it runs.

I think we can design around this, if we support two messaging reporters from the child process to the parent process: one for the batched report, and one for non-batched reporting. Sort of like stdout versus stderr. That non-batched reporter will not be reliable for jasmineStarted and jasmineDone, because again the spec counts pass through those events, so this would have to be documented carefully.

The types of messages between child and parent in my PoC are a small fixed set, and adding more types of messages should be pretty easy.

I don't think batching everything up is workable, but I also don't think we can expect all existing reporters to work with interleaving. What about the following compromise? We allow reporters to declare that they support parallel mode. Jasmine would throw an error if run in parallel mode with a reporter that doesn't declare that support, and then reporter events could interleave.

Conceptually, I'm okay with that, but it does mean a change to the Reporter interface of jasmine-core (argh, I really want to avoid changing core Jasmine in any way).

Besides event ordering, the other thing we need to think about is Env#topSuite. This method returns metadata for the entire test suite. Will it still work in parallel mode? That's easy to do if every process loads all files but difficult or impossible if each worker only loads certain files. This also potentially affects non-reporter code. I wouldn't personally write a spec that looks at topSuite, but somebody else might. One option would be to make topSuite an error in parallel mode and delay releasing parallel support until the next major release. We might also consider having the parent process load all spec files so that topSuite can at least work there.

I don't suppose we can just deprecate topSuite... I know absolutely nothing about this, so I need help to figure this angle out. I think this is something we can delay figuring out until later in the implementation, but please call out if you disagree.

Can we clean up the generated page for Suite? It has children listed four times, id four times, etc.

The stopOnSpecFailure config option

Is there a way to make this work in parallel mode?

I think so. Again, we could implement this as a priority message from child process to parent, but without ordering of spec files, I can't guarantee which spec it will stop for. More documentation.

Is it worth the effort? Mocha tries to make it work, but doesn't guarantee anything. We might start by making that option an error in parallel mode, and add something like Mocha's behavior later if there's enough user interest.

Following my proposal, that would mean moving stopOnSpecFailure from the existing Configuration interface of jasmine-npm to my suggested TopConfiguration interface. That should be backwards-compatible, I think.

UziTech commented 2 years ago

I don't suppose we can just deprecate topSuite... I know absolutely nothing about this, so I need help to figure this angle out. I think this is something we can delay figuring out until later in the implementation, but please call out if you disagree.

I disagree. I think there is a lot of value in reporters knowing which tests exist before they run.


I like the approach of having some parallelizable reporters and some not. There will have to be tradeoffs in the reporter and I don't think every reporter should be forced to support both ways.

ajvincent commented 2 years ago

Okay, based on @UziTech's objection, I need to rethink my approach a bit.

jasmine-npm: developing around six phases

I was initially thinking of a straight implementation-and-test pass, but there are perhaps six different phases to implement:

  1. Loading all the tests, before they run, to gather spec metadata
  2. Top-level beforeAll() in the non-parallelized spec files
  3. Running the parallelized specs and reporting live results through a non-delayed reporter
  4. Reporting results through the delayed reporter
  5. Running non-parallelized specs and reporting those
  6. Top-level afterAll() in the non-parallelized spec files

To ensure compatibility, I think this means I'll have to write A/B tests, where one Jasmine under test is serialized and one is parallelized - and we're explicitly checking a few properties to make sure they match.

When is topSuite ready for access?

The documentation for Env.prototype.topSuite reveals a potential problem. Unlike JasmineStartedInfo or JasmineDoneInfo, which are event-driven, topSuite() as a function returns a Suite instance directly. If it returned a Promise<Suite>, we'd have no problem: I could populate the suite asynchronously.

Idle processes

To fully populate topSuite() before any specs run, we have to load all the specs. I see several options, not mutually exclusive.

I need to hear what people think on this. I personally prefer the first option. The third option complicates the timing of releases, but might be the right thing to do.

Changes for jasmine-core

Reporter.prototype.batched

Over the weekend, I was thinking on this quite a bit, and I realized I don't understand exactly what we mean by "parallelizable reporter". More precisely, what a reporter setting that flag expects. "Batched" and "unbatched" are pretty clear names, I think, or maybe "buffered" / "unbuffered".

So I propose, for core, we add a boolean property (defaulting to false) called batched. When jasmine.addReporter(reporter) executes, it will look at the batched setting to decide whether to add it to a replay reporter (which by definition is not batched for the environment) or directly to the environment.

Though I don't foresee any other batching modes than "at end of job", if there is a reasonable possibility of that, we should define batched to be a string instead of a boolean. I mean, I can imagine a "batched by suite" option, but it's pretty easy to write a forwarding reporter following that model.
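
The routing decision in addReporter could look roughly like this sketch. The hub, its method names, and the flush-at-end behavior are all hypothetical stand-ins for the "replay reporter" described above:

```javascript
// Sketch of the proposed addReporter routing: reporters with batched=true
// hang off a buffer that is flushed at the end of the run; unbatched
// reporters receive events live as they are dispatched.
function makeReporterHub() {
  const live = [];
  const batched = [];
  const bufferedEvents = [];

  return {
    addReporter(reporter) {
      // The batched flag (defaulting to falsy) decides the routing.
      (reporter.batched ? batched : live).push(reporter);
    },
    dispatch(eventName, payload) {
      bufferedEvents.push([eventName, payload]);
      for (const r of live) {
        if (typeof r[eventName] === "function") r[eventName](payload);
      }
    },
    // Called once at end-of-job: replay the buffered events, in order, to
    // every batched reporter.
    flush() {
      for (const [eventName, payload] of bufferedEvents) {
        for (const r of batched) {
          if (typeof r[eventName] === "function") r[eventName](payload);
        }
      }
    }
  };
}
```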

Documenting reporter traps supporting batched or unbatched operations

There are six traps for each reporter. In unbatched parallel mode, I would say we can't trust jasmineDone() or jasmineStarted() to be accurate, since they'll be called several times. A batching reporter won't be better, but the parent aggregation reporter in jasmine-npm (which is batched) will clean that up as I've already demonstrated, so we can have batched reporters attach to the aggregation reporter instead.

Stub reporter class

One of the facets which caught me by surprise in my proof-of-concept implementation was that some reporters didn't implement all the methods of the Reporter interface. I think Jasmine core should provide a stub reporter base class where the methods are no-ops. I'm not locked in on this position, though.

Safe reporter class

Alternatively, I could provide a safe reporter class which checks for the existence of each method of a target reporter before we call it. I'm going to need that anyway, but the real question is should I expose that class publicly.

Spec and Suite id properties

Digging in a little further, I now see another stumbling block: Suite.prototype.id and Spec.prototype.id might not be unique in a child-process situation as currently written. getNextSpecId() and getNextSuiteId() in Env.js (or their accompanying private variables) need to be altered in the parallelizing world. Jasmine's documentation makes no mention of the specific shape of the id beyond "it's a string", so I must ask how much freedom I have to change that too.

(This is definitely a bug in my proof-of-concept code.)

Testing uniqueness of ids should be easy: add each id to a Set and compare the size of the Set to the number of specs.
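
Namespacing ids by child process is one way to restore uniqueness, and the Set check is easy to express. The `w0:` prefix scheme here is purely illustrative:

```javascript
// Sketch: make child-process spec/suite ids globally unique by prefixing
// them with a worker index ("w0:", "w1:", ...), then verify uniqueness
// with a Set, as suggested above.
function namespaceIds(workerIndex, ids) {
  return ids.map(id => `w${workerIndex}:${id}`);
}

function allUnique(ids) {
  return new Set(ids).size === ids.length;
}
```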

topSuite()

See the above.

Versioning strategy

By themselves, batched reporting and id strings are additions and prerequisites for parallel reporting. I think we can add them as minor changes to the core, particularly since they don't themselves require other features.

That said, I'd very much like to get them baking in a minor version release before the parallel running code of jasmine-npm.

As I mention above, changing the API for topSuite would be a breaking change, requiring a major release anyway.

To keep momentum going, I'll need some decisions on the jasmine-core strategy soon. I expect not to make comments or start writing code until next weekend, to give others time to respond to what everyone's said so far, but I want to get jasmine-core designs approved and changes into the pipeline first.

sgravrock commented 2 years ago

I'll try to find time to respond to the various comments in the next couple of days.

FWIW, I don't think we've come to a consensus on a number of key high level design questions. It's probably going to take a bit more back-and-forth to get there. Of course you're free to start coding, but that's likely to create a situation where you have to choose between doing significant rework and having your PR declined. Either of those outcomes would make me sad. But so would shipping the wrong behavior and being stuck with it for years, or even forever.

I appreciate your enthusiasm and your concern about momentum. But we're talking about a major change to a piece of software that's 13 years old, gets downloaded millions of times a week, and is maintained by volunteers in their free time. Things are just not going to happen as fast as you want them to.

Good catch on the ID collision. I hadn't thought of that.

ajvincent commented 2 years ago

FWIW, I don't think we've come to a consensus on a number of key high level design questions. It's probably going to take a bit more back-and-forth to get there. Of course you're free to start coding, but that's likely to create a situation where you have to choose between doing significant rework and having your PR declined. Either of those outcomes would make me sad. But so would shipping the wrong behavior and being stuck with it for years, or even forever.

I appreciate your enthusiasm and your concern about momentum. But we're talking about a major change to a piece of software that's 13 years old, gets downloaded millions of times a week, and is maintained by volunteers in their free time. Things are just not going to happen as fast as you want them to.

You're right. I guess I got a sense that this was moving faster than it really was. For what it's worth, I haven't written one line of code and I don't plan on doing so until I get some actual design approvals.

UziTech commented 2 years ago

Here's my 2¢.

Opt-in or opt-out?

Opt-in, at least initially until we know everything works well. Maybe in a future release it could be changed to opt-out.

Do worker processes load all spec files or just assigned ones?

Load assigned ones.

How is the number of worker processes determined?

Default to num cpus / 2 but configurable.

How are files/runnables allocated to worker processes?

workers grab files when they are idle.

To what extent will randomization still be reproducible?

Randomization for tests in a single file. seed determines order. Possibly randomize file order as well. File order won't be reproducible if different workers can grab them.

What about fit and fdescribe?

All tests loaded initially, then determine which files to load based on fit and fdescribe.

or

All tests loaded initially, then run without parallelization if fit or fdescribe exist.

Top-level (i.e. not in a describe) beforeEach/afterEach

Throw error. Some things just don't make sense in parallelized tests. There shouldn't be any reason they can't create a beforeEach/afterEach in each describe and import a common function.

Top-level beforeAll/afterAll

Throw error in spec files. Allow in helpers to be run on the main thread before/after all tests.

What about reporters?

Batch calls after each file is complete.

I would rather the workers report back at the start and end of every test but I feel like that would probably be a lot slower.

The stopOnSpecFailure config option

Send stop signal to all workers. Workers finish current test (including afterEach/afterAll) then stop.


Pseudocode:

// only evaluate files with focused tests if any exist
topSuite = loadTests()
files = filesList()
if (hasFocusedTests()) {
  files = onlyFilesThatHaveFocusedTests()
}

if (config.numWorkers <= 1) {
  return runTests()
}

workers = startWorkers()
for (const worker of workers) {
  // stop all workers on failure if stopOnSpecFailure
  worker.onMessage('failure', () => {
    if (config.stopOnSpecFailure) {
      workers.forEach(w => w.sendMessage('stop'))
    }
  });

  // when a worker is done, the main thread sends the batched calls to
  // the reporter and sends the worker the next file to evaluate
  worker.onMessage('done', (metadata) => {
    sendNextFileTo(worker)
    sendCallsToReporter(metadata)
  });
}
sgravrock commented 2 years ago

Rearranging things a bit:

Reporters, topSuite, and spec/suite IDs

These are all interrelated, and what we choose here could have major implications for the rest of the system.

Spec/suite IDs allow reporters to link reporter events with spec/suite objects accessed through topSuite. For that to work, the IDs don't just have to be unique. They also have to be consistent. In other words, a reporter event originating in a child process must have the same ID as the corresponding spec/suite object in the parent process. I don't see a reasonable way to do that without making every process load all files, which I don't think is a good idea.

On the other hand, if topSuite isn't supported in parallel mode, reporters would only need the IDs to be unique. topSuite is useful but pretty niche. So while we can't remove it, I think it would be OK for it to work only in non-parallel mode.

I don't want to do anything that would break existing reporters in non-parallel mode. That includes making reporter methods required, requiring reporters to inherit from a base class, or assuming that reporters can handle looser event ordering guarantees. But I also don't think we need to go out of our way to make existing reporters work in parallel mode without changes.

I think it'll be OK to emit reporter events as they happen rather than batching them, with the exception of jasmineStarted and jasmineDone. I hear @UziTech's concern about performance but I suspect it won't matter. I'd rather not take on the extra complexity unless we can measure a noticeable speed difference. Batching also has the potential to introduce hard-to-reproduce bugs in some cases, like if the last file to run takes longer to run than the CI system's no-output timeout. Avoiding hard-to-reproduce crash bugs is an extremely high priority for me as a maintainer.

So here's what I'm thinking:

A note on implementation: We shouldn't write new code to deal with reporters that don't implement all methods. ReportDispatcher already solves that problem. It also encapsulates a decade's worth of hard-learned lessons on how to safely call through buggy user-provided async code. Yes, it's a -core internal, but I'd rather use it from -npm than reinvent that particular wheel.
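The arrangement described here implies a minimal child-side reporter that simply relays each event to the parent, which then feeds its own dispatcher. A sketch of that relay (the function name and message shape are illustrative assumptions, not Jasmine API):

```javascript
// Illustrative sketch only: a child-process reporter whose every method
// forwards the event to the parent process, leaving ordering and dispatch
// to the parent. Names and message shapes are assumptions, not Jasmine API.
function createForwardingReporter(send) {
  const eventNames = [
    'jasmineStarted', 'suiteStarted', 'specStarted',
    'specDone', 'suiteDone', 'jasmineDone'
  ];
  const reporter = {};
  for (const eventName of eventNames) {
    reporter[eventName] = payload => {
      // In a forked child, send would be msg => process.send(msg).
      send({ type: 'reporterEvent', eventName, payload });
    };
  }
  return reporter;
}
```

Because the relay keeps no state of its own, it fits the "emit events as they happen" model: the parent can suppress the children's jasmineStarted/jasmineDone events and emit its own once, while passing everything else straight through.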

Top level beforeAll/afterAll

These should throw exceptions in parallel mode. I don't see any reasonable alternative. If we run them in each process, suites that use them to set up external state (e.g. database) will break. If we only run them in the parent process, suites that use them to set up in-memory state will break.

Top level beforeEach/afterEach

Top-level beforeEach/afterEach in helper files has to work in parallel mode. It's an extremely common and useful pattern, and omitting it will prevent too many people from using parallel mode. Jasmine's own test suite relies on it. Not being able to run Jasmine's tests in parallel mode would make it much harder for me to test the feature.

On the other hand, I'm fine with breaking top-level beforeEach/afterEach in spec files when run in parallel mode. And I don't see an alternative. It would be nice to report an error in that case, but I'd also be ok with just documenting that it doesn't work reliably.

fit/fdescribe

These can throw in parallel mode. They're mainly for debugging, and I don't think anyone really needs to debug and run in parallel mode at the same time. I'm not opposed to making them work in parallel mode but we shouldn't add any significant complexity to make that happen.

Reproducible randomization

@UziTech:

Randomization for tests in a single file. seed determines order. Possibly randomize file order as well. File order won't be reproducible if different workers can grab them.

Agreed. That's probably the best we can do.
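Mechanically, "seed determines order" within a file comes down to a deterministic PRNG driving a shuffle. A sketch (mulberry32 is a stand-in chosen for brevity, not Jasmine's actual algorithm):

```javascript
// Seed-reproducible ordering: the same seed always yields the same
// permutation. mulberry32 is an arbitrary small PRNG used for illustration.
function mulberry32(seed) {
  return function() {
    seed |= 0; seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded PRNG.
function shuffleWithSeed(items, seed) {
  const random = mulberry32(seed);
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}
```

Within one file this gives full reproducibility; across files it doesn't, because which worker runs which file (and when) is nondeterministic.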

stopSpecOnFailure

It sounds like we're converging on best-effort support for that option, like Mocha does. I'm fine with that. I'd also be fine with making it an error in parallel mode, and adding support later if there's enough user interest.

Configuration

Mocha and Jest have no parallel-specific configuration other than the number of processes. I think they got this right and we should follow their lead. I think that's going to be better for most users than the PoC's more complex configuration.

Overall execution model

Mocha and Jest create a fixed number of processes and allocate files to them. @ajvincent's PoC creates one process per file. I think Mocha and Jest have this right and we should follow their lead.

There are two problems with the process-per-file model. The first is memory consumption. If many spec files load big module graphs (e.g. Material UI) there may not be enough memory to have that many copies. The second is time overhead. On my Windows machine, the median top-level suite in jasmine-core takes 47ms. But it costs ~70ms just to fork and exit a Node process that doesn't load any code. These aren't just edge cases. In my experience, many real-world codebases have both a bunch of heavyweight suites and a long tail of fast suites. We need to handle both well.

It sounds like we're all agreed on the idea of distributing files, not suites/specs, to child processes.

@UziTech:

workers grab files when they are idle.

That's what I'm leaning toward as well. It seems like it would provide the best performance. It probably adds some state management complexity to Env though. When each new file arrives, the child process's Env would need to make sure that it runs everything defined in that file, along with beforeEach/afterEach fns defined in helpers, but not run any specs or suites defined in previous spec files. I don't think that would be too hard but I haven't thought about it very deeply yet.

I'm neutral-ish on whether the parent process should load spec files. Does that buy us any capabilities that don't also require us to have consistent IDs across processes?

sgravrock commented 2 years ago

As for handling core changes: I'm really reluctant to merge parallel support code into core before the feature as a whole is ready. In my experience, code that isn't used yet tends to be hard to review, hard to maintain, buggy, and often in need of a redesign anyway when its eventual caller shows up. If you have specific refactorings that will make things better even if we never ship parallel support, I'm happy to consider those on a case by case basis. If you want to have core changes reviewed piecemeal and landed on a branch, I can also do that. But I'll also give the whole thing a review when it's ready to go and working.

ajvincent commented 2 years ago

A lot to digest there, and I'll need to read through it a few times before I compose a response.

Regarding the consistent ID problem, I think we just need a hash function, say:

  1. Convert the absolute file path from Unicode to UTF-8
  2. Encode the result as base64
  3. Append :suite(${suiteCounter++}) or :spec(${specCounter++}) (where suiteCounter or specCounter are stored in a Map<filePath, counters>)

The colon character is not valid base64 encoding, so it'll work as a delimiter if we need to reverse it.
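A minimal sketch of that scheme (makeId and the counter map are illustrative names, not proposed API):

```javascript
// Sketch of the proposed ID scheme: base64-encode the UTF-8 file path, then
// append a per-file suite/spec counter. ':' cannot appear in base64 output,
// so the prefix can be unambiguously split back out and decoded.
const counters = new Map(); // filePath -> { suite, spec }

function makeId(filePath, kind) { // kind is 'suite' or 'spec'
  if (!counters.has(filePath)) {
    counters.set(filePath, { suite: 0, spec: 0 });
  }
  const fileCounters = counters.get(filePath);
  const prefix = Buffer.from(filePath, 'utf8').toString('base64');
  return `${prefix}:${kind}(${fileCounters[kind]++})`;
}
```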

sgravrock commented 2 years ago

The part of Jasmine that generates IDs doesn't know anything about files, and can't even assume that suites are defined in files. A lot of Node-centric assumptions break down as soon as you cross into -core.

ajvincent commented 2 years ago

... okay, what if we passed in an id prefix from the loader into the core?

UziTech commented 2 years ago

Top level beforeAll/afterAll

These should throw exceptions in parallel mode. I don't see any reasonable alternative. If we run them in each process, suites that use them to set up external state (e.g. database) will break. If we only run them in the parent process, suites that use them to set up in-memory state will break.

I think we need a way to run something before and after all tests (e.g. to setup/teardown a database). The once per process can be done inside a describe.

sgravrock commented 2 years ago

I think we need a way to run something before and after all tests (e.g. to setup/teardown a database).

That's probably the motivation behind Mocha's global fixtures. I'm not sure how closely I'd want to copy that design, but I think the idea of having a clearly separate construct for exactly-once setup and teardown has merit.

UziTech commented 2 years ago

A lot of Node-centric assumptions break down as soon as you cross into core.

If we need to keep core environment agnostic, will we be able to launch the workers from core? Or will we need to run core in workers? It seems like jest and mocha in parallel mode assume a nodejs environment.

sgravrock commented 2 years ago

If we need to keep core environment agnostic, will we be able to launch the workers from core? Or will we need to run core in workers? It seems like jest and mocha in parallel mode assume a nodejs environment.

Core definitely needs to run in each worker, since everything that spec files expect is implemented in core. I could see the coordinator code in the parent process being done in either of two different ways:

The first option is what the PoC does, and it's the first thing I'd try if I was doing this myself.

ajvincent commented 2 years ago

I've been tied up the last couple weeks, but this has been on my mind.

Top-level beforeAll() and afterAll()

I plan on preserving that as a non-parallel option only. My intent was to load the non-parallel spec files first (which would trigger beforeAll()), then fire off the parallel runners, and only when they're all finished, run the non-parallel specs (which would eventually trigger afterAll()). This assumes, perhaps incorrectly, that I won't have to tweak the run algorithms, but the intent is the same.

Subprocesses staying alive and handling multiple spec files

Keeping a specific number of parallel subprocesses alive as runners, I think, means another important change to Jasmine core: incremental spec loading inside a subprocess. That is:

  1. The parent process (running jasmine-npm) sends in a spec file location to the child process (running jasmine-npm), which may have already run Jasmine tests.
  2. The child process must clear its cache of any specs, suites, and that of any reporters it has.
    • The latter is probably best done via recreating the child reporter, but I don't know.
    • How hard is it for jasmine-npm to completely destroy any vestiges of jasmine-core state? Are there shared services? Are we going to run into a shared state problem which may have killed the last attempt to do parallelization?
  3. The child process generates an unique id prefix for suites and specs it's about to load.
  4. The child process creates an instance of jasmine-core.
    • Is this where worker threads come into play, where each thread can load Jasmine as an independent copy?
  5. The child process loads the spec file into the same thread as jasmine-core, passing in the unique id prefix.
  6. jasmine-core runs the specs, and reports back to jasmine-npm's child process.
  7. The child process forwards the reports to the parent process, and then declares itself ready to accept another job.

This is my first impression on how I'd do this.
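The parent-side half of those steps can be sketched as a simple dispatch loop. Here the createWorker factory and the message shapes are purely illustrative assumptions, not jasmine-npm API:

```javascript
// Illustrative "workers grab files when idle" loop for the parent process.
// createWorker is injected so the sketch stays testable; in real use it
// would be () => require('child_process').fork('./worker.js').
function runInParallel(specFiles, numWorkers, createWorker, onReporterEvent) {
  const queue = specFiles.slice();
  let finished = 0;
  return new Promise(resolve => {
    const workers = [];
    for (let i = 0; i < numWorkers; i++) {
      workers.push(createWorker());
    }
    const dispatch = worker => {
      const file = queue.shift();
      if (file) {
        worker.send({ type: 'runSpecFile', file });
      } else {
        worker.disconnect(); // no work left for this worker
        if (++finished === workers.length) resolve();
      }
    };
    for (const worker of workers) {
      worker.on('message', msg => {
        if (msg.type === 'reporterEvent') {
          onReporterEvent(msg); // relay to the parent's reporters as events arrive
        } else if (msg.type === 'readyForNextFile') {
          dispatch(worker); // idle worker grabs the next file
        }
      });
      dispatch(worker); // initial assignment
    }
  });
}
```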

Are we approaching consensus on a design?

I think so, but I'm not sure if we're resolving these questions. @sgravrock, your thoughts?

sgravrock commented 2 years ago

@ajvincent I just realized that you've been talking about non-parallel spec files in your last few messages, and I didn't pick up on it. What is a non-parallel spec file? What use case do they serve? Do we need them, given that Mocha and Jest have been successful with an all-or-nothing parallelization scheme?

My intent was to load the non-parallel spec files first (which would trigger beforeAll()), then fire off the parallel runners, and only when they're all finished, run the non-parallel specs (which would eventually trigger afterAll()).

I think allowing any form of top level beforeAll/afterAll in parallel mode is a mistake. Either choice (running them once in the parent process or running them in each child) is going to be confusingly wrong for a lot of users. We might be better off with something like Mocha's design where there is a completely separate way to specify exactly-once operations in parallel mode. But this isn't something we need to solve right now: I could work on it once we have working code in place for the rest of parallel mode. @UziTech has persuaded me that we need some sort of exactly-once setup and teardown, but we might actually be better off figuring out the details later.

The latter is probably best done via recreating the child reporter, but I don't know.

I don't think we need to do that. The report dispatcher (somewhat confusingly called reporter in some places) is already stateless except for the list of reporters it dispatches to. My expectation is that in parallel mode, the only configured reporter on the child will be the one that forwards events to the parent. That can probably be stateless too, and if not it should be a simple matter to reset it.

User-provided reporters are normally configured through Jasmine#addReporter, which will have the effect of attaching them only to the parent's Env. So we shouldn't have to worry about them in the child.

How hard is it for jasmine-npm to completely destroy any vestiges of jasmine-core state?

Easy (just create a new Env), but we can't do it because it would break top-level beforeEach and afterEach.

Are there shared services? Are we going to run into a shared state problem which may have killed the last attempt to do parallelization?

There were a number of problems that killed the last attempt. You're right that shared state was one of them. I think/hope that #1934 already takes care of most of that. I wouldn't be surprised if it's also necessary to remove children of the top suite in between batches. We might also need to do something to either preserve the random seed between batches or reset it, depending on which seems like the better behavior.

I've done a bunch of refactoring recently to split Env into smaller classes. I hope that makes the state management somewhat easier to follow than it was before.

Is this where worker threads come into play, where each thread can load Jasmine as an independent copy?

I don't see why we need worker threads when we're already creating processes.

Are we approaching consensus on a design?

I think so, but I'm not sure if we're resolving these questions. @sgravrock, your thoughts?

I hope so. Here's what I'm thinking about the key decisions:

ajvincent commented 2 years ago

Combining the next two quotes together:

What is a non-parallel spec file? What use case do they serve?

...

@UziTech has persuaded me that we need some sort of exactly-once setup and teardown

This is one of two primary use cases for non-parallel spec files. The other is for specs the user - for whatever reason - didn't want to run in parallel mode. I will refer you again to https://github.com/jasmine/jasmine-npm/issues/195#issuecomment-1142560241 , specifically the very first section listing a six-step algorithm.

I think allowing any form of top level beforeAll/afterAll in parallel mode is a mistake.

I have agreed all along. Emphasis in parallel mode. If a particular spec file in spec/support.json is not brought into a parallel configuration as an opt-in, it stays non-parallel, and we allow beforeAll() / afterAll().

This is why another of my comments, https://github.com/jasmine/jasmine-npm/issues/195#issuecomment-1135374456 , referred to TopConfiguration as separate (and extending) from Configuration.

How hard is it for jasmine-npm to completely destroy any vestiges of jasmine-core state?

Easy (just create a new Env), but we can't do it because it would break top-level beforeEach and afterEach.

If that's what happens, and we want to have no more than -j child processes running, that might be exactly what we need. A child process, under the latest discussion, takes exactly one spec file, loads it, runs it, and then clears the environment for the next spec file to load.

You've stated previously:

On the other hand, I'm fine with breaking top-level beforeEach/afterEach in spec files when run in parallel mode.

I hope so. Here's what I'm thinking about the key decisions:

  • Top-level beforeEach/afterEach: Has to work in parallel mode.

I think I've missed something critical here. There's "top-level" in a spec file loaded in parallel mode (meaning files loaded as parallelized spec files), and there's "top-level" in a spec file loaded the current way, as a non-parallel spec file. Both of these can have beforeAll(), beforeEach(), afterEach(), and afterAll() as top-level functions (potentially illegal ones - I'll get to that in a moment).

We probably need to lay this out in a Markdown table to illustrate:

| Top-level function | In non-parallel spec file | In parallel spec file |
| --- | --- | --- |
| `beforeAll()` | Global setup | Throws exception |
| `beforeEach()`, `afterEach()` | Applies to non-parallel spec files. Unclear if they apply to parallel files; if they do, how do we apply them? | Applies only to that spec file |
| `afterAll()` | Global teardown | Throws exception |

As I've said before, I envisioned non-parallel spec files as your global setup / teardown mechanism.

I think top-level beforeEach() and afterEach() in a parallel-loaded spec file should only apply to that spec file. The only reason this wouldn't be a breaking change is that we haven't had parallel specs before.

Please feel free to copy and modify the above table to clarify your thinking.

ajvincent commented 2 years ago

I think, if we really want to share beforeEach() and afterEach() definitions across parallel spec files, or from non-parallel spec files to parallel ones, we would want to encourage users to leverage CommonJS's require() or ECMAScript's import statements to do so. Suggest they break out the shared beforeEach() / afterEach() functions into files they can import.
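That pattern could look something like this sketch, where the helper module name and the hook-registering function are hypothetical. Jasmine's globals are passed as parameters here only so the sketch is self-contained; in a real spec file you'd call them directly:

```javascript
// Illustrative pattern for sharing per-spec setup without top-level
// beforeEach/afterEach in spec files: export a registration helper and call
// it inside each describe() that needs it.
function registerDatabaseHooks({ beforeEach, afterEach }) {
  let connection = null;
  beforeEach(() => {
    connection = { open: true }; // stand-in for real setup, e.g. opening a connection
  });
  afterEach(() => {
    connection = null; // stand-in for real teardown
  });
  return () => connection; // accessor for specs that need the resource
}

// In each spec file (paths and names hypothetical):
//   const registerDatabaseHooks = require('./helpers/sharedSetup');
//   describe('a suite', () => {
//     const getConnection = registerDatabaseHooks({ beforeEach, afterEach });
//     it('uses the shared resource', () => { /* getConnection() ... */ });
//   });
```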

UziTech commented 2 years ago

given that Mocha and Jest have been successful with an all-or-nothing parallelization scheme?

I agree we should do all or nothing. If someone wants some tests to be parallel and some not they can just run jasmine twice one time in parallel mode and one time without. Mixing them would be very difficult and result in some things working differently than they would otherwise.

if we really want to share beforeEach() and afterEach() definitions across parallel spec files

I really don't think top-level beforeEach() and afterEach() in spec files in parallel mode is feasible. We would have to load every file in every process just to check if there are any top-level functions. I think top-level functions should only be allowed in helper files in parallel mode. We can load every helper file in every process.

Overall execution model: Jasmine creates the specified number of child processes and allocates files to them. Helper files are loaded in every child process. Each spec file is only loaded in the child process that will run it.

Is this jasmine-npm? I think we really need to differentiate jasmine-npm from jasmine-core. I think there is some confusion when saying "jasmine".

This means running jasmine in the browser would not allow parallel mode?

ajvincent commented 2 years ago

Latest responses

I agree we should do all or nothing. If someone wants some tests to be parallel and some not they can just run jasmine twice one time in parallel mode and one time without. Mixing them would be very difficult and result in some things working differently than they would otherwise.

(Update: This is a really good point - running jasmine-npm twice - one I hadn't considered fully. I'll buy into that. If someone really wants it to appear like a combined session, they can easily write a bootstrapping reporter like I did in my PoC.)

How would we do global setup and teardown then? That's the part I don't have an answer for, except by some other change which states in the top configuration "these files jasmine-npm must execute before any parallel specs run".

(Note: I'm not wedded to the idea of a "top configuration" versus a "child configuration". That was an early thought experiment which I'm willing to ditch for consensus.)

I'm not opposed, exactly, but we have to support global setup / teardown somehow. Allowing some files to be treated as non-parallel spec files seemed like the simplest option.

Please, do propose a schema change for a parallel-only jasmine-npm configuration which would achieve this. I'm open to suggestions.

if we really want to share beforeEach() and afterEach() definitions across parallel spec files

I really don't think top-level beforeEach() and afterEach() in spec files in parallel mode is feasible. We would have to load every file in every process just to check if there are any top-level functions. I think top-level functions should only be allowed in helper files in parallel mode. We can load every helper file in every process.

Honestly, I'm okay with this for forward compatibility: throwing exceptions in an earlier version for behavior that a later version may choose to support is a safe way to go. We can rule it out for now and let a future version support it later.

This would change my above table:

| Top-level function | In non-parallel spec file | In parallel spec file |
| --- | --- | --- |
| `beforeAll()` | Global setup | Throws exception |
| `beforeEach()`, `afterEach()` | Applies to non-parallel spec files; no impact on parallel spec files | Throws exception |
| `afterAll()` | Global teardown | Throws exception |

This means running jasmine in the browser would not allow parallel mode?

Right now I see jasmine-browser-runner as out of scope for this feature. Again, we can rough this out here and figure out how to add support there later.

Let's start defining schemas, please.

Given @sgravrock 's answer that we are approaching consensus (and I agree - we're fiddling over top-level, which is an important detail but the only one of contention), I think it's about time to formally define the schemas we intend jasmine-npm to use, both for spec/support.json and for the internal configuration object, to support parallelization. Consider my initial TopConfiguration and Configuration definitions to be throw-away code.

Question number 1 for the jasmine-npm schema is "How do we tell Jasmine it should run some (or all) spec files in parallel mode?"

Again, I'm not writing code yet.

ajvincent commented 2 years ago

Side note for discussion: we're going to have at least three different reporters in jasmine-npm/lib/reporters: the existing console reporter, an aggregator reporter and a child process reporter.

I'm wondering if there are other reporter classes "in the wild" we might want to add to jasmine-npm as part of the package? (Since we're targeting 5.0 anyway.) I'd like to see a separate issue for nominations.

sgravrock commented 2 years ago

Given @sgravrock 's answer that we are approaching consensus (and I agree - we're fiddling over top-level, which is an important detail but the only one of contention) [...]

I said no such thing. What I said was "I hope so". I then outlined my current thinking. You responded by disagreeing on at least two key points, including pushing for a breaking change that I'd previously said was out of the question. I don't understand how you get from there to claiming, with italics even, that I'd said we were approaching consensus.

We've been talking past each other quite a bit in this thread. We have different goals, priorities, and ways of approaching software design. You've been slow to accept feedback that doesn't fit with your plans and too quick to propose breaking existing user code. I'm willing to own my share of the communication gap, and I was willing to work through all of that to help you make your contribution.

But I draw the line at obvious false claims about what I said. I don't know if you're trying to manipulate me or if you truly heard "yes" when I said "I hope so". And I don't really care. Either of those is fatal to our collaboration.

I'm not going to work with you any more. I'm sorry. I really hoped this would work out, but it didn't.

I thank you for the earlier part of this discussion. It's done a lot to firm up my understanding of how parallel support should work from the user's perspective. It's also taught me that adding parallel support will take even deeper knowledge of even more of Jasmine than I originally thought. I don't think this is a good feature for most contributors to take on. I also don't think I have the time or energy to guide someone through it at a reasonable pace. My plan is to try to build it myself sometime between now and 5.0.

sgravrock commented 1 year ago

Parallel support is now available in 5.0.0-alpha.0. See https://jasmine.github.io/tutorials/running_specs_in_parallel for details.