jestjs / jest

Delightful JavaScript Testing.
https://jestjs.io
MIT License

Feature: Supporting benchmarking and performance snapshotting #2694

Closed trueadm closed 7 years ago

trueadm commented 7 years ago

This is a proposal for a new feature that I feel would work very well with Jest and, in general, be of use to many in the community.

Currently there's a lack of good performance tooling available – and even less tooling that provides decent feedback when performance regresses as a project grows.

This would require some new API for describing benchmarks and scenarios, but here is a very quick example of what I was envisaging:

benchmark('time it takes to render some UI feature', () => {
    scenario('some thing we do #1', () => {
        ...
    });

    scenario('some thing we do #2', () => {
        ...
    });

    scenario('some thing we do #3', () => {
        ...
    });
});

The benchmark function would define a block of scenarios to run, with each scenario being a specific alternative to benchmark. This could operate in a similar fashion to benchmark.js, where you get the operations per second and how many samples were executed (as used on jsperf.com and other tools).
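
For comparison, the kind of ops/sec reporting described above is roughly what benchmark.js already produces; a minimal suite with that library (not the proposed Jest API) could look something like this:

const Benchmark = require('benchmark');

const suite = new Benchmark.Suite('time it takes to render some UI feature');

suite
  .add('some thing we do #1', () => {
    // first alternative under test
  })
  .add('some thing we do #2', () => {
    // second alternative under test
  })
  .on('cycle', event => {
    // prints e.g. "some thing we do #1 x 512,841 ops/sec ±0.77% (90 runs sampled)"
    console.log(String(event.target));
  })
  .on('complete', function() {
    console.log('Fastest is ' + this.filter('fastest').map('name'));
  })
  .run({ async: true });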

Once the benchmark runs, Jest could store a snapshot of the results (possibly marking it against the CPU usage before the benchmark started, so it can account for background noise) and provide feedback on possible performance regressions. One thing to consider is whether these benchmarks should run in parallel: doing so may skew the results (I'm not sure about this; I'd need to do some testing) depending on how system resources are being used.
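
As a rough illustration only (this shape is hypothetical, not an existing Jest format), the stored snapshot for the example above might look something like:

// Hypothetical perf-snapshot shape; all field names and values are illustrative.
module.exports = {
  'time it takes to render some UI feature': {
    scenarios: {
      'some thing we do #1': { opsPerSec: 512841, marginOfError: 0.77, samples: 89 },
      'some thing we do #2': { opsPerSec: 186701, marginOfError: 2.26, samples: 85 },
    },
    environment: {
      cpuLoadBeforeRun: 0.12, // background load sampled before the benchmark started
      nodeVersion: 'v14.17.0', // recorded when the snapshot was written
    },
  },
};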

Lastly, Jest could output a pretty summary at the end, for example:

Benchmark "time it takes to render some UI feature" results:
Scenario: "some thing we do #1":
512,841 ops/s±0.77%

Scenario:  "some thing we do #2":
186,701 ops/s±2.26%

Scenario:  "some thing we do #3":
448,138 ops/s±1.14%

Scenario "some thing we do #1" was fastest by 135%
Compared to last perf snapshot, this scenario has regressed by 7%

This is a very early proposal, but hopefully it can be refined and built upon into an actual feature. What do people think? What needs changing/altering?

bestander commented 7 years ago

I'd love to track Yarn performance regressions with that feature.

kentaromiura commented 7 years ago

Snapshots are used to compare against a previous result with the expectation that it will not change, so while I like this proposal we cannot really use snapshots here as the result will most probably vary per run.

I think something like https://github.com/facebook/jest/issues/1705 might work here though.

trueadm commented 7 years ago

@kentaromiura I've done some exploration in this area before. The snapshot (is that even the right term here? maybe not) would be more than just "the results last time were X"; it would also store the CPU/memory state at the time of the snapshot, along with dependency versions (including Node). These would be tied to the snapshot to give an accurate picture of regression when comparing results.
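
For what it's worth, most of that metadata is available from plain Node built-ins; a sketch of capturing it (nothing Jest-specific, just an assumption about what would be stored alongside the snapshot):

const os = require('os');

// Sketch: environment details that could be attached to a perf snapshot.
function captureEnvironment() {
  return {
    nodeVersion: process.version,
    cpuModel: os.cpus()[0].model,
    cpuCount: os.cpus().length,
    totalMemoryBytes: os.totalmem(),
    freeMemoryBytes: os.freemem(), // rough proxy for background noise at snapshot time
    loadAverage: os.loadavg(), // 1/5/15-minute load averages (always zeros on Windows)
    platform: `${os.platform()} ${os.release()}`,
  };
}

module.exports = { captureEnvironment };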

It may or may not work, but an investigation/spike into it would still be good.

acateland commented 7 years ago

@trueadm While I'd love to see this feature, I see a lot of pitfalls: performance can vary between browsers (an optimisation in Chrome can be slower in Firefox), and working on a team with different CPUs/OSes could prevent getting consistent results.

Why not see this feature as a temporary refactoring helper/hint? I mean: you run the tests with an option before starting any work, each subsequent test run records how performance evolves, and before submitting your work you check whether you did better or worse.

bestander commented 7 years ago

I think something like this could perform consistently in CI.

trueadm commented 7 years ago

@acateland Like I mentioned above, results should only be compared against runs on a similar spec. If you ran a perf test, you'd expect it to be compared against the same test run on the same hardware with the same system and resources.

lsjroberts commented 7 years ago

The ability to configure the regression percentage that classes as a failure would be useful, though that would be a step away from a strict snapshot approach.

trueadm commented 7 years ago

@lsjroberts I think having some configuration there would still be useful. I'm not sure whether it should be done on a per-benchmark basis (which would mean being able to pass config to the benchmark() function) or as a straight global. I'd imagine some benchmarks might fluctuate depending on side effects from other modules, I/O, network, etc.
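
Purely to illustrate the two options being weighed here (both hypothetical; neither exists in Jest today): a global threshold in the config versus a per-benchmark override.

// jest.config.js (hypothetical option)
module.exports = {
  benchmarkRegressionThreshold: 0.05, // fail if a scenario is more than 5% slower than its snapshot
};

// Or, per benchmark (equally hypothetical): an options object passed to benchmark().
benchmark('time it takes to render some UI feature', { regressionThreshold: 0.15 }, () => {
  scenario('some thing we do #1', () => { /* ... */ });
});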

andywer commented 7 years ago

Hi there. I had a similar idea and a conversation with a friend of mine recently. Benchmarking CPU time seems like a big pitfall (even though I'd ❤️ that), since it may vary widely, not only across machines but even on the same machine (suppose a virus scanner or backup software is running in the background right now).

Benchmarking memory consumption will probably be easier, using memwatch-next to monitor the heap size, for instance. You could also spawn a new process and watch its total memory usage (though that might vary across operating systems).
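
A bare-bones heap-delta measurement is possible with nothing but process.memoryUsage(); here is a sketch (it ignores GC timing, which is exactly why a dedicated tool such as memwatch-next is attractive):

// Sketch: measure heap growth caused by a function, using only Node built-ins.
// Run with `node --expose-gc` so global.gc is available.
function measureHeapDelta(fn) {
  if (global.gc) global.gc();
  const before = process.memoryUsage().heapUsed;
  fn();
  if (global.gc) global.gc();
  const after = process.memoryUsage().heapUsed;
  return after - before; // bytes retained by fn's side effects
}

const retained = measureHeapDelta(() => {
  globalThis.cache = new Array(100000).fill('x'); // something that actually retains memory
});
console.log(`retained roughly ${retained} bytes`);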

JeffRMoore commented 7 years ago

I created a tool for performance testing in the context of a CI suite that might be of interest: async-benchmark-runner. It's pretty rough at this stage, especially from a DX perspective. I let it drift last year, because life. But this thread and another on ApolloData have inspired me to pick it up again. Happy to share learnings or code. Some sample output:

> analyze-benchmark

A Benchmark                                      Memory       Change
- ---------------------------------------- ------------ ------------
  11 benchmarks not different (p > 0.05)
A Benchmark                                       Time       Change
- ---------------------------------------- ----------- ------------
* Simplest object query: execute                430 ns     7% ±  3%
  Simplest list query: parse                    -28 ns    -2% ±  2%
* Simplest list query: execute, n=1             267 ns     5% ±  1%
* Simplest list query: execute, n=10            105 ns     2% ±  1%
* Simplest list query: execute, n=100           108 ns     2% ±  1%
* Simplest list query: execute, n=1000          158 ns     3% ±  1%
  5 benchmarks not different (p > 0.05)
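
The p-value column above comes from treating each benchmark as a set of samples and testing whether two runs differ significantly. A bare-bones version of that idea (Welch's t-statistic; this is not async-benchmark-runner's actual code) could look like:

// Sketch: compare two arrays of timing samples with Welch's t-statistic.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function sampleVariance(xs) {
  const m = mean(xs);
  return xs.reduce((a, x) => a + (x - m) ** 2, 0) / (xs.length - 1);
}

function welchT(a, b) {
  const se = Math.sqrt(sampleVariance(a) / a.length + sampleVariance(b) / b.length);
  return (mean(a) - mean(b)) / se;
}

// |t| above the critical value for the relevant degrees of freedom (roughly 2 at p < 0.05
// for reasonable sample sizes) suggests a real difference rather than noise.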

acateland commented 7 years ago

@trueadm Then it should be stated in bold in this future feature's documentation that only CI tests run on the same machine count as reliable.

Another approach could be to check the perf delta not just for a single test but across the whole test suite, to see whether a regression is consistent across the suite or is just a difference due to hardware. Maybe before each run, another perf test could be executed to determine the computational capacity of the machine the tests run on.

I'm not sure Travis, for example, can be considered a stable performance platform; test execution time varies a lot between runs.
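
A rough sketch of that calibration idea, assuming a fixed synthetic workload is a good enough yardstick for a machine's speed (a big assumption):

// Sketch: normalise benchmark timings against a fixed synthetic workload,
// so results from machines of different speeds become roughly comparable.
function calibrationScore() {
  const start = process.hrtime.bigint();
  let acc = 0;
  for (let i = 0; i < 5e6; i++) acc += Math.sqrt(i); // arbitrary fixed workload
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { elapsedMs, acc };
}

function normalise(measuredMs, baselineMs) {
  // Express the measurement in "calibration units" instead of raw milliseconds.
  return measuredMs / baselineMs;
}

// Compare normalised values across machines/CI runs instead of raw times.
const { elapsedMs: baselineMs } = calibrationScore();
console.log(normalise(123, baselineMs));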

cpojer commented 7 years ago

We aren't planning on doing anything in this space at this time.

schovi commented 6 years ago

@cpojer Any change after a year? We are just migrating to Jest and this would be truly awesome :)

jshearer commented 6 years ago

I'd love to add my $0.02 here: I "solved" this by doing a very dumb performance test, basically equivalent to this:

it('Should create 1000 objects pretty fast', async () => {
  const start = new Date();

  // Do expensive thing 1000 times

  const after_save_all = new Date();

  expect(after_save_all.getTime() - start.getTime()).toBeLessThanOrEqual(3000);
});

which is kind of okay as long as my computer isn't doing anything else when I run the tests. It would be awesome to have benchmarking that takes into account the baseline CPU load!

In my case, I use the VSCode Jest extension, which runs all of my tests when VSCode opens, meaning that they run in parallel with the rest of VSCode's initialization instead of on a quiet CPU as usual. As a result, this test fails every time. Not a huge annoyance, but it would absolutely be a useful feature.

federico-hv commented 5 years ago

@jshearer According to this article, https://www.sitepoint.com/measuring-javascript-functions-performance, the Date object is not reliable for performance measurement and should be avoided; performance.now() is recommended instead.
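
In Jest's default Node environment the same API is available through the perf_hooks module (no browser window needed), so a version of the earlier test could look like this sketch:

const { performance } = require('perf_hooks');

it('should create 1000 objects pretty fast', async () => {
  const start = performance.now();

  // do the expensive thing 1000 times

  expect(performance.now() - start).toBeLessThanOrEqual(3000); // milliseconds
});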

FishOrBear commented 5 years ago

I am not familiar with the Jest codebase, but I think it should not be too difficult:

  1. Disable parallel test execution (see the config sketch below).
  2. Add time (or memory) measurement.

That's it.
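
For the first point, Jest already has options to force serial execution; for example:

// jest.config.js — run tests in a single worker so benchmarks don't compete for CPU.
module.exports = {
  maxWorkers: 1,
};

// Alternatively, pass --runInBand (-i) on the CLI to run tests serially in the main process.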

AlexJeffcott commented 5 years ago

> @jshearer according to this article https://www.sitepoint.com/measuring-javascript-functions-performance the use of the Date object is not reliable for performance measurement and should be avoided. Instead performance.now() is recommended.

window.performance is not available (I believe) from within Jest, as you need a browser. If you are using Jest as your runner for Puppeteer, for example, you can do something like this (but way better, obviously ;) ):

const puppeteer = require('puppeteer')

const setup = async deviceType => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  const client = await page.target().createCDPSession()
  await emulate(client, deviceType) // emulate() is the author's device-emulation helper, not shown here
  await client.send(`Performance.enable`)
  await page.setViewport({
    width: 1920,
    height: 1080,
    deviceScaleFactor: 1,
  })
  return { browser, page }
}

const testPureFunction = async (toMeasure, args, repeatTimes, deviceType = 'FAST_DEVICE') => {
  const { browser, page } = await setup(deviceType)
  const { Timestamp: startStatus } = await page.metrics()
  let totalTaken = 0
  for (let i = 0; i < repeatTimes; i++) {
    const { Timestamp: startTimeSubtask } = await page.metrics()
    toMeasure(...args)
    const { Timestamp: endTimeSubtask } = await page.metrics()
    totalTaken += endTimeSubtask - startTimeSubtask
  }
  const { Timestamp: finalStatus } = await page.metrics()
  await browser.close()
  return {
    totalMilliseconds: finalStatus - startStatus,
    averageMillisecondsPerTask: totalTaken / repeatTimes,
  }
}

// decodeB64 is the pure function under test, imported elsewhere.
describe(`decodeB64 function benchmark`, () => {
  it(`should run decodeB64 function 50 times and return benchmark`, async () => {
    const benchmark = await testPureFunction(decodeB64, [`aGVsbG8gd29ybGQh`], 50)
    expect(benchmark.totalMilliseconds).toBeLessThan(0.1)
    expect(benchmark.averageMillisecondsPerTask).toBeLessThan(0.01)
  })
})

SimenB commented 5 years ago

In Node you should use process.hrtime() (or process.hrtime.bigint() on Node 10.7 and newer).
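
A sketch of the earlier timing test using process.hrtime.bigint() (monotonic, nanosecond resolution) instead of Date:

it('should create 1000 objects pretty fast', async () => {
  const start = process.hrtime.bigint();

  // do the expensive thing 1000 times

  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6; // ns -> ms
  expect(elapsedMs).toBeLessThanOrEqual(3000);
});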

streamich commented 4 years ago

Maybe it is time to re-open this issue, @SimenB?

pckhoi commented 3 years ago

Hello everyone, I have recently created jest-bench to solve this pain point. It should be able to serve most needs.