I'd love to track Yarn performance regressions with that feature
Snapshots are used to compare against a previous result with the expectation that it will not change, so while I like this proposal we cannot really use snapshots here as the result will most probably vary per run.
I think something like https://github.com/facebook/jest/issues/1705 might work here though.
@kentaromiura I've done some exploration into this before. The snapshot (is that even the right term here? maybe not) would be more than just "the results from last time were X": it would also store the CPU/memory state at the time of the last snapshot, along with dependency versions (including Node). These would be tied to the snapshot to give an accurate picture of regression when comparing results (a rough sketch of what such a record might store follows below).
It may or may not work, but an investigation/spike into it would still be good.
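For illustration, here's a rough, hypothetical sketch of what such a snapshot record might store. None of these field names are an existing Jest format; they just make the idea above concrete:

```js
// Hypothetical shape of a stored perf "snapshot" record -- not an existing
// Jest format, just an illustration of the idea described above.
const os = require('os');

const perfSnapshot = {
  benchmark: 'render 10k rows',
  opsPerSecond: 1432.7,
  samples: 87,
  environment: {
    nodeVersion: process.version,       // e.g. 'v18.19.0'
    cpuModel: os.cpus()[0].model,
    totalMemory: os.totalmem(),         // bytes
    cpuLoadBefore: os.loadavg()[0],     // 1-minute load average before the run
  },
  dependencyVersions: {
    // versions of relevant packages at snapshot time, e.g. taken from the lockfile
  },
};
```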
@trueadm While I'd love to see this feature, I see lots of pitfalls: performance can vary across browsers (an optimisation for Chrome can be slower on FF), and working on a team with different CPUs/OSes could prevent getting consistent results.
Why not see this feature as a temporary refactoring hint? I mean, you run the tests with an option before starting any work, each time you test you record the performance evolution, and before submitting your work you check whether you did better or worse.
I think something like this could perform consistently in CI
@acateland Like I mentioned above, tests should only be compared against similar spec runs. If you ran a perf test, you'd expect it to be compared against the same test run on the same hardware with the same system and resources.
The ability to configure the regression percentage that classes as a failure would be useful, though that would be a step away from a strict snapshot approach.
@lsjroberts I think having some configuration there would still be useful. I'm not sure if it should be done on a per-benchmark basis (which would mean being able to pass config to a benchmark() function) or as a straight global. I'd imagine some benchmarks might fluctuate depending on side-effects from other modules, I/O, network etc.
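Purely as an illustration of the two options, a hypothetical per-benchmark config might look like this (neither `benchmark()` nor these option names exist in Jest):

```js
// Hypothetical API -- benchmark(), scenario() and these options do not exist in Jest today.
benchmark('array iteration', { maxRegression: 0.05, minSamples: 50 }, () => {
  scenario('for loop', () => {
    /* ... */
  });
  scenario('Array.prototype.forEach', () => {
    /* ... */
  });
});

// ...or a global default in jest.config.js (also hypothetical):
// module.exports = { benchmarkMaxRegression: 0.05 };
```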
Hi there. I had a similar idea and a conversation with a friend of mine recently. Benchmarking CPU time seems like a big pitfall (even though I'd ❤️ that), since it may vary widely, not only across machines, but even on the same machine (suppose a virus scanner or backup software is running in background right now).
Benchmarking memory consumption will probably be easier, using memwatch-next to monitor the heap size, for instance. Could also spawn a new process and watch its total memory usage (might vary across operating systems, though).
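As a minimal sketch of measuring heap growth without any extra dependency, Node's built-in `process.memoryUsage()` can be used (assuming a synchronous workload and tolerating some GC noise; the budget below is arbitrary):

```js
// Minimal heap-delta sketch using only built-in Node APIs.
// Numbers will still fluctuate run to run because of GC timing.
it('does not blow up the heap when creating 10k objects', () => {
  if (global.gc) global.gc(); // only available when Node is run with --expose-gc
  const before = process.memoryUsage().heapUsed;

  const items = [];
  for (let i = 0; i < 10000; i++) {
    items.push({ id: i, label: `item-${i}` });
  }

  const after = process.memoryUsage().heapUsed;
  expect(after - before).toBeLessThan(50 * 1024 * 1024); // < 50 MB, arbitrary budget
});
```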
I created a tool to do performance testing in the context of a CI suite that might be of interest, async-benchmark-runner. It's pretty rough at this stage, especially from a DX perspective. I let it drift last year, because life. But this thread and another on ApolloData have inspired me to pick it up again. Happy to share learnings or code. Some sample output:
```
> analyze-benchmark

A  Benchmark                                      Memory        Change
-  ----------------------------------------  ------------  ------------
   11 benchmarks not different (p > 0.05)

A  Benchmark                                        Time        Change
-  ----------------------------------------  ------------  ------------
*  Simplest object query: execute                 430 ns       7% ± 3%
   Simplest list query: parse                     -28 ns      -2% ± 2%
*  Simplest list query: execute, n=1              267 ns       5% ± 1%
*  Simplest list query: execute, n=10             105 ns       2% ± 1%
*  Simplest list query: execute, n=100            108 ns       2% ± 1%
*  Simplest list query: execute, n=1000           158 ns       3% ± 1%
   5 benchmarks not different (p > 0.05)
```
@trueadm Then it should be stated in bold in this future feature's documentation that only CI tests which run on the same machine count as reliable.
Another approach could be to check the perf delta not on a single test but on the whole test suite, to see whether a regression is consistent across the whole suite, as a way to tell whether a difference is due to hardware. Maybe before each test run, another perf test could be run to determine the computational capacity of the machine the tests run on (a rough sketch of that idea follows below).
I'm not sure Travis, for example, can be seen as a stable performance platform; test execution time varies a lot between runs.
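A very rough sketch of that calibration idea, using only built-in Node APIs; the workload and the normalisation scheme are assumptions, not anything Jest provides:

```js
// Hypothetical calibration step: time a fixed, deterministic workload so that
// benchmark results can be expressed relative to the machine's speed.
function measureBaselineMs() {
  const start = process.hrtime.bigint();
  let acc = 0;
  for (let i = 0; i < 5000000; i++) acc += Math.sqrt(i);
  const end = process.hrtime.bigint();
  return Number(end - start) / 1e6; // nanoseconds -> milliseconds
}

// A benchmark result could then be stored as `elapsedMs / baselineMs`
// instead of raw milliseconds, making runs on different hardware
// somewhat more comparable.
```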
We aren't planning on doing anything in this space at this time.
@cpojer Any change after a year? We are just migrating to Jest and this would be truly awesome :)
I'd love to add my $0.02 here: I "solved" this by doing a very dumb performance test, basically equivalent to this:
```js
it('Should create 1000 objects pretty fast', async () => {
  var start = new Date()
  // Do expensive thing 1000 times
  var after_save_all = new Date()
  expect(after_save_all.getTime() - start.getTime()).toBeLessThanOrEqual(3000);
})
```
which is kind of okay as long as my computer isn't doing anything else when I run the tests. It would be awesome to have benchmarking that takes into account the baseline CPU load!
In my case, I use the VSCode Jest extension, which runs all of my tests when VSCode opens, meaning they run in parallel with the rest of VSCode's initialization instead of on a quiet CPU like normal. As a result, this test fails every time. Not a huge annoyance, but it would absolutely be a useful feature.
@jshearer According to this article, https://www.sitepoint.com/measuring-javascript-functions-performance, using the `Date` object is not reliable for performance measurement and should be avoided; `performance.now()` is recommended instead.
I am not familiar with the Jest code, but I think it should not be too difficult.
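In a Node-based Jest environment, `performance.now()` is available from the built-in `perf_hooks` module (Node 8.5+), so the earlier `Date`-based test could be sketched roughly like this (threshold arbitrary):

```js
const { performance } = require('perf_hooks');

it('should create 1000 objects pretty fast', () => {
  const start = performance.now();
  // Do expensive thing 1000 times
  const elapsed = performance.now() - start; // milliseconds, sub-millisecond resolution
  expect(elapsed).toBeLessThanOrEqual(3000);
});
```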
window.performance is not available (I believe) from within Jest as you need a browser. If you are using Jest as your runner for Puppeteer, for example, you can do this (but way better obvs ;) ):
```js
// Assumes `puppeteer` is installed; `emulate` and `decodeB64` are defined elsewhere by the user.
const puppeteer = require('puppeteer')

const setup = async deviceType => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  // CDP session lets us enable Chrome's Performance domain
  const client = await page.target().createCDPSession()
  await emulate(client, deviceType)
  await client.send(`Performance.enable`)
  await page.setViewport({
    width: 1920,
    height: 1080,
    deviceScaleFactor: 1,
  })
  return { browser, page }
}

const testPureFunction = async (toMeasure, args, repeatTimes, deviceType = 'FAST_DEVICE') => {
  const { browser, page } = await setup(deviceType)
  const { Timestamp: startStatus } = await page.metrics()
  let totalTaken = 0
  for (let i = 0; i < repeatTimes; i++) {
    const { Timestamp: startTimeSubtask } = await page.metrics()
    toMeasure(...args) // `args` is an array of arguments for the function under test
    const { Timestamp: endTimeSubtask } = await page.metrics()
    totalTaken += (endTimeSubtask - startTimeSubtask)
  }
  const { Timestamp: finalStatus } = await page.metrics()
  await browser.close()
  return {
    totalMilliseconds: (finalStatus - startStatus),
    averageMillisecondsPerTask: totalTaken / repeatTimes,
  }
}

describe(`decodeB64 function benchmark`, () => {
  it(`should run decodeB64 function 50 times and return benchmark`, async () => {
    const benchmark = await testPureFunction(decodeB64, [`aGVsbG8gd29ybGQh`], 50)
    expect(benchmark.totalMilliseconds).toBeLessThan(0.1)
    expect(benchmark.averageMillisecondsPerTask).toBeLessThan(0.01)
  })
})
```
In Node you should use `process.hrtime()` (or `process.hrtime.bigint()` on Node 10.7 and newer).
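For example, a minimal sketch of the same kind of check with `process.hrtime.bigint()` (nanosecond resolution; the budget is arbitrary):

```js
it('creates 1000 objects within the time budget', () => {
  const start = process.hrtime.bigint();
  // Do expensive thing 1000 times
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6; // ns -> ms
  expect(elapsedMs).toBeLessThanOrEqual(3000);
});
```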
Maybe it is time to re-open this issue @SimenB ?
Hello guys, I recently created jest-bench to solve this pain. It should be able to serve most needs.
This is a proposal for a new feature that I feel would work very well with Jest and in general be of use by many in the community.
Currently, there's a lack of good performance tooling available – even more so, tooling that provides decent feedback when performance has regressed as a project grows.
This would require some new API for describing benchmarks and scenarios, but here is a very quick example of what I was envisaging:
Using the `benchmark` function would define a block of scenarios to run, with each `scenario` being a specific alternative to benchmark. This could operate in a similar fashion to benchmark.js, where you get operations/second and how many samples were executed (as used on jsperf.com and other tools).
Once a benchmark runs, Jest could store a snapshot of the results (possibly marking it against the CPU usage before the benchmark started, so it can account for background noise) and provide some feedback on possible performance regressions. One thing to consider is whether these benchmarks should be run in parallel or not; it may skew the results (I'm not sure on this, I'd need to do some testing) depending on how system resources are being used.
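Purely as a hypothetical sketch of that API (only the `benchmark` and `scenario` names come from the proposal; the rest, including the example workloads, is assumed):

```js
// Hypothetical API sketch -- not an existing Jest feature.
benchmark('string concatenation', () => {
  scenario('plus operator', () => {
    let out = '';
    for (let i = 0; i < 100; i++) out += 'x';
  });

  scenario('Array.prototype.join', () => {
    const parts = [];
    for (let i = 0; i < 100; i++) parts.push('x');
    parts.join('');
  });
});
// Jest could run each scenario benchmark.js-style, report ops/sec and sample
// count, and snapshot the results for comparison against later runs.
```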
Lastly, Jest could output a pretty summary at the end.
This is a very early proposal, but hopefully it can be refined and built upon into an actual feature. What do people think? What needs changing/altering?