modernweb-dev / web

Guides, tools and libraries for modern web development.
https://modern-web.dev
MIT License

[web-test-runner] Conversion of v8 test data to istanbul is slow #2023

Open benjamind opened 2 years ago

benjamind commented 2 years ago

We've discovered that there is significant overhead when WTR converts code coverage reports from the v8 format output by the Chrome DevTools Protocol to the istanbul format used in the reporters.

In projects with larger code graphs under test, this conversion takes upwards of 30s on fast machines, and much longer in CI.

From digging around, the only reason I can see for this conversion is compatibility with other coverage reporting tools that use istanbul. Primarily this seems to be the generation of an HTML coverage report and the JUnit output.

Would the team consider moving away from istanbul to speed up the test run? This might entail writing a custom HTML report from the v8 data, but it could be a significant speedup.

daKmoR commented 2 years ago

interesting find... are you running your tests in Chrome only? I'm not sure how that would work with other browsers, or via Selenium?

I think a flag could be an option...

@LarsDenBakker what's your thought on this?

benjamind commented 2 years ago

Yes, this is Chrome only. Well, Chrome for coverage; we run Firefox tests separately. We're running with the Playwright test runner.

I agree it wouldn't work with Selenium or other browsers, of course. I'm not sure of the best path towards this; perhaps coupling a specific config of the Playwright runner that generates v8 data only with a plugin for a new reporter that consumes that format?
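Purely to make that idea concrete, here is a minimal sketch of what such a config could look like. Only `playwrightLauncher` is a real export; the `v8CoverageReporter` and its package shown here are hypothetical and do not exist today:

    // web-test-runner.config.mjs (illustrative sketch)
    import { playwrightLauncher } from '@web/test-runner-playwright';
    import { v8CoverageReporter } from '@web/test-runner-v8-reporter'; // hypothetical package, does not exist

    export default {
      // collect coverage from Chromium only, where CDP exposes the native v8 format
      browsers: [playwrightLauncher({ product: 'chromium' })],
      coverage: true,
      // hypothetical reporter that consumes raw v8 coverage directly,
      // skipping the v8-to-istanbul conversion step entirely
      reporters: [v8CoverageReporter({ reportDir: 'coverage' })],
    };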

43081j commented 1 year ago

I think I figured most of it out now. Got our test run down from ~15 minutes to ~6 minutes with the following:

Parallel computation (minor)

This loop runs the async tasks in serial. We could improve this to be more like a Promise.all, but a plain Promise.all makes WDS sad since it sends so many requests at once. Some kind of parallelism limit could work (see the sketch after this step).

perf improvement: minor
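As a rough sketch of the kind of bounded parallelism that could work here (this assumes nothing about the real WTR internals; `convertEntry` is just a placeholder for the per-file async work):

    // Run async tasks with a concurrency cap instead of strictly in serial,
    // so WDS is never hit with every request at once.
    async function mapWithLimit(items, limit, fn) {
      const results = new Array(items.length);
      let next = 0;
      async function worker() {
        while (next < items.length) {
          const i = next++;
          results[i] = await fn(items[i]);
        }
      }
      const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
      await Promise.all(workers);
      return results;
    }

    // e.g. await mapWithLimit(coverageEntries, 8, (entry) => convertEntry(entry));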

Only fetch matching sourcemaps

Here we unconditionally fetch the sourcemap of a given file.

Later, we decide whether we even want to compute coverage for that file.

We should move that fetch into the condition (it isn't needed any earlier anyway); a rough sketch follows this step.

perf improvement: definitely noticeable
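Roughly the shape of the change, sketched with made-up helper names (`shouldReportCoverage`, `fetchSourceMap`, and `convertWithSourceMap` are placeholders, not the actual WTR functions):

    // Illustrative only: fetch a file's sourcemap lazily, inside the
    // "should we report coverage for this file?" check, rather than up front.
    async function coverageForFile(browserUrl, shouldReportCoverage) {
      if (!shouldReportCoverage(browserUrl)) {
        // skipped file: no sourcemap fetch, no conversion work at all
        return undefined;
      }
      // the fetch now only happens for files we will actually report on
      const sourceMap = await fetchSourceMap(browserUrl);
      return convertWithSourceMap(browserUrl, sourceMap);
    }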

Load the file into coverage once

Here we load the given file on disk into the v8-to-istanbul function.

This results in it actually reading the file, loading it, etc.

Within one page, we don't see much duplication here, since it's unlikely v8 would give us dupes in the coverage array.

However, across multiple calls to v8ToIstanbul, we regularly pass the same files in (pages have imports in common). If we can cache the converter like this:

    // cache converters by file path so each file is only read and
    // loaded from disk once, then reused across conversions
    let converter = ccache.get(filePath);
    if (!converter) {
      converter = v8toistanbul(filePath, false, sources);
      await converter.load();
      ccache.set(filePath, converter);
    }

We only load the file off disk once and reuse it many times.

perf improvement: massive, several minutes cut off our test runs


The last one is the biggest, but off the top of my head I'm not sure where such a cache could live, as it'd need to be reused across many calls to this v8toistanbul function.

Maybe we give it an optional cache param or something, a map, which it can reuse as state. Not sure.
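One possible shape for that, purely as a sketch: the extra cache parameter and the function name here are invented for illustration, not the current WTR signature, though `load` and `applyCoverage` are real v8-to-istanbul calls.

    import v8toistanbul from 'v8-to-istanbul';

    // Sketch only: the caller owns the cache and threads it through each call,
    // so loaded converters outlive a single call and are shared across the run.
    async function applyV8Coverage(coverage, sources, converterCache = new Map()) {
      for (const entry of coverage) {
        const filePath = entry.url; // simplified; real code maps browser URL -> file path
        let converter = converterCache.get(filePath);
        if (!converter) {
          converter = v8toistanbul(filePath, false, sources);
          await converter.load(); // disk read happens at most once per file
          converterCache.set(filePath, converter);
        }
        converter.applyCoverage(entry.functions);
      }
    }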

cc @Westbrook

Westbrook commented 1 year ago

Does this have similarities to the way lru-cache is used in the transform step of Dev Server? https://github.com/modernweb-dev/web/blob/6c5893cc79c91e82f9aec35d3011c6e33ce878cc/packages/dev-server-core/src/middleware/PluginTransformCache.ts There may be some learnings to share there...

43081j commented 1 year ago

@Westbrook weirdly, using an lru-cache with max: 200 took it down to 4 minutes rather than 6!

witchcraft

Seems like a good option then. I suppose we should be careful with how we key it, though.

I'm currently keying by the ~~browser url, which seems fine. but maybe across multiple v8ToIstanbul calls, two browser urls could mean different things?~~

Actually it's by file path, so we're all good!

But if we want to cache the sourcemap fetches too, those would be keyed by browser URL.
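For reference, the plain Map in the earlier sketch swaps out for something like this (assuming the same `lru-cache` package that dev-server-core already depends on, with its default-export API):

    import LRUCache from 'lru-cache';

    // Bounded cache so memory stays flat on large code graphs; get/set usage is
    // identical to the plain Map version, still keyed by absolute file path.
    const converterCache = new LRUCache({ max: 200 });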

Westbrook commented 1 year ago

Nice! If you're looking into that, we may also benefit from taking a moment to update lru-cache to its latest versions, here and at other usage locations. It purports to be even faster there.