webpack / webpack-sources

Source code handling classes for webpack
MIT License
261 stars 71 forks source link

Let `CachedSource` cache `node` and `listMap` methods #109

Closed JoostK closed 3 years ago

JoostK commented 3 years ago

Since development on webpack-sources 2.0 began in 993d98fdece701ca8373356fd56afe373f8ae84a the CachedSource implementation would no longer delegate the original source's node and listMap methods. Instead, a SourceNode is now instead be generated using the sourceAndMap method from which a SourceNode is then created using a SourceMapConsumer. This adds a significant performance penalty, as creating a SourceNode without going through an explicit source map object and then parsing it again can be much cheaper.

This commit reintroduces CachedSource.node and CachedSource.listMap to re-enable the fast path of creating a SourceNode.

alan-agius4 commented 3 years ago

@sokra can you please take a look at this?

sokra commented 3 years ago

Running some benchmarks...

For my test case it gives a 7% performance boost of the compilation (production build with source maps). But it also increases memory usage by 50%... 😮

Doesn't seem to affect performance of hot build with persistent cache.

JoostK commented 3 years ago

For my test case it gives a 7% performance boost of the compilation (production build with source maps). But it also increases memory usage by 50%... 😮

Yeah, there's definitely a trade-off here. The memory consumption is likely related to storing map as encoded from node during serialization, which now means that both representations are cached. The same is true when loading from memory, where we continue to store the map object and source node. Maybe it would be beneficial if only one of the two forms are stored, and that perhaps the serialized data stores whether it should deserialize to a map object or a source node.

sokra commented 3 years ago

Ok another data point...

Restoring SourceNodes from Maps seem to be slightly more expensive than just calling node on the original source again, when the original sources are OriginalSources.

Seems like SourceMap to SourceNode is pretty expensive. I guess we should better serialize the SourceNodes directly in some kind of lighter format...

I'll do another test with SourceMapSources as original source.

JoostK commented 3 years ago

Seems like SourceMap to SourceNode is pretty expensive. I guess we should better serialize the SourceNodes directly in some kind of lighter format...

Yeah, this is where the 3-4x perf regression originates :-)

sokra commented 3 years ago

I'll do another test with SourceMapSources as original source.

Ok that looks better. This is probably a more realistic scenario. e. g. when using babel.

Uncached node: 4.9s (calling to original node()) Disk cached node: 2.5s (converting from Map to Node) Memory cached node: 19ms (Nodes already cached in memory)

but after node() it takes a constant 1.3s to convert the SourceNode to a SourceMap again^^

btw. this is for a 16MB source file with 15MB source map file

sokra commented 3 years ago

Uncached node: 4.9s (calling to original node()) Disk cached node: 2.5s (converting from Map to Node) Memory cached node: 19ms (Nodes already cached in memory)

Compared to master:

Uncached node: 15.5s (calling to original node()) Disk cached node: 10.9s Memory cached node: 2.4s

sokra commented 3 years ago

Sounds good to merge, but I think there is more potential to optimize here by using a custom SourceMap serialization for caching instead of converting to SourceMap.

sokra commented 3 years ago

common-libs development-build+source-map+initial-persistent-cache+babel-env

rel diff name mean med min max std con baseCon curCon
153.58% unclear exec -1.4% -0.3% -1.6% -2.2% -30% -30% 5.6% 4%
100% -12% stats -17% -16% -17% -18% -58% -58% 7.4% 3.7%
44.14% +47% cache.PackFileCacheStrategy.store pack +63% +64% +56% +70% 27.7x 27.7x 1.1% 19%
31.3% -30% Compiler.seal compilation -41% -38% -43% -40% +18% +18% 11% 22%
27.7% -32% Compilation.process assets -44% -41% -47% -43% +22% +22% 12% 26%
41.63% unclear heap memory -13% -35% -8.3% +3.8% +24% +24% 101% 144%

common-libs development-rebuild+source-map+persistent-cache+babel-env

(Might be a bit unrealistic since you shouldn't use source-map for incremental builds)

rel diff name mean med min max std con baseCon curCon
201.32% -21% exec -32% -33% -33% -31% +9.5% +9.5% 12% 20%
100% -53% stats -55% -55% -54% -56% -89% -89% 8% 1.9%
77.41% -59% Compiler.seal compilation -61% -62% -60% -62% -89% -89% 9.4% 2.6%
63.8% -63% Compilation.process assets -66% -66% -65% -66% -89% -89% 11% 3.5%
40.68% unclear cache.PackFileCacheStrategy.store pack +59% +61% +62% +50% +29% +29% 134% 109%
38.24% -9.1% heap memory -18% -19% -14% -20% -92% -92% 18% 1.8%

Takeaway

Source Map processing is faster and better cached. In the initial build with persistent cache, most of the benefit during build is lost in cache serialization again, but not all. Since this is optional and after the build, that's good.

Memory usage doesn't seem to increase. We probably save a lot memory due to avoiding the nested caching from before.