google / log4jscanner

A log4j vulnerability filesystem scanner and Go package for analyzing JAR files.
Apache License 2.0
1.57k stars 120 forks source link

Replace io.ReadAll usage with pre-allocated buffers of appropriate size #27

Closed lyoung-confluent closed 2 years ago

lyoung-confluent commented 2 years ago

There is a potential performance/memory improvement when reading the contents of a nested jar by replacing io.ReadAll:

Internally, io.ReadAll will allocate a byte slice (initially 512 bytes) and then expand the size of the buffer as needed when each (io.Reader).Read call is executed. In practice this means a slice will be allocated and resized multiple times when reading most jar files.

Because the size of the file is already known the buffer could be allocated initially at the exact size required. This is essentially what os/io.ReadFile does internally and the implementation could be copied almost exactly into a helper method.

ericchiang commented 2 years ago

Hey @lyoung-confluent,

We generally won't take performance CLs unless there's an associated benchmark that demonstrates that benefit it meaningful. Premature optimization and all that :)

We had someone looking at the performance internally, and there are definitely places to improve.

If you are looking into improve performance, please look at a tool like https://pkg.go.dev/golang.org/x/perf/cmd/benchstat so there are comparisons.

mknyszek commented 2 years ago

@lyoung-confluent See #40 which should resolve this.

ericchiang commented 2 years ago

Closed via https://github.com/google/log4jscanner/pull/40