open-telemetry / opentelemetry-js

OpenTelemetry JavaScript Client
https://opentelemetry.io
Apache License 2.0
2.74k stars 796 forks source link

Memory leak in the fetch instrumentation when used against an infinite fetch request resulting in memory leak #4888

Open shuhaowu opened 3 months ago

shuhaowu commented 3 months ago

What happened?

Steps to Reproduce

  1. Setup auto instrumentation in an application.
  2. Open an infinite fetch request to a server to stream a large amount of data. Read the data via response.getReader().read() and discard the data (i.e. do not store the data in JS memory).
  3. Eventually Chrome will crash as it uses ~10GB of memory holding the response in memory due to the patching done by OpenTelemetry (see additional details below for analysis).

Expected Result

No memory leak occurs.

Actual Result

Memory leaks, browser/OS eventually kills the tab due to memory exhaustion.

Additional Details

Looking at the implementation of patchConstructor, we see these lines where the response is cloned into 2 additional variables:

https://github.com/open-telemetry/opentelemetry-js/blob/2e42181ff07003b1dcae2eaa392e681d43d27ea3/experimental/packages/opentelemetry-instrumentation-fetch/src/fetch.ts#L351-L352

The body of resClone is read. The body data is not used by the code, but this allows the data to be freed from memory:

https://github.com/open-telemetry/opentelemetry-js/blob/2e42181ff07003b1dcae2eaa392e681d43d27ea3/experimental/packages/opentelemetry-instrumentation-fetch/src/fetch.ts#L357-L364

However, the body of resClone4Hook is never read from. This causes the browser to keep the response data in memory despite it being consumed by both the original stream by the user and the resClone stream. The resClone4Hook is only used to be passed to endSpanOnSuccess:

https://github.com/open-telemetry/opentelemetry-js/blob/2e42181ff07003b1dcae2eaa392e681d43d27ea3/experimental/packages/opentelemetry-instrumentation-fetch/src/fetch.ts#L360

The code within opentelemetry doesn't seem to use response.body in any way. It does seem like it passes the response to potentially user-defined functions.

In any case enabling autoinstrumentation causes memory-leak induced browser tab crashes when used with infinite fetch requests. This is very difficult to debug as the memory leak is not even in the JS heap (since resClone4Hook's body never got read to JS, the memory used doesn't show up in JS heap dumps), and instead is happening inside Chrome's private memory. The regular tab JS heap OOM killer doesn't even work with it. Something else in Chrome kills the tab after ~13GB of RAM usage.

OpenTelemetry Setup Code

No response

package.json

No response

Relevant log output

No response

shuhaowu commented 3 months ago

One possible suggestion is the close the resClone4Hook's body. That probably will resolve this leak.