Open cmackenthun opened 5 months ago
@cmackenthun can you please provide a minimal reproduction so we can work on this?
Sure thing @dyladan
Example repo: https://github.com/cmackenthun/otel--fetch-instrumentation-example
Basically I just copied the example directory from the repo and made some changes to show my usecase. As you can hopefully see from testing (or my gifs below). If I wait for the fetch call to be done and then flush and navigate away, I don't get my fetch span. But if I add an arbitrary wait between the fetch and flush, then I do get my fetch span exported. To me this is because of the setTimeout
that is used to end the span inside the fetch instrumentation flow.
fetch()
➡️ flush()
➡️ redirect()
fetch()
➡️ wait 1s ➡️ flush()
➡️ redirect()
What happened?
Steps to Reproduce
Expected Result
All network calls for that endpoint are traced
Actual Result
A percentage of these network calls are not traced
Additional Details
We have seen that one of our metrics is not being reported 100% of the time and have narrowed it down to the usage of
setTimeout
inopentelemetry-instrumentation-fetch
as the reason why. Sometimes the routing happens quick enough that the timeout function hasn't run and the span is never ended and therefore can never be flushed or collected on our side.There is a comment in the fetch instrumentation code about the reasoning for the timeout being waiting for "observer to collect information about resources", but I think overall we would prefer to get metrics that are lacking these resources than none at all. If the
setTimeout
has to stick around, it would be nice if the fetch instrumentation could know to end these spans when the page is being navigated away from so we can get a higher count of spansSome docs about
setTimeout
not be an overly consistent method to use for exact timing: https://developer.mozilla.org/en-US/docs/Web/API/setTimeout#reasons_for_delays_longer_than_specified