open-telemetry / opentelemetry-js-contrib

OpenTelemetry instrumentation for JavaScript modules
https://opentelemetry.io
Apache License 2.0
703 stars 518 forks source link

User events getting registered twice #548

Open ayush000 opened 3 years ago

ayush000 commented 3 years ago

What version of OpenTelemetry are you using?

@opentelemetry/api: 1.0.0 @opentelemetry/context-zone: 0.22.0

What version of Node are you using?

12.14.0

What did you do?

I used open telemetry with create-react-app

// setupTelemetry.js
import { registerInstrumentations } from "@opentelemetry/instrumentation";
import { DocumentLoadInstrumentation } from "@opentelemetry/instrumentation-document-load";
import { WebTracerProvider } from "@opentelemetry/web";
import {
  ConsoleSpanExporter,
  SimpleSpanProcessor,
} from "@opentelemetry/tracing";
import { UserInteractionInstrumentation } from "@opentelemetry/instrumentation-user-interaction";
import { ZoneContextManager } from "@opentelemetry/context-zone";

// Minimum required setup - supports only synchronous operations
const provider = new WebTracerProvider();
provider.addSpanProcessor(
  new SimpleSpanProcessor(new ConsoleSpanExporter())
);
provider.register({
  contextManager: new ZoneContextManager(),
});

registerInstrumentations({
  instrumentations: [
    new DocumentLoadInstrumentation(),
    new UserInteractionInstrumentation(),
  ],
  tracerProvider: provider,
});

// index.js
import "./setupTelemetry"

What did you expect to see?

Events would get logged only once in console by instrumentation-user-interaction

What did you see instead?

Each click event is registered twice by instrumentation-user-interaction

Additional context

https://user-images.githubusercontent.com/1942688/123246534-85b85380-d503-11eb-9f83-d281a0f74314.mov

CptSchnitz commented 3 years ago

I tried looking into this, and it seems like the first span is for the click event on the root div, and only the second one is on the intended target. I tried to change it by editing this line: https://github.com/open-telemetry/opentelemetry-js-contrib/blob/52b46aa29b40feb70db623a9b1354a259ebb19e2/plugins/web/opentelemetry-instrumentation-user-interaction/src/instrumentation.ts#L452 to:

if (target && target === task.target) { 

but this ignores all click event except the one on the target (ignores all propagation). I just don't know what the intended behavior is.

obecny commented 3 years ago

can you try without zone ?

obecny commented 3 years ago

Are you able to reproduce it with version 0.21.x for api, core and contrib ?

CptSchnitz commented 3 years ago

can you try without zone ?

Without zone there are two traces, both on the button. image

Are you able to reproduce it with version 0.21.x for api, core and contrib ?

On 0.21.x I get the same behavior as on 0.23.x, but the target is the actual element the event is registered on: image those are the event handlers for the button: image

All my tests were run on this app:

import "./App.css";

function App() {
  return (
        <button onClick={(e) => console.log("bar")}>test</button>
  );
}

export default App;
CptSchnitz commented 3 years ago

I think it happens because of how react handles events. First the DOM event bubbles up to the root DOM container (before react 17 to the window), and then the react events happen. So one trace is the actual click, and the second one is the handler react adds on the root. This is a good explanation: https://reactjs.org/blog/2020/08/10/react-v17-rc.html#changes-to-event-delegation

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

bradfrosty commented 3 years ago

I am also seeing this behavior, with and without zone.js. Would it be possible to add a filter feature, or some way to collate event bubbling?

bradfrosty commented 3 years ago

I looked into this more today and discovered some more information. Within patchZoneRunTask, it seems that the User Interaction Instrumentation is creating a new zone each time _patchZoneRunTask() is invoked. In my case, this ends up creating several spans that aren't very useful with durations on the order of microseconds.

I believe it would be more useful if it tracked a recently ran zone (e.g. within the same event loop iteration), and for the same event target, allowing fewer spans to be created.

Note: this is specific to users of zone.js + ZoneContextManager. I did not do this digging for the StackContextManager.

dyladan commented 3 years ago

@obecny maybe you can take a peek at this?

dyladan commented 3 years ago

it seems that the User Interaction Instrumentation is creating a new zone each time _patchZoneRunTask() is invoked

What do you mean the instrumentation is creating a new zone? Do you mean it is calling context.with or that it is actually interacting directly with zones? Or that it simply does something which causes a zone to be automatically created?

I believe it would be more useful if it tracked a recently ran zone (e.g. within the same event loop iteration), and for the same event target, allowing fewer spans to be created.

Can you give a concrete example of what that would look like?

bradfrosty commented 3 years ago

What do you mean the instrumentation is creating a new zone? Do you mean it is calling context.with or that it is actually interacting directly with zones? Or that it simply does something which causes a zone to be automatically created?

I mean that it is calling context.with() for each individual listener, regardless of if they share the same event target. There is no approximation to share the zone (and subsequently the span/span context) for a single user interaction. Basically, the user interaction is not treated as one interaction. It is effectively treated as many user interactions: one unique interaction for each listener on a given event target. By "creating a new zone" I basically mean creating a new span context as they are similar in the context of otel.

Can you give a concrete example of what that would look like?

Yup — I'll setup a simple repro, but I'll describe in more detail here first.

Let's assume an application sets up four global click handlers on the document (they could be in source code or vendor/framework code). When a user clicks anywhere on the page, these are fired. Right now, this is creating 4 unique spans. If there's a client navigation involved, then it's 5 unique spans. One for each listener, and one for the navigation hook. Some of these listeners don't really do much, nor take very long, so they look meaningless in comparison to the listener that did most of the work (e.g. one that fetched data), but these small click spans will still show up when querying in a UI since they share the same name/xpath. Nothing really differentiates them in the span context besides their IDs.

Instead, I think it would be more valuable to create one span/span context for whole interaction. It would look like a single top level click span, or a single top level navigation span if there's a route change. The span would start when the first listener runs. Subsequent listener tasks would then check if this span has been created (for example, maybe by the target name, or target name + event name if more events besides click are supported in the future). If the span exists, re-use it and added to the current running task count on that span.

After each listener completes, a heuristic could be used to model the current macro cycle: run a timeout that finishes after 50ms. I believe this is what OpenCensus does. After 50ms, the timeout will run which effectively ends the current span. But if more spans came in during the same macro cycle, they will re-use the current span. Each listener that runs will reset that timeout.

bradfrosty commented 3 years ago

I was actually able to get the above solution working with local modifications to the user interaction instrumentation, and am now seeing much more high quality span data. However, I have not tested it using the default context manager yet. Happy to work with you or @obecny to contribute it.

I'm not sure if this would be considered a bug fix, in the case that consumers find the current behavior desirable. Potentially this could be an option to the plugin (e.g. group or groupByInteraction)?

bradfrosty commented 3 years ago

@dyladan do you have a recommended platform for sharing a repro? I'm trying to use stackblitz, but I'm getting the error "Cannot find module perf_hooks". Looks like Stackblitz isn't recognizing the browser key in the package.json.

I also tried using the versions published on jsdelivr, but doesn't look like they are converted to UMD.

bradfrosty commented 3 years ago

@dyladan I was able to get a reproducer using Stackblitz.

You will see that I am unable to group these click spans. In the first part, I create four document listeners, which as you will see in the console creates four individual spans, even though it's the same interaction. This happens with React for example, even though it's out of our control.

In the second part, I attempted to wrap these click handlers spans with a parent. I was unsuccessful in figuring a way to make this work... interestingly, the zone relationships are correct. But the click handlers are always root spans.

dyladan commented 3 years ago

Ah I see. So the issue is that a single click event fires multiple listeners and each of these listeners starts a span with no link between them or common parent?

It seems to me that creating 4 spans is correct behavior, since they are 4 different units of execution. But I do think they should have a common ancestor. It seems like we may need a parent span for the interaction itself, and child spans for each event listener callback execution

obecny commented 3 years ago

The whole idea of user interaction is that the user interaction is what is the root span. So the click event is the main initiator of the whole flow. The outcome of this is that click event cannot have a parent.

bradfrosty commented 3 years ago

Ah I see. So the issue is that a single click event fires multiple listeners and each of these listeners starts a span with no link between them or common parent?

Yes exactly. I agree that the individual spans for each listener makes sense, but without a relationship there ends up being a lot of noise when querying for data. Each span will have the same name, xpath, tag name, and similar timestamps, making them nearly impossible to differentiate. The only other differentiating factor is the span's duration, which can vary depending on factors such as network conditions or device type.

Here's another example showing what can happen when a page includes google tag manager (not uncommon for our product, we're a CMS company). I've setup a testing GTM project that is tracking user clicks. However, my application only has one click listener setup. This ends up creating three spans for a single click: one for my application, and two from GTM.

The same thing is happening with the OP for React, since React sets up its own global listener. As a result, there are "duplicate spans".

obecny commented 3 years ago

One click should produce one flow only, if one click produces more than this is what should be fixed, instead of trying to add extra parent for all those newly created spans.

dyladan commented 3 years ago

I think brad is right. The parent should be the click itself. Each handler is a span which is the current behavior and I think its correct. You're actually both saying the same thing. You said "the click should be the main initiator" and that's what brad is saying, but there is no span for the click itself.

obecny commented 3 years ago

in your example the number of generated spans for click event should be 1 only., 2 means there is a bug

bradfrosty commented 3 years ago

Yea I think we're on the same page 😄
One click creates one span. Instead of the current behavior where one click creates X number of spans, where X is number of listeners registered that run for the given event target.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] commented 2 years ago

This issue was closed because it has been stale for 14 days with no activity.

bradfrosty commented 2 years ago

This is still a bug. I missed the comments on my PR. Drafting up a response now.

akbarnafisa commented 2 years ago

any updates on this issue ?

Josh-a-e commented 2 years ago

Yeah I'd also be very interested in being able to fix this.

TusharShahi commented 1 year ago

I am also facing this

jalongod commented 1 year ago

I am also facing this

jalongod commented 1 year ago

I am also facing this

ido-g-coralogix commented 1 year ago

I am also facing this

devgalintown-mtv commented 1 year ago

Is there any update on this @dyladan @obecny? I am very new to Opentelemetry. Currently started working on instrumenting a vue.js app. I am using @opentelemetry/auto-instrumentations-web for that and facing similar issue, but for me it has one trace id but 5 -6 child spans. is it normal because of event bubbling?

devgalintown-mtv commented 1 year ago

I am also facing this any solution to this ?

elafito commented 11 months ago

Is there any update on this @dyladan @obecny? I am very new to Opentelemetry. Currently started working on instrumenting a vue.js app. I am using @opentelemetry/auto-instrumentations-web for that and facing similar issue, but for me it has one trace id but 5 -6 child spans. is it normal because of event bubbling?

I am having the exact same problem on React.

lucasdengPru commented 11 months ago

I am having the exact same problem on React. as attachment shows.

Screenshot-trace-button-click

image

prajon84 commented 3 months ago

ANy solution to this? I am having this problem in Aug 2024 too.