uniiverse / apollo-tracing-ruby

[Maintainers Wanted] Ruby implementation of GraphQL trace data in the Apollo Tracing format
MIT License

Missing instrumentation for graphql-ruby lazy resolve #5

Open joshjcarrier opened 6 years ago

joshjcarrier commented 6 years ago

Wondering how this might be supported, especially when paired with shopify/graphql-batch, since the costs of many fields are aggregated together in the DataLoader.
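For context, the aggregation problem can be reproduced without any gems. `TinyBatchLoader` below is a made-up stand-in for a graphql-batch style loader (not the gem's actual API): every field gets a cheap lazy handle immediately, and the expensive work runs once, when the first handle is forced.

```ruby
# Hypothetical minimal batch loader, for illustration only.
class TinyBatchLoader
  def initialize(&perform)
    @perform = perform # block that loads all queued ids at once
    @queue = []
    @results = nil
  end

  # Returns a lazy thunk for one id; enqueueing is nearly free.
  def load(id)
    @queue << id
    -> { sync[id] }
  end

  private

  # The first call pays the whole batch cost; later calls hit the cache.
  def sync
    @results ||= @perform.call(@queue)
  end
end

calls = 0
loader = TinyBatchLoader.new do |ids|
  calls += 1 # one "expensive" batched query for every queued id
  ids.to_h { |id| [id, "user-#{id}"] }
end

lazies = [1, 2, 3].map { |id| loader.load(id) } # three cheap enqueues
values = lazies.map(&:call)                     # the batch runs once, on the first call
```

From the tracer's point of view, the first field forced appears to cost the entire batch, which is exactly the distortion discussed below.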

exAspArk commented 6 years ago

@joshjcarrier hey, thanks for opening the issue! Support for instrumenting lazy_resolve hasn't been implemented in the gem yet.

> how this might be supported

That's a good question. I'm not sure, but probably something like:

# we have to load 3 user records lazily
# the execution timeline:
| user1.lazy – 1ms
  | user2.lazy – 1ms
    | user3.lazy – 1ms
      | user1.resolve – 80ms # batching is here
        | user2.resolve – 1ms
          | user3.resolve – 1ms

In total:

* user1 – 81ms – started with 0ms offset
* user2 – 2ms  – started with 1ms offset
* user3 – 2ms  – started with 2ms offset
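The totals above can be double-checked with a quick sketch; the phase durations are hardcoded from the example timeline:

```ruby
# Per-field phase durations (ms), taken from the timeline above.
phases = {
  "user1" => { lazy: 1, resolve: 80 }, # resolve pays the whole batch
  "user2" => { lazy: 1, resolve: 1 },
  "user3" => { lazy: 1, resolve: 1 },
}

# Total = time actually spent inside this field (idle time excluded).
totals = phases.transform_values { |p| p[:lazy] + p[:resolve] }

# Each lazy is created right after the previous one, so a field's start
# offset is the sum of the earlier fields' lazy phases.
offsets = {}
acc = 0
phases.each { |name, p| offsets[name] = acc; acc += p[:lazy] }
```

This reproduces the 81ms/2ms/2ms totals and the 0/1/2ms offsets listed above.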

In Apollo Engine (might be a little bit tricky to implement):

user1 --------------------------------------------------------------------- 81ms
                                                                    user2 -- 2ms
                                                                     user3 -- 2ms

But ideally it should look like this (where "x" is an idle state):

user1 -xx------------------------------------------------------------------ 81ms
 user2 -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx- 2ms
  user3 -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx- 2ms

Will ask people from Apollo whether they're planning to support such representation in the future.

Please let me know what you think.

joshjcarrier commented 6 years ago

I like the idea of all lazies spanning a batched request. I think the granularity could be improved to be useful for unequal costs of lazy resolution, but that's deeper than just the execution time of an unfulfilled promise.

Some ways to interpret that sample trace:

Case 1: each user costs the same time

Example scenario: an optimized DB query

Expected timeline

Each of user1, user2, and user3 contributes equally to the cost of the query.

user1 --------------------------------------------------------------------- 81ms (81ms batch)
user2 --------------------------------------------------------------------- 81ms (81ms batch)
user3 --------------------------------------------------------------------- 81ms (81ms batch)

Actual timeline

Pretty close, if you ignore the 2ms underestimations and know how the lazy resolver was implemented.

user1 -xx------------------------------------------------------------------ 81ms
user2 -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx- 2ms
user3 -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx- 2ms

Case 2: each user costs different time

Example scenarios: HTTP parallelization, chunked batches

Expected timeline

user2 adds significant cost to the query.

user1 --------------------xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx- 27ms (81ms batch)
user2 -------------------------------------------------------------------- 81ms (81ms batch)
user3 -xxxxxx-----------------------xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx- 27ms (81ms batch)

Actual timeline

The significant cost could have come from querying any of the users.

user1 -xx------------------------------------------------------------------ 81ms
user2 -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx- 2ms
user3 -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx- 2ms

The challenge will be how to bubble this information up, if Apollo Engine is meant to tell us the true cost of each piece of data.
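One naive way to bubble the cost up, purely as an illustration and not something the gem does, is to split the measured batch span evenly across the fields that shared it, on top of each field's own lazy time:

```ruby
# Illustrative numbers only: one measured batch span, shared by three fields.
batch_ms = 80
own_ms   = { "user1" => 1, "user2" => 1, "user3" => 1 } # each field's own lazy time

share = batch_ms / own_ms.length.to_f
attributed = own_ms.to_h { |field, own| [field, own + share] }
```

This preserves the total (3ms of lazy time plus the 80ms batch) but, as Case 2 shows, even attribution can badly misrepresent which field was actually expensive.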

exAspArk commented 6 years ago

@joshjcarrier thanks a lot for your comment!

I contacted the Apollo team. They currently don't support the "idle" state (so people won't see the actual picture in the interface), but they may implement it in the future, since it raises a lot of questions and it's not clear what is happening during that timeframe.

I think that for now, we'll implement the simplest trace:

user1 --------------------------------------------------------------------- 81ms
 user2 --------------------------------------------------------------------- 81ms
  user3 --------------------------------------------------------------------- 81ms

While in reality it should look like:

user1 -xx------------------------------------------------------------------ 81ms
 user2 -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx- 2ms
  user3 -xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx- 2ms
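That simplest trace amounts to recording one timestamp when each lazy is created and a single timestamp when the batch finishes, then charging every field the full span. A sketch, with illustrative hardcoded timestamps:

```ruby
# Hypothetical recorded timestamps (ms): when each lazy was created,
# and the single point at which the whole batch finished resolving.
lazy_started_at = { "user1" => 0, "user2" => 1, "user3" => 2 }
batch_done_at   = 83

# Simplest trace: charge each field the whole span from its lazy start
# to the end of the batch -- no idle state, as in the first diagram.
durations = lazy_started_at.transform_values { |t| batch_done_at - t }
```

Every field reports roughly the whole batch span (differing only by its start offset), which matches the three ~81ms bars above.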
joshjcarrier commented 6 years ago

Thanks a lot for following up!

Wondering if it's valuable to override measurements with something from my resolver, since it's all accessible in ctx['apollo-tracing']?

resolve ->(obj, args, ctx) {
  MyLoader.for(ctx).load(..).then do |result|
    # need the context for the current resolver
    # opens up future for tagging traces and other metadata for Apollo Engine UI?
    ctx['apollo-tracing']['resolvers'][??]['duration'] = result.my_custom_duration
  end
}

Then it could be customized to look like:

user1 -------------------- 27ms 
user2 -------------------------------------------------------------------- 81ms
user3 -------------------- 27ms
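A sketch of that override, assuming a hypothetical shape for the per-resolver tracing data (the real key layout inside ctx['apollo-tracing'] may differ):

```ruby
# Hypothetical shape of the per-resolver tracing data; the key names
# are assumptions for illustration, not the gem's actual structure.
resolvers = [
  { "path" => ["user1"], "duration" => 81 },
  { "path" => ["user2"], "duration" => 81 },
  { "path" => ["user3"], "duration" => 81 },
]

# Durations measured inside the resolvers themselves (e.g. from the
# batched query's own timing), keyed by field path.
custom = { ["user1"] => 27, ["user3"] => 27 }

resolvers.each do |r|
  r["duration"] = custom.fetch(r["path"], r["duration"])
end
```

Fields with a custom measurement get it; the rest keep the batch-wide value, yielding the 27/81/27 picture above.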
exAspArk commented 6 years ago

Yeah, potentially you could override the values, but I'd consider the data accessible in the context private :) I'd pause work on this issue for now, until Apollo Engine supports showing the idle time per field in the case of batching.