googleapis / java-spanner

Apache License 2.0
54 stars 111 forks source link

Spanner - Propagating timeouts to Spanner SDK layer to better manage Deadlines per Spanner query execution #2411

Closed psinghbay1 closed 1 year ago

psinghbay1 commented 1 year ago

Environment details

  1. Google Cloud Spanner
  2. OS type and version: Mac, Unix, Linux flavors
  3. Java version: Java 8/11/17
  4. version(s): test google-cloud-spaner's with 6.36.1 and lower

Steps to reproduce

  1. Regular SpannerTemplate has issues. Used Flaky version try to feed the custom timeout. It doesn't work
  2. Used the Working version. Which works but is not a clean solution that can be used as abstraction in large code bases

Code example

The Spring Data for Spanner comes up with a handy util called ‘Context’ which could be used to supply configurations for the given call context.

Flaky Version

try {
    ...
    // Create a context with a ExecuteQuery timeout of 10 seconds.
    Context context =
      Context.current()
          .withValue(
              SpannerOptions.CALL_CONTEXT_CONFIGURATOR_KEY,
              SpannerCallContextTimeoutConfigurator.create()
                  .withExecuteQueryTimeout(Duration.ofSeconds(10)));

    ResultSet rs = context.call(() -> {
        return spannerTemplate.executeQuery(statement);
    });

    // Process record
    if (rs.next()) {
        Struct struct = rs.getCurrentRowAsStruct();
        ...
        return data;
    }
} catch (SpannerException e) {
  if (e.getErrorCode() == ErrorCode.DEADLINE_EXCEEDED) {
    throw new RuntimeException("DEADLINE_EXCEEDED");
  }
}
...

The above maintains the new Timeout context only at spannerTemplate.executeQuery() and not after this. Given the Spanner client internally uses Streaming (ExecuteStreamingSql), it doesn’t call Spanner backend until rs.next() is invoked. So wrapping rs.next() in the context.call() ensures that the right context is maintained and passed along.

Working Version

try {
    ...
    // Create a context with a ExecuteQuery timeout of 10 seconds.
    Context context =
      Context.current()
          .withValue(
              SpannerOptions.CALL_CONTEXT_CONFIGURATOR_KEY,
              SpannerCallContextTimeoutConfigurator.create()
                  .withExecuteQueryTimeout(Duration.ofSeconds(10)));

    String data = context.call(() -> {
        ResultSet rs = spannerTemplate.executeQuery(statement);
        // Process record
        if (rs.next()) {
            Struct struct = rs.getCurrentRowAsStruct();
            ...
            return data;
        }
    });
} catch (SpannerException e) {
  if (e.getErrorCode() == ErrorCode.DEADLINE_EXCEEDED) {
    throw new RuntimeException("DEADLINE_EXCEEDED");
  }
}
...

Above code passes the timeout to context’s (GrpcCallContext) streaming fields image

Instead of Call option’s deadline:

image

This happens due to SpannerCallContextTimeoutConfigurator doesn’t set callOption.

If we ensure that SpannerCallContextTimeoutConfigurator populates callOption, then GrpcCallContext.merge() will copy It to final context and pass along the deadline.

The context.call() changes the Streaming Wait & Idle timeouts for the given call.

Stack trace

Its a regular timeout case and the application has normal timeout error message.

External references such as API reference guides

Any additional information below

Spring Data for Spanner should ensure that SpannerCallContextTimeoutConfigurator sets callOption.

I'm exploring short term and long term solution to propagating the timeouts to the lower layer.

Thanks! Praveendra Singh

psinghbay1 commented 1 year ago

@rajatbhatta Hi, I'm curious to know when this can be part of the development work. I'm happy to test out the fixes if needed. thanks

rajatbhatta commented 1 year ago

Hi @psinghbay1: This looks like more of an enhancement in the Spring data library. Can you please log an issue there? Thanks!

Closing this. Feel free to re-open if there's anything we can help with.