Hundreds of tracing spans are left un-ended by every query timeout
As a prism goalie
I want a stable Heroic
So that I can focus on features, not get woken up at night, and not have angry users
These un-ended spans represent a real runtime risk to heroic. If ~700-1000 of them are left hanging around after each timed-out query, it is conceivable that the JVM will:
potentially run out of memory altogether
experience much longer GC pauses / sweep times (because all the hanging spans need reaping)
hugely inflate the size of heroic's logs, costing us $$$ and obscuring "genuine" problems
Proposed Solution
Find the correct location to catch the Bigtable timeout exception (not trivial).
Catch it, end the span, and rethrow it (a sketch of this pattern follows below).
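A minimal sketch of that pattern, not the actual heroic code: the span name bigtable.fetchBatch is taken from the logs below, but the exception type, method names, and surrounding structure are assumptions for illustration. Using a finally block guarantees the span is ended whether the fetch succeeds, times out, or fails in some other way.

```java
import io.opencensus.trace.Span;
import io.opencensus.trace.Status;
import io.opencensus.trace.Tracer;
import io.opencensus.trace.Tracing;

import java.util.concurrent.TimeoutException;

public class FetchBatchSketch {
    private static final Tracer tracer = Tracing.getTracer();

    Object fetchBatch() throws TimeoutException {
        final Span span = tracer.spanBuilder("bigtable.fetchBatch").startSpan();
        try {
            return doBigtableFetch();                 // placeholder for the real Bigtable call
        } catch (TimeoutException e) {
            // Record the timeout on the span so it is reported, then rethrow for callers.
            span.setStatus(Status.DEADLINE_EXCEEDED.withDescription(e.getMessage()));
            throw e;
        } finally {
            span.end();                               // always end the span, success or failure
        }
    }

    private Object doBigtableFetch() throws TimeoutException {
        // Hypothetical stand-in that simulates the Bigtable deadline being exceeded.
        throw new TimeoutException("Bigtable deadline exceeded");
    }
}
```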
Repro Steps
Run heroic locally with the GUC config, on branch feature/add-bigtable-timeout-settings-refactored.
Capture a lengthy query from Grafana using the Chrome dev tools network tab.
Alter the query to hit localhost (a replay sketch follows these steps) and watch the logs; you'll see messages like the ones listed below.
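As a rough illustration of the last step, the captured query body can be replayed against the local instance with something like the sketch below. The port and the /query/metrics path are assumptions; adjust them to match the local config.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReplayQuery {
    public static void main(String[] args) throws Exception {
        // Query body copied from the Chrome dev tools network tab, saved to query.json.
        final String body = Files.readString(Path.of("query.json"));

        // Endpoint and port are assumptions; point them at the locally running heroic.
        final HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/query/metrics"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        final HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```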
List of affected methods (from the logs)
ERROR io.opencensus.trace.Tracer - Span localMetricsManager.fetchSeries is GC'ed without being ended.
ERROR io.opencensus.trace.Tracer - Span bigtable.fetchBatch is GC'ed without being ended.