Bootstrap OpenTelemetry .NET Auto-Instrumentation SIG and repo

pjanotti commented 4 years ago

There is already an OTEP describing the recommendations for OpenTelemetry Without Manual Instrumentation. This proposal follows it and just adds a few extra points below.

Motivation

The OTEP above has a good section on the motivation for an auto-instrumentation capability. In addition, having to manually instrument a large codebase before seeing any observability value is sometimes an impediment to starting the instrumentation process: engineers want to see the value before committing to the work. Auto-instrumentation lets applications produce "good enough" observability data to show its benefits, getting more projects and teams committed to properly instrumenting their code.

Bootstrapping

I did some initial scouting on https://github.com/datadog/dd-trace-dotnet and it has the desirable features described in the OTEP above. It uses the CLR profiling API, works on Linux and Windows, supports various popular libraries, and supports legacy versions of .NET Core and .NET Framework.

The interception code is written in C# and uses reflection, so there is no direct dependency on the packages being intercepted for instrumentation.
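To illustrate the "no direct dependency" point, here is a minimal sketch of late-bound interception. It is written in Python purely for illustration (the actual interception code is C#), and every name in it (`invoke_with_span`, `SPANS`, `FakeHttpClient`) is hypothetical rather than taken from dd-trace-dotnet. The wrapper never imports the instrumented type; it looks the method up by name at run time, which is the role reflection plays in the C# code.

```python
import time
from typing import Any, Dict, List

# Hypothetical in-memory span sink; a real integration would hand spans to a tracer.
SPANS: List[Dict[str, Any]] = []


def invoke_with_span(target: Any, method_name: str, *args: Any, **kwargs: Any) -> Any:
    """Call target.<method_name>(...) resolved by name at run time and record a span."""
    # Late-bound lookup: the wrapper has no compile-time reference to the target's type.
    method = getattr(target, method_name)
    span: Dict[str, Any] = {
        "name": f"{type(target).__name__}.{method_name}",
        "error": None,
    }
    start = time.perf_counter()
    try:
        return method(*args, **kwargs)
    except Exception as exc:
        # Record the failure on the span, then let the exception propagate unchanged.
        span["error"] = type(exc).__name__
        raise
    finally:
        span["duration_s"] = time.perf_counter() - start
        SPANS.append(span)


class FakeHttpClient:
    """Stand-in for a third-party library type that the wrapper knows nothing about."""

    def get(self, url: str) -> str:
        return f"200 OK from {url}"
```

Calling `invoke_with_span(FakeHttpClient(), "get", "http://example.test")` returns the original result and appends one span named `FakeHttpClient.get`. The trade-off of this style is that a renamed or removed method is only detected at run time.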

Proposal

[ ] Start an Auto-Instrumentation dotnet SIG, initially piggy-backing on opentelemetry-dotnet (later we can separate meetings, etc. if we feel the need)
[ ] Create the repo open-telemetry/auto-instr-dotnet per the recommendations in the OTEP above (fork from the existing repo and start to implement the instrumentation using opentelemetry-dotnet)

/cc @reyang @SergeyKanzhelev

MikeGoldsmith commented 4 years ago

Great, thanks for getting a start on this @pjanotti :+1:

MikeGoldsmith commented 4 years ago

Created a community issue to create the new repo.

SergeyKanzhelev commented 4 years ago

I'd like to learn more about the DD agent architecture and features. I want to make sure that we are aligned with the instrumentation engine, so that whatever profiler component we use will be future-proof and able to work side by side with other profilers.

There is also interest from Elastic: https://github.com/microsoft/CLRInstrumentationEngine/issues/233

@pjanotti what would you suggest as the best read on DD .NET agent features and architecture?

pjanotti commented 4 years ago

@SergeyKanzhelev I didn't find material describing the architecture itself; the DD docs are user-focused. How about I give a presentation at the next SIG meeting and describe their architecture?

SergeyKanzhelev commented 4 years ago

People interested in this topic, can we have a meeting on it outside of the 9AM sync-up tomorrow? I have a very hard time making morning meetings during these corona times. I can send a Doodle to pick a time.

pjanotti commented 4 years ago

@SergeyKanzhelev that is fine for me - I will present it to the people in the meeting today, and present it again when you schedule it. I'm on PDT but can wake up earlier or go to sleep later as needed :)

SergeyKanzhelev commented 4 years ago

@pjanotti would it be possible to make the slides you presented available?

MikeGoldsmith commented 4 years ago

I'd be interested in the slides too - @SergeyKanzhelev do you have the recording from the SIG call?

cijothomas commented 4 years ago

https://www.youtube.com/watch?v=LOF_Aqs6vfU - The recording link Sergey shared in Gitter. The presentation starts around the 15 min mark.

pjanotti commented 4 years ago

Sorry for the delay (need to update my GH notifications): link to the slides https://docs.google.com/presentation/d/18IqqBQdVwTtOCV3BTqkkf-FX1JjDhBeQehO1760Bf8Y/edit?usp=sharing

pjanotti commented 4 years ago

Follow-ups from questions at the presentation:

1. Stack traces for instrumented applications: does instrumentation break symbols or the stack itself? I did some manual tests for this and didn't encounter any broken stacks. The testing was relatively limited, but so far there are no issues.

2. Confirm DD behavior regarding ReJIT (i.e., instrumentation after the start of the target app). The DD repo has the boilerplate code for ReJIT (it seems copied from some sample/bootstrap template), but it doesn't call RequestReJIT and the ILRewriter never creates the required ICorProfilerFunctionControl* pICorProfilerFunctionControl instance. Since DD auto-instrumentation replaces call sites, any method that calls an intercepted method is a potential candidate, so depending on what is to be instrumented one has to keep track of a relatively large set of candidates for ReJIT.

Typically ReJIT makes more sense for performance or debugging investigations than for tracing, especially considering that the tracing model is to replace the calls to intercepted methods with wrappers that process the original object or method arguments to populate span data.

References:
API: https://docs.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/icorprofilerinfo4-requestrejit-method
Intro: https://channel9.msdn.com/Shows/Going+Deep/CLR-45-David-Broman-Inside-Re-JIT
Example: https://github.com/jimschubert/clr-profiler
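A toy model of the call-site replacement described in item 2 may help show why the ReJIT candidate set grows. This is only a sketch in Python: method bodies are modeled as lists of `(opcode, operand)` pairs, `rewrite_call_sites` and the method names in the usage note are hypothetical, and emitting a plain `call` for the wrapper assumes the wrapper is a static method. It is not the DD IL rewriter.

```python
from typing import Dict, List, Tuple

# Toy stand-in for an IL instruction: (opcode, operand).
Instruction = Tuple[str, str]


def rewrite_call_sites(
    body: List[Instruction], replacements: Dict[str, str]
) -> Tuple[List[Instruction], int]:
    """Return a copy of body with calls to intercepted methods redirected to wrappers.

    replacements maps an intercepted method name to its wrapper's name.
    The second element of the result is the number of call sites rewritten.
    """
    rewritten: List[Instruction] = []
    count = 0
    for opcode, operand in body:
        if opcode in ("call", "callvirt") and operand in replacements:
            # Redirect the call site; the wrapper is assumed to be a static method.
            rewritten.append(("call", replacements[operand]))
            count += 1
        else:
            # Everything else is copied through untouched.
            rewritten.append((opcode, operand))
    return rewritten, count
```

For example, rewriting a body containing `("callvirt", "HttpClient::Send")` with the mapping `{"HttpClient::Send": "Wrappers::HttpClient_Send"}` redirects that one instruction and leaves the rest alone. Because the change lives in the caller, every method with a non-zero rewrite count would need its own ReJIT request if instrumentation were applied after startup.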

3. How to handle ahead-of-time (AOT) compilation (no JIT compilation) scenarios? Per the MS blog post about .NET 5, most workloads should be using JIT; the noted exceptions are client scenarios: iOS or client-side Blazor (WebAssembly). Since AOT is provided via Mono, and Mono didn’t support the CLR Profiler interfaces, no CLR Profiler support is expected there. Moreover, the outlined scenarios are client ones, in which installing a CLR Profiler is not a viable option. Note that the blog post mentions 2 modes of AOT, one of which supports a JIT or code interpreter for patterns that do not work well with AOT.

Reference: https://devblogs.microsoft.com/dotnet/introducing-net-5/

SergeyKanzhelev commented 4 years ago

Document to be presented today: CLR Instrumentation Engine & Intercept Extension

reyang commented 4 years ago

https://github.com/open-telemetry/community/issues/325
