nginx / unit

NGINX Unit - universal web app server - a lightweight and versatile open source server that simplifies the application stack by natively executing application code across eight different programming language runtimes.
https://unit.nginx.org
Apache License 2.0
5.4k stars 331 forks source link

OpenTelemetry Implementation #1431

Closed avahahn closed 1 month ago

avahahn commented 2 months ago

This PR contains an OpenTelemetry module that is compiled during the configure phase and statically linked during compilation of Unit. The OpenTelemetry module is a Rust crate that encapsulates the logic needed to generate and send spans and traces, as well as managing batching and background threads.

During operation, Unit will initialize the OpenTelemetry tracer when it is configured by the user in the router thread. If configured, spans will be generated and stored in batches during HTTP request decoding and processing. deconfiguration and reconfiguration will reset and reinitialize the OpenTelemetry tracer.

The following changes are made to the code:

ac000 commented 1 month ago

There are a bunch of files which do not have any change in them besides a rearrange / alignment. Those files should not form part of this PR.

Well, they can (and probably should) be part of the PR, as I assume they form preparatory work, but should be done as separate commits first.

javorszky commented 1 month ago

There are a bunch of files which do not have any change in them besides a rearrange / alignment. Those files should not form part of this PR.

Well, they can (and probably should) be part of the PR, as I assume they form preparatory work, but should be done as separate commits first.

They are just alignment changes, things like

- something
-   .call()
+ something.call()

or

- something(StructName{property});
+ something( StructName { property } );

I don't see how these are prep works 🙁 , though I might be missing something.

ac000 commented 1 month ago

I don't see how these are prep works 🙁 , though I might be missing something.

Well, if they're not being subsequently touched then leave them alone full stop...

ac000 commented 1 month ago

Great job! I wonder if nxt_otel_send_trace() is asynchronous or synchronous?

I would like to see some document describing the overall architecture of this... especially as a lot of it is effectively hidden away in a black box...

javorszky commented 1 month ago

Great job! I wonder if nxt_otel_send_trace() is asynchronous or synchronous?

that function call is synchronous. It grabs the span at the pointer, and sends it to the channel and returns.

The span will then be handled asynchronously on the other end of the channel, but NGINX no longer needs to know about it.

hongzhidao commented 1 month ago

Great job! I wonder if nxt_otel_send_trace() is asynchronous or synchronous?

that function call is synchronous. It grabs the span at the pointer, and sends it to the channel and returns.

The span will then be handled asynchronously on the other end of the channel, but NGINX no longer needs to know about it.

The sender which calls nxt_otel_send_trace() synchronously runs in the threads of the router process, where does the receiver which receives span and handles it asynchronously run?

ac000 commented 1 month ago

Great job! I wonder if nxt_otel_send_trace() is asynchronous or synchronous?

that function call is synchronous. It grabs the span at the pointer, and sends it to the channel and returns.

The span will then be handled asynchronously on the other end of the channel, but NGINX no longer needs to know about it.

To be clear, the other side of this channel is tokio?

I think what @hongzhidao is getting at is whether this will cause Unit to block.

IIRC the plan was to have a thread pool rust side that would deal with the actual sending of the telemetry data and thus allowing Unit to get back to handling client requests ASAP.

javorszky commented 1 month ago

Great job! I wonder if nxt_otel_send_trace() is asynchronous or synchronous?

that function call is synchronous. It grabs the span at the pointer, and sends it to the channel and returns. The span will then be handled asynchronously on the other end of the channel, but NGINX no longer needs to know about it.

The sender which calls nxt_otel_send_trace() synchronously runs in the threads of the router process, where does the receiver which receives span and handles it asynchronously run?

when the nxt_otel_init() is called, we spawn a new thread from the rust code with the receiver passed to it, and in that thread the function called runtime will loop over messages coming from the channel. As long as the channel is open, and no shutdown message is received, that function loops in a separate thread.

hongzhidao commented 1 month ago

I think what @hongzhidao is getting at is whether this will cause Unit to block.

Correct.

javorszky commented 1 month ago

I think what @hongzhidao is getting at is whether this will cause Unit to block.

Correct.

No, it will not block.

hongzhidao commented 1 month ago

I think what @hongzhidao is getting at is whether this will cause Unit to block.

Correct.

No, it will not block.

That would be great. Btw, you could test it with some performance tool likewrk.

ac000 commented 1 month ago

@avahahn

I think it would be a big help if you were to squash these two commits together, then see how it should be split up.

E.g.

So we end up with a logical set of commits that progress the incorporation of this feature. Each commit should make sense and be reviewable on its own.

That would significantly improve the review process.

avahahn commented 1 month ago

Well, they can (and probably should) be part of the PR, as I assume they form preparatory work, but should be done as separate commits first.

I agree, if we just get it linted and merged we wont have to deal with it again.

avahahn commented 1 month ago

I would like to see some document describing the overall architecture of this... especially as a lot of it is effectively hidden away in a black box...

In this case it is simple: the Span object has a custom destructor (implementation of the Drop trait) defined in the library we are using that calls the linked tracer object to send off the span before it is deallocated.

hongzhidao commented 1 month ago

@avahahn

I think it would be a big help if you were to squash these two commits together, then see how it should be split up.

E.g.

  • Add core OTEL stuff (C & Rust)
  • Add build infrastructure
  • Add tracepoints
  • Wire it up to the config system

So we end up with a logical set of commits that progress the incorporation of this feature. Each commit should make sense and be reviewable on its own.

That would significantly improve the review process.

Does Add tracepoints mean trace http request? If yes, it's good to separate core OTEL stuff and http OTEL stuff.

ac000 commented 1 month ago
  • Add tracepoints
  • Wire it up to the config system

That would significantly improve the review process.

Does Add tracepoints mean trace http request? If yes, it's good to separate core OTEL stuff and http OTEL stuff.

So I was meaning these things

+#if (NXT_HAVE_OTEL)
+    nxt_otel_test_and_call_state(task, r);
+#endif

But I think I know what you're referring to and yes.

ac000 commented 1 month ago

B.t.w we could probably make them a little nicer by wrapping them in macro/inline function, so e.g.

+#if (NXT_HAVE_OTEL)
+    nxt_otel_test_and_call_state(task, r);
+#endif

would become something like (and variations of)

+NXT_OTEL_TRACEPOINT(task, r);

Especially if we're going to be sprinkling these things all over the place...

Hell maybe at some point we'd want to do something like static keys.

hongzhidao commented 1 month ago
  • Add tracepoints
  • Wire it up to the config system

That would significantly improve the review process.

Does Add tracepoints mean trace http request? If yes, it's good to separate core OTEL stuff and http OTEL stuff.

So I was meaning these things

+#if (NXT_HAVE_OTEL)
+    nxt_otel_test_and_call_state(task, r);
+#endif

But I think I know what you're referring to and yes.

@ac000, Is it clear to you when you review nxt_otel.c? I mean putting core and http OTEL stuff together. nxt_otel.c looks like pure OTEL stuff but doesn't contain the caller like http request.

ac000 commented 1 month ago

@hongzhidao

@ac000, Is it clear to you when you review nxt_otel.c? I mean putting core and http OTEL stuff together. nxt_otel.c looks like pure OTEL stuff but doesn't contain the caller like http request.

Are you thinking things should be moved into or out of nxt_otel.c?

Just so we're on the same page, could you give a specific example?

Though for me the current way the PR is presented, it's hard to answer overall architectural questions like that...

hongzhidao commented 1 month ago

The naming looks like a otel library wrapped in Unit to be used in other places if I didn’t look into internal source code, but it contains http stuff. It also contains conf validation, it’s clear that we usually put all validation stuff into nxt_conf_validation.c, I’ve commented this point. I’d like to see something description about the intention of the file.

ac000 commented 1 month ago

The naming looks like a otel library wrapped in Unit to be used in other places if I didn’t look into internal source code,

Right, that's basically what it is.

If I've got it right, it's basically a rust wrapper around https://opentelemetry.io/docs/languages/rust/ with a C interface on-top.

but it contains http stuff. It also contains conf validation, it’s clear that we usually put all validation stuff into nxt_conf_validation.c, I’ve commented this point

Yes, you are quite right about that.

I’d like to see something description about the intention of the file.

I'd expect nxt_otel.c to be the C interface to the above mentioned rust wrapper...

ac000 commented 1 month ago

Essentially I see nxt_otel.c as being analogous to say what nxt_openssl.c is to OpenSSL...

ac000 commented 1 month ago

Wait, what!?

diff --git ./auto/sources ./auto/sources
andrew@kappa unit:otel-clean $ git diff --stat master..
 auto/help                                     |    2 +
 auto/options                                  |    2 +
 auto/otel                                     |   27 +
 auto/sources                                  |    4 +
 auto/summary                                  |    1 +
 configure                                     |    4 +
 docs/unit-openapi.yaml                        |   85 +++
 src/nxt_conf_validation.c                     |   31 ++
 src/nxt_h1proto.c                             |   18 +
 src/nxt_http.h                                |    5 +
 src/nxt_http_request.c                        |   23 +-
 src/nxt_main.c                                |    6 +-
 src/nxt_main.h                                |    1 -
 src/nxt_otel.c                                |  356 ++++++++++++
 src/nxt_otel.h                                |   57 ++
 src/nxt_router.c                              |   41 +-
 src/otel/Cargo.lock                           | 1724 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/otel/Cargo.toml                           |   22 +
 src/otel/src/lib.rs                           |  291 ++++++++++
 tools/unitctl/unit-openapi/README.md          |    7 +
 tools/unitctl/unitctl/src/cmd/applications.rs |   20 +-
 tools/unitctl/unitctl/src/cmd/edit.rs         |   13 +-
 tools/unitctl/unitctl/src/cmd/execute.rs      |   15 +-
 tools/unitctl/unitctl/src/cmd/listeners.rs    |   11 +-
 tools/unitctl/unitctl/src/cmd/mod.rs          |    2 +-
 tools/unitctl/unitctl/src/cmd/save.rs         |   44 +-
 tools/unitctl/unitctl/src/cmd/status.rs       |   11 +-
 tools/unitctl/unitctl/src/main.rs             |    8 +-
 tools/unitctl/unitctl/src/wait.rs             |    7 +-
 29 files changed, 2756 insertions(+), 82 deletions(-)

Why are we messing around under tools/unitctl?

Most of this seems to come from

commit 675ce164a631f88541d554d91b9f877ccc4f8860
Author: Ava Hahn <a.hahn@f5.com>
Date:   Mon Sep 9 12:19:04 2024 -0700

    Phase 2 OpenTelemetry Extensions

    * migrate opentelemetry tracer lifetime into router process
    * extend unit configuration API with validators for otel config
    * update openapi spec
    * regenerate unitctl openapi package
    * add method and path fields to opentelemetry span
    * spans export in configurable batches on background thread

    Co-authored-by: Gabor Javorszky <g.javorszky@f5.com>
    Signed-off-by: Ava Hahn <a.hahn@f5.com>

 auto/otel                                     |    4 -
 docs/unit-openapi.yaml                        |   85 +++++
 src/nxt_conf_validation.c                     |   34 +-
 src/nxt_http_request.c                        |   11 +-
 src/nxt_main.c                                |   14 +-
 src/nxt_main.h                                |    3 -
 src/nxt_otel.c                                |   73 +++-
 src/nxt_otel.h                                |   28 +-
 src/nxt_router.c                              |   41 ++-
 src/otel/Cargo.lock                           | 1051 +++++++++++++++++++++++++++++++++++++++++++--------------
 src/otel/Cargo.toml                           |   15 +-
 src/otel/src/lib.rs                           |  227 ++++++++++---
 tools/unitctl/unit-openapi/README.md          |    7 +
 tools/unitctl/unitctl/src/cmd/applications.rs |   20 +-
 tools/unitctl/unitctl/src/cmd/edit.rs         |   13 +-
 tools/unitctl/unitctl/src/cmd/execute.rs      |   15 +-
 tools/unitctl/unitctl/src/cmd/listeners.rs    |   11 +-
 tools/unitctl/unitctl/src/cmd/mod.rs          |    2 +-
 tools/unitctl/unitctl/src/cmd/save.rs         |   44 ++-
 tools/unitctl/unitctl/src/cmd/status.rs       |   11 +-
 tools/unitctl/unitctl/src/main.rs             |    8 +-
 tools/unitctl/unitctl/src/wait.rs             |    7 +-
 22 files changed, 1312 insertions(+), 412 deletions(-)

All the tools/unitctl stuff needs to go into their own commit(s)...

javorszky commented 1 month ago

All the tools/unitctl stuff needs to go into their own commit(s)...

Yep, that's what I was going for with this comment: https://github.com/nginx/unit/pull/1431#pullrequestreview-2308788605

Every single change in the unitctl package is just code alignment, and I get that they may be prep work (https://github.com/nginx/unit/pull/1431#issuecomment-2355611262), they shouldn't form part here 🙂

ac000 commented 1 month ago

All the tools/unitctl stuff needs to go into their own commit(s)...

Yep, that's what I was going for with this comment: #1431 (review)

Every single change in the unitctl package is just code alignment, and I get that they may be prep work (#1431 (comment)), they shouldn't form part here 🙂

This would imply it's more than just that...

* update openapi spec
* regenerate unitctl openapi package

Which seems to include some name changes...

ac000 commented 1 month ago

Let me quote from our CONTRIBUTING.md

- Create atomic commits.  A commit should do just one thing, i.e. you
  shouldn't mix refactoring with functional code changes.  Do the
  refactoring in one or more commits first.

  Ideally you should rebase locally and force push new commits up.

This could be be expanded out to something like

Each commit should do one thing, and one thing only. If you find        
yourself, in the commit message, adding phrases like "Also do [...]" or 
"While in here [...]", then that's a sign that the change should have   
been split into multiple commits. If your change includes some          
refactoring of code to make your change possible, then that refactoring 
should be a separate commit, done first. That means this preparatory     
commit won't have any functional changes, and hence should be a no-op.   
It also means that your main commit, with the change that you actually  
care about, will be smaller and easier to review.                       

Each commit must stand on its own in terms of what it provides, and how 
it works. Lots of changes are just a single commit, but for something a 
bit more involved, it's not uncommon to have a pull request contain     
multiple commits. Make each commit as simple as possible, and not any   
simpler. We'd much rather see 10 simple commits than 2 more complicated 
ones. If you stumble across something that needs fixing while making an 
unrelated change, then please make that change as a separate commit,    
explaining why it's being made.                                         

Each commit in a series must be buildable, it's not enough that the end 
result is buildable. See reason 3 in the introduction for why that's the
case.                                                                   

No fixup commits! Sometimes people post a change and errors are pointed 
out in the commit, and the author then does a followup fix for that     
error. This isn't acceptable, please squash fixup commits into the      
commit that introduced the problem in the first place. This is done by  
amending the fix into the original commit that caused the issue. You can
do that with git rebase -i and arrange the commit order such that the   
fixup is right after the original commit, and then use 's' (for squash) 
to squash the fixup into the original commit. Don't forget to edit the  
commit message while doing that, as git will combine the two commit     
messages into one. Or you can do it manually. Once done, force push your
rewritten git history. See reasons 1-3 in the introduction series for   
why that is. 

Now I didn't just make that up... That's from the liburing CONTRIBUTING.md.

Just saying...

ac000 commented 1 month ago

@avahahn

Can I kindly ask you to present these patches in a coherent manner.

As it currently stands it would be easier to review it if was all one big patch, which is how I'm currently reviewing it (git diff master..)

If we don't follow our own advice how can we expect others to?

ac000 commented 1 month ago

Oopsie, looks like an errant change...

diff --git ./auto/ssltls ./auto/ssltls
index 6512d330..d8b79558 100644
--- ./auto/ssltls
+++ ./auto/ssltls
@@ -212,4 +212,4 @@ if [ $NXT_POLARSSL = YES ]; then
         $echo
         exit 1;
     fi
-fi
+fi
\ No newline at end of file
ac000 commented 1 month ago

Another newline issue...

diff --git ./auto/otel ./auto/otel
new file mode 100644
index 00000000..11387a1b
--- /dev/null
+++ ./auto/otel
@@ -0,0 +1,25 @@
+
+# Copyright (C) NGINX, Inc.
+
+
+nxt_found=no
+NXT_HAVE_OTEL=NO
+
+NXT_OTEL_LIB_LOC=src/otel/target/debug/libotel.a
+
+cat << END >> $NXT_AUTO_CONFIG_H
+#ifndef NXT_HAVE_OTEL
+#define NXT_HAVE_OTEL 1
+#endif
+END
+
+NXT_LIB_AUX_LIBS="$NXT_LIB_AUX_LIBS $(pkgconf openssl --cflags --libs || "-lssl -lcrypto") $NXT_OTEL_LIB_LOC"
+
+cat << END >> $NXT_MAKEFILE
+
+$NXT_OTEL_LIB_LOC:
+    pushd $NXT_BUILD_DIR/src/otel/
+    cargo build
+    cd ../../
+
+END
\ No newline at end of file

What editor are you using? I've also noticed some erratic indenting in your patches...

The .editorconfig does cover these files

[{auto/**,*.toml}]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

But maybe your editor doesn't use it...

avahahn commented 1 month ago

Can I kindly ask you to present these patches in a coherent manner.

As it currently stands it would be easier to review it if was all one big patch, which is how I'm currently reviewing it (git diff master..)

If we don't follow our own advice how can we expect others to?

I am working on that after I take care of low hanging fruit like nits and style preferences.

avahahn commented 1 month ago

What editor are you using? I've also noticed some erratic indenting in your patches...

Emacs primarily

avahahn commented 1 month ago

Please help me understand what otel does in http? The otel state is allocated when a new request is made.

First thing is that when the traceparent header is received it is parsed to get the span ID and parent ID. Spans may come from many other services and all end up in one unified hierarchy in the collector service.

Next the span is initialized. We make one span per request. It is either linked to the parent declared in the Traceparent header or the span id and parent may be fully autogenerated.

Once all header fields are done parsing they are added to the span. At that point it is guaranteed that any traceparent header has been parsed and that the span is fully initialized.

After the body has been processed the size is added to the span.

Finally as soon as we are sure there is no more data to parse the span is finished and sent to the background thread that sends all spans in batches to the tracer.

What's the difference between the two ways?

What would the benefit of that be over the current state machine? It seems like a slower design that would be more error prone. The current state machine for otel uses pre-existing routines in Unit like the header field parsing and generates the span as soon as it can be in small amounts.

Either way might work but I don't understand why you would prefer it all to be done at once at the end of request processing. I would worry that the request data doesnt live long enough to be copied during the request closing, and I would worry that concentrating all the work at the end would add latency to request closing.

ac000 commented 1 month ago

@avahahn

What's the currently expected way to build this?

$ ./configure ... --otel
$ make
...
  AR     build/lib/libnxt.a
  CC     build/src/nxt_main.o
  LD     build/sbin/unitd
/usr/bin/ld: cannot find src/otel/target/debug/libotel.a: No such file or directory
collect2: error: ld returned 1 exit status
make: *** [build/Makefile:2044: build/sbin/unitd] Error 1
$

There is

#ifndef NXT_HAVE_OTEL
#define NXT_HAVE_OTEL 1

in build/include/nxt_auto_config.h

There is seemingly no

src/otel/target/debug/libotel.a:
    pushd $NXT_BUILD_DIR/src/otel/
    cargo build
    cd ../../

target in build/Makefile

hongzhidao commented 1 month ago

Please help me understand what otel does in http? The otel state is allocated when a new request is made.

First thing is that when the traceparent header is received it is parsed to get the span ID and parent ID. Spans may come from many other services and all end up in one unified hierarchy in the collector service.

Next the span is initialized. We make one span per request. It is either linked to the parent declared in the Traceparent header or the span id and parent may be fully autogenerated.

Once all header fields are done parsing they are added to the span. At that point it is guaranteed that any traceparent header has been parsed and that the span is fully initialized.

After the body has been processed the size is added to the span.

Finally as soon as we are sure there is no more data to parse the span is finished and sent to the background thread that sends all spans in batches to the tracer.

What's the difference between the two ways?

What would the benefit of that be over the current state machine? It seems like a slower design that would be more error prone. The current state machine for otel uses pre-existing routines in Unit like the header field parsing and generates the span as soon as it can be in small amounts.

Either way might work but I don't understand why you would prefer it all to be done at once at the end of request processing. I would worry that the request data doesnt live long enough to be copied during the request closing, and I would worry that concentrating all the work at the end would add latency to request closing.

Is it right that we add serval spans but only passed them to Rust engine once? If yes, what's the good to add spans in different places but not add them when they are prepared to send?

avahahn commented 1 month ago

Is it right that we add serval spans but only passed them to Rust engine once?

we make one span, and add several events to it.

hongzhidao commented 1 month ago

Is it right that we add serval spans but only passed them to Rust engine once?

we make one span, and add several events to it.

The two functions are defined in rust part?

nxt_otel_add_event_to_trace
nxt_otel_send_trace

Will both of them be communicating with rust?

ac000 commented 1 month ago

Trying to manually build the libotel part

   Compiling otel v0.1.0 (/home/andrew/src/unit/src/otel)
error[E0308]: mismatched types
   --> src/lib.rs:199:35
    |
199 |     assert_eq!(traceparent.len(), TRACEPARENT_HEADER_LEN);
    |                                   ^^^^^^^^^^^^^^^^^^^^^^ expected `usize`, found `u8`
    |
help: you can convert a `u8` to a `usize`
    |
199 |     assert_eq!(traceparent.len(), TRACEPARENT_HEADER_LEN.into());
    |                                                         +++++++

error[E0308]: mismatched types
   --> src/lib.rs:204:9
    |
201 |     ptr::copy_nonoverlapping(
    |     ------------------------ arguments to this function are incorrect
...
204 |         TRACEPARENT_HEADER_LEN,
    |         ^^^^^^^^^^^^^^^^^^^^^^ expected `usize`, found `u8`
    |
note: function defined here
   --> /builddir/build/BUILD/rustc-1.81.0-src/library/core/src/intrinsics.rs:3036:21
help: you can convert a `u8` to a `usize`
    |
204 |         TRACEPARENT_HEADER_LEN.into(),
    |                               +++++++

error[E0308]: mismatched types
   --> src/lib.rs:207:14
    |
207 |     *buf.add(TRACEPARENT_HEADER_LEN) = b'\0' as _;
    |          --- ^^^^^^^^^^^^^^^^^^^^^^ expected `usize`, found `u8`
    |          |
    |          arguments to this method are incorrect
    |
note: method defined here
   --> /builddir/build/BUILD/rustc-1.81.0-src/library/core/src/ptr/mut_ptr.rs:1039:25
help: you can convert a `u8` to a `usize`
    |
207 |     *buf.add(TRACEPARENT_HEADER_LEN.into()) = b'\0' as _;
    |                                    +++++++

For more information about this error, try `rustc --explain E0308`.
error: could not compile `otel` (lib) due to 3 previous errors
$ cargo --version
cargo 1.81.0 (2dbb1af80 2024-08-20)
$ rustc --version
rustc 1.81.0 (eeb90cda1 2024-09-04) (Fedora 1.81.0-1.fc40
javorszky commented 1 month ago

Is it right that we add serval spans but only passed them to Rust engine once?

we make one span, and add several events to it.

The two functions are defined in rust part?

nxt_otel_add_event_to_trace
nxt_otel_send_trace

Will both of them be communicating with rust?

nxt_otel_add_event_to_trace adds events to a span we started with nxt_otel_get_or_create_trace

nxt_otel_send_trace is used when we've finished decorating the span with information as we had access to them, and at that point the span from Unit is deemed "complete", and can be sent off to the collector.

Here "event" refers to additional properties of the request: its http verb, the headers used, the path, the body size (if there's any), etc... Event should probably be called Attribute (there's a set_attribute method on the span), but the function of the add event fn is to collect the properties of the single request we have a span for.

hongzhidao commented 1 month ago

Is it right that we add serval spans but only passed them to Rust engine once?

we make one span, and add several events to it.

The two functions are defined in rust part?

nxt_otel_add_event_to_trace
nxt_otel_send_trace

Will both of them be communicating with rust?

nxt_otel_add_event_to_trace adds events to a span we started with nxt_otel_get_or_create_trace

nxt_otel_send_trace is used when we've finished decorating the span with information as we had access to them, and at that point the span from Unit is deemed "complete", and can be sent off to the collector.

Here "event" refers to additional properties of the request: its http verb, the headers used, the path, the body size (if there's any), etc... Event should probably be called Attribute (there's a set_attribute method on the span), but the function of the add event fn is to collect the properties of the single request we have a span for.

Thanks for the explanation.

can be sent off to the collector.

What does collector mean here? Does it mean the thread running rust or the otel server outside of Unit?

javorszky commented 1 month ago

What does collector mean here? Does it mean the thread running rust or the otel server outside of Unit?

There are 4 parts to most OpenTelemetry functionality:

  1. the instrumented code: this is the bit that lives inside your functions and adds the attributes / events to spans, creates the spans, measures things, etc
  2. the exporter: it's part of the SDK that you add to the code. The finished spans get collected here, batched, and once they reach a certain number, the exporter will open a connection to someplace to send the data. This is the part that exfils the tracing / metrics / logs / whatever you configured out of your application
  3. the collector: this is what receives the data from the exporter in most cases. It usually lives as a sidecar or daemon to your application, but can be in a completely different pod / container / server. In most cases it means the thing called "OpenTelemetry Collector", see https://opentelemetry.io/docs/collector/. Its job is to receive all the data, do some transformation on it optionally, and then forward it to someplace that we can use to make sense of the raw data
  4. the dashboard / ui / aggregator. This will receive the transformed / filtered data from the collector, and will draw charts and dashboards for us to look at. Grafana is an example.

This code implementation is mainly part 1, and merely configuring part 2.

Parts 3 and 4 are vendor supplied pieces that we hook up. This is what I used for local testing: https://github.com/grafana/docker-otel-lgtm

ac000 commented 1 month ago

After "fixing" the compile errors

$ cargo build
   Compiling otel v0.1.0 (/home/andrew/src/unit/src/otel)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.69s
$ make
  AR     build/lib/libnxt.a
  LD     build/sbin/unitd
$

Hmm...

$ ls -oh /opt/unit/sbin/unitd 
-rwxr-xr-x 1 andrew 2.4M Sep 18 14:08 /opt/unit/sbin/unitd
$ ls -oh build/sbin/unitd 
-rwxr-xr-x 1 andrew 92M Sep 20 16:54 build/sbin/unitd

Crikey!

$ strip build/sbin/unitd
$ ls -oh build/sbin/unitd 
-rwxr-xr-x 1 andrew 21M Sep 20 16:56 build/sbin/unitd

Rust eh?1

... and /opt/unit/sbin/unitd wasn't even stripped...

$ strip /opt/unit/sbin/unitd
$ ls -oh /opt/unit/sbin/unitd 
-rwxr-xr-x 1 andrew 580K Sep 20 17:01 /opt/unit/sbin/unitd

Now I know libotel is statically linked. But even the libc .a is only 6.6MB

(Just observing...)

ac000 commented 1 month ago

How do you configure this in Unit?

I'd expect to see an example in the commit message...

ac000 commented 1 month ago

Parts 3 and 4 are vendor supplied pieces that we hook up. This is what I used for local testing: https://github.com/grafana/docker-otel-lgtm

Ugh. My plan is to get it running then observe it under wireshark...

avahahn commented 1 month ago

Closing for now, as some large work has to be done in order to address some of these comments. I am excited to put it back up once everything has been taken care of!