open-telemetry / opentelemetry-rust-contrib

OpenTelemetry Contrib Packages for Rust
Apache License 2.0

Tracing is not working for Google Stackdriver #8

Open cqx820 opened 2 years ago

cqx820 commented 2 years ago

Hi,

I'm trying to send trace data to Google Cloud Trace. A Python script using a credential file works for me, but I'm unable to do the same in Rust. The Cargo.toml looks like:

opentelemetry = { version = "0.16.0", default_features = false, features = ["rt-tokio", "rt-async-std", "rt-tokio-current-thread", "trace", "metrics"] }
opentelemetry-stackdriver = { version = "0.12.0", default_features = false, features = ["gcp_auth", "yup-authorizer"] }
tracing = "0.1.28"
tracing-futures = "0.2.5"
tracing-opentelemetry = "0.15.0"
tracing-subscriber = "0.2.24"

My workflow is like this:

  1. Get the exporter:
     let handle = tokio::runtime::Handle::current();
     let spawner = tokio_adapter::TokioSpawner::new(handle);
     let exporter = StackDriverExporter::connect(
         GcpAuthorizer::new().await.unwrap(), &spawner, None, None,
     ).await.unwrap(); // Tried YupAuthorizer as well; it didn't work either.
  2. Get the tracer:
     let tracer = sdk::trace::TracerProvider::builder()
       .with_simple_exporter(exporter)
       .build()
       .tracer("tracer_name", Some(VERSION));
  3. Build the subscriber:
     let subscriber = tracing_subscriber::registry()
       .with(tracing_opentelemetry::layer().with_tracer(tracer));
  4. Set the global default:
     tracing::subscriber::set_global_default(subscriber).unwrap();
  5. Call tracing::span!(tracing::Level::TRACE, span_name, ...)

That's my entire workflow. After running it I can't find any traces in Google Cloud Trace, even though I'm sure the credential file has sufficient permissions. I've also tried raising the trace level, with no luck, and there were no runtime errors. I wonder if I'm setting up the exporter or the tracer incorrectly.

I also wrote the small test program below, but it didn't work either:

let exporter = StackDriverExporter::connect(
    GcpAuthorizer::new().await.unwrap(),
    &spawner,
    Some(Duration::from_secs(5)),
    Some(5usize),
).await.unwrap();

let tracer = sdk::trace::TracerProvider::builder()
    .with_simple_exporter(exporter)
    .build()
    .tracer("tracer_name", Some(VERSION));

let attributes = vec![KeyValue::new("version", "0.0.0")];

let span = tracer
    .span_builder("my_test_span")
    .with_attributes(attributes)
    .start(&tracer);

tracer.with_span(span, |_cx| {
    sleep(Duration::from_millis(500));
});

Any suggestions would be appreciated, thank you very much!!

TommyCpp commented 2 years ago

Could you try adding opentelemetry::global::shutdown_tracer_provider() at the end of your main function?

cqx820 commented 2 years ago

> Could you try adding opentelemetry::global::shutdown_tracer_provider() at the end of your main function?

Thanks for the reply. I tried your suggestion, but unfortunately it didn't work; I still don't see anything in Google Cloud Trace. I also found another issue with YupAuthorizer: I provided a path for persistent_token_file, but it isn't persisting the token to disk either. I wondered whether authorization had failed, but since there was no panic I assumed the auth worked.

djc commented 2 years ago

We set it up like this:

        let authorizer = GcpAuthorizer::new().await.unwrap();
        let spawner = TokioSpawner::new(Handle::current());
        let exporter = StackDriverExporter::connect(authorizer, &spawner, None, None)
            .await
            .unwrap();
        let provider = TracerProvider::builder()
            .with_batch_exporter(exporter, Tokio)
            .with_config(Config {
                sampler: Box::new(Sampler::TraceIdRatioBased(1.0)),
                ..Default::default()
            })
            .build();

        tracing_subscriber::registry()
            .with(tracing_opentelemetry::layer().with_tracer(provider.tracer("tracing", None)))
            .try_init()
            .unwrap();

This has been proven to work quite recently.

cqx820 commented 2 years ago

> We set it up like this:
>
>         let authorizer = GcpAuthorizer::new().await.unwrap();
>         let spawner = TokioSpawner::new(Handle::current());
>         let exporter = StackDriverExporter::connect(authorizer, &spawner, None, None)
>             .await
>             .unwrap();
>         let provider = TracerProvider::builder()
>             .with_batch_exporter(exporter, Tokio)
>             .with_config(Config {
>                 sampler: Box::new(Sampler::TraceIdRatioBased(1.0)),
>                 ..Default::default()
>             })
>             .build();
>
>         tracing_subscriber::registry()
>             .with(tracing_opentelemetry::layer().with_tracer(provider.tracer("tracing", None)))
>             .try_init()
>             .unwrap();
>
> This has been proven to work quite recently.

Thank you very much for your reply; I really appreciate the help. I tried your setup but still can't see any traces. I'm sure the environment variable is properly set and the credential file is valid. What am I missing?


#[tokio::main]
async fn main() {
    let name = "GOOGLE_APPLICATION_CREDENTIALS";
    match env::var(name) {
      Ok(v) => println!("{}: {}", name, v),
      Err(e) => panic!("${} is not set ({})", name, e)
    }
    init_tracing().await.unwrap();
    let span = tracing::trace_span!("my_span_test", version="0.0.0", uuid="0000");
    span.in_scope(|| {
      println!("Hello, world!");
      sleep(Duration::from_millis(500));
    });
    opentelemetry::global::shutdown_tracer_provider();
}

async fn init_tracing() -> Result<(), anyhow::Error> {
  let authorizer = GcpAuthorizer::new().await.unwrap();
  let spawner = TokioSpawner::new(tokio::runtime::Handle::current());
  let exporter = StackDriverExporter::connect(
    authorizer, &spawner, None, None,
  ).await.unwrap();
  let provider = sdk::trace::TracerProvider::builder()
    .with_batch_exporter(exporter, opentelemetry::runtime::Tokio)
    .with_config(Config {
      sampler: Box::new(Sampler::TraceIdRatioBased(1.0)),
      ..Default::default()
    })
    .build();
  tracing_subscriber::registry()
    .with(tracing_opentelemetry::layer().with_tracer(provider.tracer("tracing", None))).try_init()
    .unwrap();
  Ok(())
}

djc commented 2 years ago

There are a few log::error!() calls in the exporter. Maybe set up a simple logger (env_logger usually suffices) and see whether any of them fire?
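A minimal sketch of that suggestion (assuming the env_logger crate; the RUST_LOG target name below is a guess at the crate's module path, so verify it against the actual log output):

```
// Add env_logger to Cargo.toml, then, first thing in main():
env_logger::init();
```

Run with, e.g., RUST_LOG=opentelemetry_stackdriver=debug cargo run to surface the exporter's log::error!() output.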

dvtkrlbs commented 2 years ago

I am having the same problem: no traces show up in the Google Cloud Trace panel. My init_telemetry function looks like this:

async fn init_telemetry() {
    let app_name = "app";
    // Spans are exported in batch - recommended setup for a production application.
    global::set_text_map_propagator(TraceContextPropagator::new());
    let spawner = TokioSpawner::new(tokio::runtime::Handle::current());
    let authorizer = opentelemetry_stackdriver::GcpAuthorizer::new()
        .await
        .unwrap();
    let exporter = opentelemetry_stackdriver::StackDriverExporter::connect(
        authorizer,
        &spawner,
        None,
        Some(4),
    )
    .await
    .unwrap();
    let provider = TProvider::builder()
        .with_batch_exporter(exporter, opentelemetry::runtime::Tokio)
        .with_config(Config {
            sampler: Box::new(Sampler::TraceIdRatioBased(1.0)),
            ..Default::default()
        })
        .build();

    // Filter based on level - trace, debug, info, warn, error
    // Tunable via `RUST_LOG` env variable
    let env_filter = EnvFilter::try_from_default_env().unwrap_or(EnvFilter::new("debug"));
    // Create a `tracing` layer using the Jaeger tracer
    // let telemetry = tracing_opentelemetry::layer().with_tracer(tracer);
    let telemetry = tracing_opentelemetry::layer().with_tracer(provider.tracer("tracing", None));
    // Create a `tracing` layer to emit spans as structured logs to stdout
    let formatting_layer = BunyanFormattingLayer::new(app_name.into(), std::io::stdout);
    // Combined them all together in a `tracing` subscriber
    tracing_subscriber::registry()
        .with(env_filter)
        .with(telemetry)
        .with(JsonStorageLayer)
        .with(formatting_layer)
        .try_init()
        .unwrap();
    // tracing::subscriber::set_global_default(subscriber)
    //     .expect("Failed to install `tracing` subscriber.")
}

The same setup works with Jaeger. I suspect my TokioSpawner struct, which looks like this:

struct TokioSpawner {
    handle: tokio::runtime::Handle,
}

impl TokioSpawner {
    fn new(handle: tokio::runtime::Handle) -> Self {
        Self { handle }
    }
}

impl Spawn for TokioSpawner {
    fn spawn_obj(&self, future: FutureObj<'static, ()>) -> std::result::Result<(), SpawnError> {
        self.handle.spawn(future);

        Ok(())
    }

    fn status(&self) -> std::result::Result<(), SpawnError> {
        Ok(())
    }
}

djc commented 2 years ago

You don't need to define the TokioSpawner -- opentelemetry-stackdriver provides it, gated by an optional feature.

Did you already try my hint about enabling logging? You'll want to set tracing-subscriber to default-features = false to make sure it doesn't also capture log output.
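A sketch of the dependency lines this advice implies, using only versions and feature names quoted elsewhere in this thread (the exact feature set for your project may differ):

```toml
# Versions/features as quoted in this thread; adjust to your setup.
opentelemetry-stackdriver = { version = "0.14.0", default-features = false, features = ["tls-native-roots", "gcp_auth"] }
# default-features = false keeps tracing-subscriber from also capturing `log`
# records, so env_logger can print the exporter's log::error!() output instead.
tracing-subscriber = { version = "0.2", default-features = false, features = ["registry"] }
env_logger = "0.9"
```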

You'll also need to give the associated service account permissions to write traces. Anyway, I've been testing opentelemetry-stackdriver myself for the past week or so, and I can definitely get it to work.

Current test setup:

    env_logger::init();
    let authorizer = GcpAuthorizer::new().await.unwrap();
    let spawner = TokioSpawner::new(Handle::current());
    let exporter = StackDriverExporter::connect(authorizer, &spawner, None, None)
        .await
        .unwrap();
    let provider = TracerProvider::builder()
        .with_batch_exporter(exporter.clone(), Tokio)
        .with_config(Config {
            sampler: Box::new(Sampler::TraceIdRatioBased(1.0)),
            ..Default::default()
        })
        .build();

    tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(provider.tracer("tracing", None)))
        .try_init()
        .unwrap();

dvtkrlbs commented 2 years ago

There are no errors printed; it just prints a bunch of zeros without any other information. The service account I am using has admin access to the Trace API.

djc commented 2 years ago

Well, I'm confident the code above works, because I've recently tested it. If it doesn't work, I suspect it's something about your environment. You could clone this project, change opentelemetry-stackdriver's dependency on opentelemetry to skip the path dependency (and fix up the occurrences of to_bytes() -> to_byte_array()), then add some dbg!()/println!() statements to see what's going on.

etiescu commented 2 years ago

Hi!

Is there any documentation/example on how to use the most recent version of this library? The above code snippets seem a bit outdated: the most recent code no longer has a TokioSpawner, and a builder was introduced that returns a tuple containing a future :)

djc commented 2 years ago

Here's some current code from the same test project:

    env_logger::init();

    let authentication_manager = AuthenticationManager::new().await.unwrap();
    let project_id = authentication_manager.project_id().await.unwrap();
    let log_context = LogContext {
        log_id: "cloud-trace-test".into(),
        resource: MonitoredResource::GenericNode {
            project_id,
            namespace: Some("test".to_owned()),
            location: None,
            node_id: None,
        },
    };

    let authorizer = GcpAuthorizer::new().await.unwrap();
    let (exporter, driver) = StackDriverExporter::builder()
        .log_context(log_context)
        .build(authorizer)
        .await
        .unwrap();

    tokio::spawn(driver);
    let provider = TracerProvider::builder()
        .with_batch_exporter(exporter.clone(), Tokio)
        .with_config(Config {
            sampler: Box::new(Sampler::TraceIdRatioBased(CLOUD_TRACE_RATE)),
            ..Default::default()
        })
        .build();

    tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(provider.tracer("tracing")))
        .try_init()
        .unwrap();

richardbrodie commented 2 years ago

I'm also having this problem. My code is essentially identical to @djc's latest snippet, and I get the following output:

OpenTelemetry trace error occurred. Exporter stackdriver encountered the following error(s): authorizer error: authorizer error: Could not establish connection with OAuth server
OpenTelemetry trace error occurred. Exporter stackdriver encountered the following error(s): authorizer error: Could not establish connection with OAuth server

I've confirmed with curl that I can manually fetch the bearer token from https://oauth2.googleapis.com/token, so I don't think anything is wrong with my service_account JSON.

djc commented 2 years ago

Try using gcp_auth to get a token for the correct authz scope without opentelemetry-stackdriver, does that work?

https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-stackdriver/src/lib.rs#L191

richardbrodie commented 2 years ago

Yep, that works fine. Using scopes=["https://www.googleapis.com/auth/trace.append", "https://www.googleapis.com/auth/logging.write"] I successfully get a Token:

Token { 
    access_token: "****", 
    expires_at: Some(OffsetDateTime { 
        utc_datetime: PrimitiveDateTime { 
            date: Date { year: 2022, ordinal: 109 }, 
            time: Time { hour: 12, minute: 43, second: 58, nanosecond: 457575700 } 
        }, 
        offset: UtcOffset { hours: 0, minutes: 0, seconds: 0 } 
    }) 
}

Assuming you meant gcp_auth::AuthenticationManager::get_token(scopes) right?

djc commented 2 years ago

Yes, that's what I meant.

Did you configure either of the TLS roots Cargo features for opentelemetry-stackdriver?

richardbrodie commented 2 years ago

I have this in my Cargo.toml:

opentelemetry-stackdriver = { version = "0.14.0", default_features = false, features = ["tls-native-roots", "gcp_auth"] }

djc commented 2 years ago

If you run with debug-level logging enabled, which kind of authentication manager is selected?

richardbrodie commented 2 years ago

It's tracing, not logging, but I get a couple of these:

2022-04-19T13:04:26.227593Z DEBUG refresh_token{self=CustomServiceAccount { credentials: ApplicationCredentials { type: Some("service_account"), project_id: Some(...), private_key_id: Some(...), private_key: ..., client_email: ..., client_id: Some(...), auth_uri: Some("https://accounts.google.com/o/oauth2/auth"), token_uri: "https://oauth2.googleapis.com/token", auth_provider_x509_cert_url: Some("https://www.googleapis.com/oauth2/v1/certs"), client_x509_cert_url: Some("https://www.googleapis.com/robot/v1/metadata/x509/...") }, signer: Signer, tokens: RwLock { data: {}, poisoned: false, .. } } client=Client scopes=["https://www.googleapis.com/auth/trace.append", "https://www.googleapis.com/auth/logging.write"]}: gcp_auth::custom_service_account: requesting token from service account: Request { method: POST, uri: https://oauth2.googleapis.com/token, version: HTTP/1.1, headers: {"content-type": "application/x-www-form-urlencoded"}, body: Body(Full(b"grant_type=urn%3Aietf%3Aparams%3Aoauth%3Agrant-type%3Ajwt-bearer&assertion=...")) }

djc commented 2 years ago

That seems okay. I'm afraid you'll have to debug for yourself where stuff is going wrong inside opentelemetry-stackdriver. There's not a lot of code there, I'd recommend cloning a local git repo and patching your test setup to use the local version and add some debug statements there.

richardbrodie commented 2 years ago

Thought so, but it's good to get a pair of fresh eyes on it before I start. Thanks for the help :)

richardbrodie commented 2 years ago

Update: after a lot of digging and debugging, I must embarrassingly report that my issue was simply that in my test program the Tokio runtime was terminated before the gcp_auth thread requesting a new token had finished. I added a 1000 ms sleep just before opentelemetry::global::shutdown_tracer_provider() and it works properly now.

I now get tonic error: status: PermissionDenied, message: "The caller does not have permission", but that seems out of scope for this crate :)

djc commented 2 years ago

You'll want some of these permissions:

- cloudtrace.traces.patch
- logging.logEntries.create
- monitoring.timeSeries.create
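If you manage IAM by role rather than by individual permission, predefined roles such as roles/cloudtrace.agent (traces), roles/logging.logWriter (log entries), and roles/monitoring.metricWriter (time series) cover these. A sketch of granting one via gcloud; the project ID and service-account address are placeholders:

```sh
# Grant the Cloud Trace agent role to the exporter's service account.
gcloud projects add-iam-policy-binding MY_PROJECT_ID \
  --member="serviceAccount:my-exporter@MY_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/cloudtrace.agent"
```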

AlexandrBurak commented 1 year ago

[Screenshots from 2022-09-19 showing compile errors]

Hi, could you please show a currently working example? I tried everything in this thread and cannot get it to compile. With the last example from April 4th, it is not possible to get the tracer and the provider. Thanks.

djc commented 1 year ago

It sounds like you might have a version mismatch going on between some of the crates you're using.

roshaans commented 1 year ago

Does an exporter need to be set up even when my service is deployed on Google Cloud Run itself? I currently get logs showing up just fine in Cloud Tracing, but no span information seems to propagate, even though the spans do show up in my local Jaeger setup when running the service.

djc commented 1 year ago

AFAIK GCP has a setup that forwards anything logged to stderr to its logging service by default, but that doesn't work for tracing. So yes, if you want spans in Cloud Trace, I think you have to set up an exporter explicitly.

cyberbudy commented 9 months ago

I have the same problem: no traces with the stackdriver exporter.

I've tried to find the source code, but the links return 404.

ivan-brko commented 7 months ago

I got everything working with the latest versions and documented it here. I also linked the small repo where I put it all together; it correctly exports logs from my GCP Cloud Run application (as you can see in the screenshots), so you can look there.

Some things could probably be improved, but it should work.

ivan-brko commented 7 months ago

@djc thanks for all the work you did for opentelemetry-stackdriver, it was great to be able to export traces to GCP rather quickly.

Do you know if there is somewhere I can read an overview of the current state and plans for the crate? A GitHub thread or something?
Currently the docs are not up to date (for example, GcpAuthorizer is missing), and the repo link in the docs leads to https://github.com/open-telemetry/opentelemetry-rust, which doesn't have the code for stackdriver (not in the main branch, anyway). I can also see that in this repo's workspace Cargo.toml, stackdriver is commented out with the following comment: TODO: Add back in once this relies on a published version.

I'm willing to try to help if it's needed.

djc commented 7 months ago

@ivan-brko I'm using it at work, so I'm also passively maintaining it. However, I'm no longer active as an opentelemetry-rust maintainer, and (unrelated to that) this project got moved; apparently the opentelemetry-rust maintainers missed/postponed some things when moving this crate to its new home. PRs still welcome.

ivan-brko commented 7 months ago

@djc should PRs be made against this repo?

Currently I can't get stackdriver in this repo (main branch) to compile. I don't know if I'm doing something wrong, as I haven't investigated; I wanted to check whether this code is even supposed to build.


I already made a small PR to fix a documentation issue created here, and would gladly work on a couple of other improvements (fixing the compile issue I posted, if needed, plus documentation improvements and things like this issue), as I'd like to get my feet wet with Rust OSS anyway.

If I understand correctly, opentelemetry-stackdriver in this repository hasn't been used to publish new versions of the crate yet?

djc commented 7 months ago

No -- unfortunately this repo was forked off the original repo only after some semver-incompatible changes to crates in the original repo had been made. It might make sense to change opentelemetry-stackdriver's opentelemetry dependencies to reference the upstream crates as a Git dependency for now, so you can move forward here? I would probably pin them to the last commit before the extraction of the -contrib repo.
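A sketch of what such a pinned Git dependency could look like in opentelemetry-stackdriver's Cargo.toml; the rev shown is a placeholder, not a real commit:

```toml
# Placeholder rev: substitute the last opentelemetry-rust commit
# before the extraction of the -contrib repo.
opentelemetry = { git = "https://github.com/open-telemetry/opentelemetry-rust", rev = "<pre-extraction-commit>" }
```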

ivan-brko commented 7 months ago

Yeah, that sounds reasonable for developing new features, but I might actually try fixing the issues with the current build first. A new version of the crate can't be published from this repo before this problem is fixed anyway, and publishing a crate that depends on a specific commit wouldn't work well with other libraries that depend on the OTEL crates.

Do you happen to know whether just fixing the compile errors would be enough, or are there other things to look into?

ivan-brko commented 7 months ago

@djc if you manage to catch some time, please take a look at the PR; it moves to the latest opentelemetry and opentelemetry_sdk versions and fixes the compile issues.