openzipkin / zipkin-aws

Reporters and collectors for use in Amazon's cloud
Apache License 2.0
69 stars 34 forks source link

java.lang.IllegalStateException: Missing: traceId with AWS propagation type within AWSExtractor #201

Closed BenitoVisone closed 9 months ago

BenitoVisone commented 1 year ago

Describe the Bug

Environment setup:

In our application we use Spring Cloud Sleuth for tracing in our micro-services. We set propagation type to AWS because our architecture is based on this Cloud Provider.

The problem we are facing is the following: When we receive an X-Amzn-Trace-Id header with the following format: Root=1-1373cbb-37f4b48ed7ff3eebbd62b5e01 the application crashes with the Exception java.lang.IllegalStateException: Missing: traceId from the brave.propagation package during the trace id extraction.

We opened an issue on spring-cloud-sleuth which directed us here.

Based on this AWS Documentation, the X-Amzn-Trace-Id header consists of three numbers separated by hyphens version-time-id , where the time field should be of 8 hexadecimal digits.

The problem seems related to the time(1373cbb, 7 characters) format, when its length is different from 8. In fact, in this section of code we can see that if the timestamp length is different from 8 the traceId is never initialised, causing the Exception.

This is the only case that throws this unhandled Exception, in other cases, such as invalid hexadecimal digits, the application correctly generates a new trace id without propagating the malformed one.

Is this working as intended?

Steps to Reproduce

In order to test this behaviour you need to test a request with the header X-Amzn-Trace-Id: Root=1-1373cbb-37f4b48ed7ff3eebbd62b5e01 which cause the Exception being thrown.

Expected Behaviour

The expected behaviour should be the same as when an invalid trace id with non-hexadecimal characters arrives, that is, it should be ignored and generated from scratch.

devinsba commented 1 year ago

Do you know the source of these headers that have only 7 hex digits for the time? Given that it's epoch time, the example above is from August 1970 which definitely isn't right.

The extractor should be more resilient but the source should also be fixed

BenitoVisone commented 1 year ago

We are investigating on the source of these headers trying to understand why the epoch time isn't right.

Absolutely, the source needs to be fixed as well.

codefromthecrypt commented 9 months ago

thanks for the fixes in #203