open-telemetry / opentelemetry-rust

The Rust OpenTelemetry implementation
https://opentelemetry.io
Apache License 2.0
1.72k stars 386 forks source link

[Bug]: Overlapping contexts #1887

Open mladedav opened 2 weeks ago

mladedav commented 2 weeks ago

What happened?

If ContextGuards are dropped out of order, the state may be corrupted. Each ContextGuard restores the last context but that may cause a context different than the root one to be active after all ContextGuards are dropped.

The current code assumes that context guards will be dropped in reverse order of their creation. I didn't see anything in the spec saying that this should or shouldn't be assumed. I think this whole report mostly boils to what should be the behavior and whether this should be allowed or the API should be built so that these out-of-order operations were impossible to trigger.

The following is slightly changed opentelemetry::context::tests::nested_contexts.

    #[test]
    fn overlapping_contexts() {
        #[derive(Debug, PartialEq)]
        struct ValueA(&'static str);
        #[derive(Debug, PartialEq)]
        struct ValueB(u64);
        let outer_guard = Context::new().with_value(ValueA("a")).attach();

        // Only value `a` is set
        let current = Context::current();
        assert_eq!(current.get(), Some(&ValueA("a")));
        assert_eq!(current.get::<ValueB>(), None);

        let inner_guard = Context::current_with_value(ValueB(42)).attach();
        // Both values are set in inner context
        let current = Context::current();
        assert_eq!(current.get(), Some(&ValueA("a")));
        assert_eq!(current.get(), Some(&ValueB(42)));

        assert!(Context::map_current(|cx| {
            assert_eq!(cx.get(), Some(&ValueA("a")));
            assert_eq!(cx.get(), Some(&ValueB(42)));
            true
        }));

        drop(outer_guard);

        let current = Context::current();
        assert_eq!(current.get::<ValueA>(), None);
        assert_eq!(current.get::<ValueB>(), None);
        // `inner_guard` is still alive so both `ValueA` and `ValueB` should still be accessible? Not sure about this one.

        drop(inner_guard);

        let current = Context::current();
        assert_eq!(current.get(), Some(&ValueA("a")));
        assert_eq!(current.get::<ValueB>(), None);
        // Both guards are dropped and neither value should be accessible.
    }

API Version

0.23.0, git master

SDK Version

N/A

What Exporter(s) are you seeing the problem on?

N/A

Relevant log output

No response

lalitb commented 2 weeks ago

We still don't know the right behavior in this scenario. Would be looking in this. Also, @mladedav if you would like to help here.

mladedav commented 2 weeks ago

We still don't know the right behavior in this scenario

When you say still, does that mean this has been discussed before?

I'll try to ask in the spec repo then what should be the bahavior.

lalitb commented 2 weeks ago

@mladedav Sorry for the confusion. I wanted to say that even though the specs is clear about not allowing the incorrect ordering of context detach, if you have suggestion on how to fix the code.

mladedav commented 2 weeks ago

As I read it kind of allows it but we should "identify wrong call order" but it doesn't really say what context should be active after the operation. Since the implementation uses a drop guard, there is no way to return an error.

But other implementations should have the same issue with try/finally or with blocks, do we know how do they handle it?

Since the API is optional, I wonder if not providing the API at all would be an option and maybe providing methods such as Context::set_active<T>(self, fun: impl FnOnce() -> T) -> T { /* runs fun with self as active context */ } instead which prevent this misuse. But I guess that that's steering too for away from the spec?

Otherwise, I guess that the contexts could work as a linked list and if we drop guard for one in the middle we just reconnect its child with its parent but leave the active span as is. Does that sound good?