open-telemetry / opentelemetry-proto

OpenTelemetry protocol (OTLP) specification and Protobuf definitions
https://opentelemetry.io/docs/specs/otlp/
Apache License 2.0
576 stars 253 forks source link

Indicate if a span's parent or link is remote using 2 bit flag as described in OTEP 0182 #484

Closed estolfo closed 8 months ago

estolfo commented 1 year ago

This adds parent_span_is_remote field to Span message introduced in OTEP 0182: https://github.com/open-telemetry/oteps/blob/main/text/0182-otlp-remote-parent.md

Update: The PR now uses 2 bits from flags to represent the 3 states of parent or link is remote: unknown, not remote, is remote.

estolfo commented 1 year ago

Hi @tigrannajaryan @Oberon00 @bogdandrutu can we reopen this discussion now that v1.0 is out?

carlosalberto commented 1 year ago

(added it to the Spec call agenda next week)

carolabadeer commented 1 year ago

Hi @tigrannajaryan @Oberon00 @carlosalberto, any updates on this discussion?

Oberon00 commented 1 year ago

Hi @carolabadeer, what do you mean by "this discussion"? If you mean the one at https://github.com/open-telemetry/opentelemetry-proto/pull/484#discussion_r1275117793 then it would be better to reply there, as the top-level PR comments have no threading functionality and it gets confusing soon.

carlosalberto commented 1 year ago

Ping @tigrannajaryan

tigrannajaryan commented 1 year ago

Sorry for slow response, I was out for a couple weeks.

Re-reading the thread I don't see any new arguments, so I think we should go ahead with a tristate enum.

tigrannajaryan commented 1 year ago

Another approach is to represent this as a 2-bitfield of a new int32 field, part of which will be also used by https://github.com/open-telemetry/opentelemetry-proto/pull/503

Oberon00 commented 1 year ago

@jsuereth

Can you add documentation for Unset denoting that older clients may not set this field, in which case how would one figure out whether ParentSpanIsRemote?

You can't find out. That's kind of the point of the field.

Oberon00 commented 1 year ago

@tigrannajaryan I think we should not block this PR with this, but consider this refactoring before the next release, which should probably wait until after both this PR and the flags PR(s) have been merged.

tigrannajaryan commented 1 year ago

@tigrannajaryan I think we should not block this PR with this, but consider this refactoring before the next release, which should probably wait until after both this PR and the flags PR(s) have been merged.

I am not sure it is a good idea. This is a repository that is declared stable. We should minimize any churn here. An accidental release in an intermediate incorrect state can become a huge problem. It may be worth looking into using branches to prepare work for next releases instead of doing it on main branch.

Oberon00 commented 1 year ago

An accidental release in an intermediate incorrect state can become a huge problem. It may be worth looking into using branches to prepare work for next releases instead of doing it on main branch.

The state would not be incorrect, unless you view that optimization/refactoring as vital. Stable applies only to releases, not any intermediate main branch state, right?

tigrannajaryan commented 1 year ago

Stable applies only to releases, not any intermediate main branch state, right?

This has not been discussed.

Formally, it is likely that you are right, we only need stability guarantees to apply to releases.

However we don't currently have a good process that ensures main branch is in a state that satisfies stability guarantees before we make the release. We can work on adding such a process, but absent that I want to be careful and for now require that "main" branch state must also meet the stability guarantees. We can lift off that requirement when there are checks in place that prevent mistakes like releasing intermediary states that are not supposed to be released.

This is a high-stakes repo, we need to be extra careful here.

Again, a possible approach for this repo would be to do a branch-based development where all work for the next release (1.1 in this case) is done on a separate branch, not on "main", to make sure the release process is very explicit and there is no easy way to accidentally release from an intermediary "main" state that is not actually ready to be released. This would require merging PRs to the branch and let the changes bake for a while and then once we are happy with accumulated changes we would cut the next release by merging the branch to "main" and then making a release from "main".

There are likely other processes we can use, I am open for suggestions.

carlosalberto commented 1 year ago

Now that #503 has been merged, we can discuss this PR again. IIRC @tigrannajaryan wanted this information to exist in the newly added flags field in the mentioned PR, rather than exist as an independent parent_is_remote field. Could you confirm/clarify this please?

tigrannajaryan commented 1 year ago

Now that #503 has been merged, we can discuss this PR again. IIRC @tigrannajaryan wanted this information to exist in the newly added flags field in the mentioned PR, rather than exist as an independent parent_is_remote field. Could you confirm/clarify this please?

Yes, that was my suggestion. We can take 2 more bits from flags to represent the 3 states we need for parent_is_remote.

estolfo commented 1 year ago

@tigrannajaryan I've updated this PR to define a 2-bit flag that can be used to indicate if the span's parent is remote. Let me know if something need to be adjusted about the definition. Thanks!

estolfo commented 1 year ago

I've just made a change to update the mask name and add an enum for interpreting bits 8 and 9 of a Link's flags instead of reusing the constants for the span's flags. Let me know if this is the preferred approach.

Oberon00 commented 1 year ago

@tigrannajaryan Your blocking review https://github.com/open-telemetry/opentelemetry-proto/pull/484#pullrequestreview-1591380897 is still active. If you want to dismiss it, you need to do it through the box at the bottom of the PR conversation that says "1 Change requested". Just submitting another review as "Comment" won't dismiss it (although submitting an Approve will)

tigrannajaryan commented 1 year ago

@tigrannajaryan Your blocking review #484 (review) is still active. If you want to dismiss it, you need to do it through the box at the bottom of the PR conversation that says "1 Change requested". Just submitting another review as "Comment" won't dismiss it (although submitting an Approve will)

I keep the block deliberately to make sure we don't accidentally merge it since technically there is enough approvals.

estolfo commented 1 year ago

Latest changes: I've left two enums: one for the remote link valid flag values and one for the remote parent values. I've also reordered the documentation to group the flag explanations and address the unused bits after.

jmacd commented 1 year ago

Structurally, I am happy with this change. I would suggest naming changes as hinted above, making the new flags into context flags whether they are part of the span or the link, including:

Span.flags -> Span.context_flags Link.flags -> Link.context_flags

and

enum ContextFlagsReferenceIsRemote {
  CONTEXT_FLAGS_REFERENCE_IS_REMOTE_UNKNOWN = 0;
  CONTEXT_FLAGS_REFERENCE_IS_REMOTE_TRUE = ...;
  CONTEXT_FLAGS_REFERENCE_IS_REMOTE_FALSE = ...;
}
tigrannajaryan commented 1 year ago

Structurally, I am happy with this change.

+1.

making the new flags into context flags whether they are part of the span or the link, including:

Span.flags -> Span.context_flags Link.flags -> Link.context_flags

This will limit the bit field to be useable only for context-related information. What if we need to record something about the Span that has nothing to do with the context? The name also deviates from what we do for logs where it is called flags.

I would prefer to stay consistent with logs, keep the current name and make it useful for other future bitfields.

As for the the enum name suggestion I don't mind using the term "context reference" if that's what we prefer. So perhaps we can do this:

enum SpanFlagsContextReferenceIsRemote {
   SPAN_FLAGS_CONTEXT_REFERENCE_IS_REMOTE_UNKNOWN = 0;
   SPAN_FLAGS_CONTEXT_REFERENCE_IS_REMOTE_TRUE = ...;
   SPAN_FLAGS_CONTEXT_REFERENCE_IS_REMOTE_FALSE = ...;
}

And also rename SPAN_FLAGS_PARENT_OR_LINK_IS_REMOTE_MASK to SPAN_FLAGS_CONTEXT_REFERENCE_IS_REMOTE_MASK.

If that's too long we can shorten CONTEXT_REFERENCE to CONTEXT_REF or simply to CONTEXT.

jmacd commented 12 months ago

@tigrannajaryan thank you. I agree with your proposal, to keep the name "flags" and for names like SPAN_FLAGS_CONTEXT_REFERENCE_IS_REMOTE_MASK.

estolfo commented 11 months ago

@jmacd I was about to accept the suggestions but I don't see anywhere in the thread that we decided on context instead of context_reference. Was that decided in the spec meeting this week?

tigrannajaryan commented 11 months ago

@jmacd I was about to accept the suggestions but I don't see anywhere in the thread that we decided on context instead of context_reference. Was that decided in the spec meeting this week?

I was out last spec SIG meeting. Unless an explicit decision was made in the spec meeting, let's go with context.

jmacd commented 11 months ago

@estolfo and @tigrannajaryan The meeting had light attendance, and no explicit decisions were made. I was suggesting my preference, which is that we keep the names short. Thank you @estolfo for following up.

estolfo commented 11 months ago

@jmacd and @tigrannajaryan I've completed all the changes based on the latest decisions. Please take a look when you have a chance, thanks!

tigrannajaryan commented 11 months ago

@jack-berg @jsuereth @Oberon00 please take another look, this changed since you approved the first time.

tigrannajaryan commented 10 months ago

@bogdandrutu PTAL.

estolfo commented 10 months ago

@bogdandrutu can you please take a look at this so we can move it forward? Thanks

AlexanderWert commented 8 months ago

@open-telemetry/specs-approvers I think this is ready to be merged!

tigrannajaryan commented 8 months ago

@open-telemetry/specs-approvers I think this is ready to be merged!

Needs at least 2 business days since last change before merging.

estolfo commented 8 months ago

@open-telemetry/specs-approvers can we merge this please?

tigrannajaryan commented 8 months ago

@estolfo looks like we forgot one more thing: an entry in the CHANGELOG in the Unreleased section. Please add and we can merge.

estolfo commented 8 months ago

@tigrannajaryan ok done

tigrannajaryan commented 8 months ago

Thank you @estolfo and everyone for patience!