open-telemetry / opentelemetry-python

OpenTelemetry Python API and SDK
https://opentelemetry.io
Apache License 2.0

Provide versioned semantic conventions with sub-package/folder #2946

Open srikanthccv opened 2 years ago

srikanthccv commented 2 years ago

The idea is to create a new folder for each semantic conventions release, so that instrumentation packages or end users can choose which one to use.

The folder structure will look like the following:


opentelemetry-semantic-conventions/src/opentelemetry/semconv
  /v1_13_0
    /trace
    /resource
    /metric
    /..
  /v1_12_0
  /v1_11_0

It is convenient to use this because we don't necessarily need to update the instrumentation every time there is a spec release. This will enable us to mark the package as stable, since there will always be a backward-compatible sub-package; the downside is that it can grow big over time. The current size of the package seems to be around 26 KB: https://pypi.org/project/opentelemetry-semantic-conventions/#files. The original inspiration is the Go SDK: https://github.com/open-telemetry/opentelemetry-go/tree/main/semconv
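
A hypothetical usage sketch of the proposal, assuming the layout above were published; the module paths and class names below are illustrative, not the package's current API:

# Instrumentations or applications pick the semconv version in the import path.
from opentelemetry.semconv.v1_13_0.trace import SpanAttributes as SemConv1_13
from opentelemetry.semconv.v1_12_0.trace import SpanAttributes as SemConv1_12

# Code written against 1.12.0 keeps importing SemConv1_12 unchanged, while newer
# code in the same environment can adopt SemConv1_13 side by side.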

ocelotl commented 2 years ago

I think it would be better to separate our semantic conventions code in a different repo so that we have an opentelemetry-semantic-conventions package that follows the releases of the spec because:

  1. Code size would not increase, since we would not have repeated code in that repo.
  2. When we want to update the semantic conventions now, we have to make a release for all our components. This way we could make a release of the semantic conventions package only, making it much easier to have an up-to-date semantic conventions package.
  3. If users want to use another version of X, they update the version number of X in their requirements file, they don't update application or library code. This is important because users use pip freeze to find out what the running environment of an application was; they don't expect to also have to track down some line of application code to find out which semantic conventions version was being used.
ocelotl commented 2 years ago

Something else:

  1. A user uses from opentelemetry.semantic_conventions.v_1_2_3 import X today with opentelemetry-semantic-conventions==5.2.3 in their requirements file.
  2. Someone accidentally commits the deletion of the symbol X from opentelemetry.semantic_conventions.v_1_2_3. That commit gets merged and included in release 5.2.4 of our packages (including opentelemetry-semantic-conventions).
  3. The user updates their dependencies to opentelemetry-semantic-conventions==5.2.4. The user still wants to use semantic conventions v1.2.3 and import X, but that import is now broken.

In summary, using actual released packages with their corresponding versions protects the integrity of those versions, because it is guaranteed that the code in a specific version of a package will never change. Using directories named after versions is a fragile approach.

codeboten commented 2 years ago

I like this idea of releasing opentelemetry-semantic-conventions when a new version of the spec is released. I do see some benefits to @srikanthccv's proposal in the following scenario:

I'm not sure if this is a great situation, as it may mean that, with multiple semconv versions in use, applications will produce telemetry differently; this should be OK so long as schema URLs are set correctly.

I do struggle with the idea of having a continuously increasing package size. If this is the route the community wants to take, I'd recommend placing some kind of limit for how many versions back would be supported.

There is precedent for including multiple versions within a single package, the kubernetes client package does this (https://github.com/kubernetes-client/python/tree/master/kubernetes/client/api).

srikanthccv commented 2 years ago

I think it would be better to separate our semantic conventions code in a different repo so that we have an opentelemetry-semantic-conventions package that follows the releases of the spec because

I worry this sets an incorrect expectation that other SDK packages also follow and keep in sync with the spec release. Does any other SIG do this today?

If users want to use another version of X, they update the version number of X in their requirements file, they don't update application or library code.

This assumes that a new semantic conventions version didn't delete anything, which is not the case now. Take a look at the recent release: some attributes were removed from the HTTP attributes, which means the user might have to update application code if they want to use the new version. In a sense, we are also the users (contrib repo); if we want to use the 1.13.0 version we must update the instrumentation library code.

Someone accidentally commits the deletion of the symbol X from opentelemetry.semantic_conventions.v_1_2_3. That commit gets merged and included in release 5.2.4 of our packages (including opentelemetry-semantic-conventions).

No handmade changes will be made to this package. It's all generated from the semantic conventions YAML files.

I do struggle with the idea of having a continuously increasing package size. If this is the route the community wants to take, I'd recommend placing some kind of limit for how many versions back would be supported.

This is a valid concern and I think the idea of limiting the number of versions is a good suggestion.

ocelotl commented 2 years ago

This assumes that a new semantic conventions version didn't delete anything, which is not the case now. Take a look at the recent release: some attributes were removed from the HTTP attributes, which means the user might have to update application code if they want to use the new version. In a sense, we are also the users (contrib repo); if we want to use the 1.13.0 version we must update the instrumentation library code.

The application code may or may not need to be updated if another version of the semantic conventions is used, but that is not the point here. The point here is that, when using version-named directories, a dependency version is being specified outside the requirements file and inside the application code.

No handmade changes will be made to this package. It's all generated from the semantic conventions YAML files.

That does not matter; with the version-named-directories approach, that generated code can still be modified by any commit.

This is a valid concern and I think the idea of limiting the number of versions is a good suggestion.

Limiting the number of versions is a solution that introduces another problem: a user will not be able to use an old version after it is deleted. Having a separate package for the semantic conventions does not have the problem of growing code, and it also does not have the problem of having to delete old versions.

srikanthccv commented 2 years ago

Isn't the idea behind schema_url to give applications the ability to use multiple semantic conventions versions? How would that be possible if there is a separate package for each version?
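
For reference, a short sketch of the schema_url mechanism referred to above: each instrumentation obtains its own tracer and declares which semantic conventions version its telemetry follows (the instrumentation names and schema URLs below are just examples):

from opentelemetry import trace

# Two instrumentations in the same process, each emitting telemetry that follows a
# different semantic conventions version, distinguished by their schema URLs.
tracer_a = trace.get_tracer(
    "instrumentation.a", schema_url="https://opentelemetry.io/schemas/1.12.0"
)
tracer_b = trace.get_tracer(
    "instrumentation.b", schema_url="https://opentelemetry.io/schemas/1.13.0"
)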

lzchen commented 2 years ago

@ocelotl I'm not following your suggestion here. Are you proposing to release a new differently named package per semantic convention version? i.e. opentelemetry-semantic-conventions-1.12 for v1.12 of semantic conventions and also keep it in lockstep with the spec version? How would the instrumentation hard dependencies work in this case?

@srikanthccv @ocelotl 's concern of:

Limiting the number of versions is a solution that introduces another problem: a user will not be able to use an old version after it is deleted.

makes sense. I can totally see users setting up their application once and never updating the semantic conventions again once it fits their business use case. If these conventions become outdated/obsolete, this could be troublesome. However, as to @codeboten's point,

If this is the route the community wants to take, I'd recommend placing some kind of limit for how many versions back would be supported.

maybe, realistically, the above problem isn't as bad for users as we think? Maybe having our product stamped "stable" is more appealing to users than having to update their code once in a blue moon? As long as we are transparent with our users and give them ample time/warning to upgrade (before deleting older versions of semconv), is that so bad?

ocelotl commented 2 years ago

@ocelotl I'm not following your suggestion here. Are you proposing to release a new differently named package per semantic convention version? i.e. opentelemetry-semantic-conventions-1.12 for v1.12 of semantic conventions and also keep it in lockstep with the spec version? How would the instrumentation hard dependencies work in this case?

No, I am suggesting to have our opentelemetry-semantic-conventions package follow this versioning scheme: X.Y.Z.a, where X.Y.Z is the version number of the spec and a is an additional number for our bugfixes.
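
A minimal sketch of that scheme (assumed here for illustration, not an existing convention in this repo): the first three components track the spec version and the fourth is reserved for our own bugfix releases.

from packaging.version import Version

package_version = Version("1.13.0.2")  # hypothetical: spec 1.13.0, second bugfix
spec_version = ".".join(str(part) for part in package_version.release[:3])
bugfix = package_version.release[3] if len(package_version.release) > 3 else 0
print(spec_version, bugfix)  # -> 1.13.0 2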

lzchen commented 2 years ago

@ocelotl

  1. How does that solve the original goal of being able to mark the semantic conventions package stable?
  2. How does this make it so that we don't have to update our instrumentations every time the semantic conventions are updated?
  3. If users want to use another version of X, they update the version number of X in their requirements file, they don't update application or library code.

Isn't this what we already do today? How would users be able to use semantic conventions version X without having to update their instrumentation version as well?

Maybe I'm not fully understanding. Feel free to add this topic to Thursday's agenda.

ocelotl commented 2 years ago

Maybe I'm not fully understanding. Feel free to add this topic to Thursday's agenda.

Sure, I will :+1:

ocelotl commented 2 years ago

In the SIG we discussed that the problem with the approach I suggest is that we expect it to be frequent for one instrumentation to pin its dependency to one version of the semantic conventions and another instrumentation to a different version, so that the user will have a problem when trying to install both in the same virtual environment (I specifically mention frequent here because the same can happen with other packages like opentelemetry-api, opentelemetry-sdk, etc., but we don't expect that to happen frequently).

There is another solution that has all the advantages of the approach I suggest without this problem, and that is to make a new package named opentelemetry-semantic-conventions-... with every spec release, where ... is some versioning schema that indicates that semantic conventions is unstable and that the package corresponds with the released version of the spec (something like 0.A.B.C.D, where A.B.C is the spec version and D a bug fix number, maybe?).

Another advantage of this approach is that if the semantic conventions in the spec someday become stable and follow semantic versioning, we can simply start releasing opentelemetry-semantic-conventions with normal semantic versioning.

The disadvantage of this approach is that it is horrible.

But it works. :shrug:

WDYT @srikanthccv @lzchen @aabmass ?

lancetarn commented 2 years ago

👋 New here, so feel free to ignore this... there is lots of context I don't have!

FWIW, I have used the "subpackage-versioning" strategy

from foo.v1_2_3 import bar

within private packages to cushion against version conflicts while providing backward compatibility. It isn't the strangest thing I've seen, although I agree it also isn't the most straightforward. All told, the Go team's approach linked in the issue description does seem to make sense given the forces at play.
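
A minimal sketch of that pattern with a hypothetical package foo (not the real opentelemetry-semantic-conventions API): because the version is part of the import path, old call sites keep resolving to the exact code they were written against, while new code can opt into a newer sub-module.

import importlib
from types import ModuleType

def load_semconv(version: str = "v1_2_3") -> ModuleType:
    """Resolve a versioned sub-module such as foo.v1_2_3 (names are illustrative)."""
    return importlib.import_module(f"foo.{version}")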

I think creating an entire new package for every version has a few downsides:

  • Need to add entirely new dependency for every update
  • Still need to update library/app code if you want to move to newer version
  • Clutters the package namespace
    • Makes finding the right package more difficult/more typo-squatting options
  • PyPI is not designed for sorting/comparing across packages in this way/more difficult to know what "latest" is

lzchen commented 2 years ago

@ocelotl

There is another solution that has all the advantages of the approach I suggest without this problem, and that is to make a new package named opentelemetry-semantic-conventions-... with every spec release

Are you proposing to release a new differently named package per semantic convention version?

So is your new proposal aligned with my question from before?

ocelotl commented 2 years ago

👋 New here, so feel free to ignore this... there is lots of context I don't have!

FWIW, I have used the "subpackage-versioning" strategy

from foo.v1_2_3 import bar

within private packages to cushion against version conflicts while providing backward compatibility. It isn't the strangest thing I've seen, although I agree it also isn't the most straightforward. All told, the Go team's approach linked in the issue description does seem to make sense given the forces at play.

I think creating an entire new package for every version has a few downsides:

  • Need to add entirely new dependency for every update

This is normal; every time a user wants to update a library, they will make a change in their requirements.txt file.

  • Still need to update library/app code if you want to move to newer version

This is true.

  • Clutters the package namespace

    • Makes finding the right package more difficult/more typo-squatting options

This is true.

  • PyPI is not designed for sorting/comparing across packages in this way/more difficult to know what "latest" is

This is true as well. I am proposing a horrible solution, indeed.

ocelotl commented 2 years ago

@ocelotl

There is another solution that has all the advantages of the approach I suggest without this problem, and that is to make a new package named opentelemetry-semantic-conventions-... with every spec release

Are you proposing to release a new differently named package per semantic convention version?

So is your new proposal aligned with my question from before?

That's right, it is.

aabmass commented 2 years ago

I do struggle with the idea of having a continuously increasing package size. If this is the route the community wants to take, I'd recommend placing some kind of limit for how many versions back would be supported.

@codeboten I agree that would be nice. I see two ways to do this without breaking semantic versioning

Also, re package size, I did a quick test by copying the semantic conventions 200 times in the src tree to get some rough estimates:

For comparison, this is similar to the grpcio wheel package size. But not great.
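
For context, a rough sketch of how one might extrapolate the on-disk size of vendoring many versioned copies (assumptions: a local checkout at the path below; this is not necessarily the exact method used for the estimate above):

import sys
from pathlib import Path

def tree_size_kib(root: str) -> float:
    # Sum the sizes of all files under the given directory, in KiB.
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file()) / 1024

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "src/opentelemetry/semconv"
    one_copy = tree_size_kib(root)
    print(f"1 copy: {one_copy:.0f} KiB, 200 copies: ~{one_copy * 200 / 1024:.1f} MiB")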

aabmass commented 1 year ago

This could be an alternative solution to avoid backward-compatibility issues when fields are removed: https://github.com/open-telemetry/semantic-conventions/issues/214

lzchen commented 1 year ago

Do we still want to explore this now that we have agreed that we will be carrying out the migration plan for OT sem conv? https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/http.md

aabmass commented 1 year ago

To refocus this conversation: the goal of this issue is to mark the opentelemetry-semantic-conventions PyPI package as stable, with no expectation of ever having a major version bump. The benefits are: