open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

New `webengine` attributes to cover common IIS constructs? Or custom IIS attributes? #939

Open julealgon opened 2 months ago

julealgon commented 2 months ago

Area(s)

area:webengine

Is your change request related to a problem? Please describe.

We are currently in the process of creating our own custom implementation for an "IIS Resource Detector" in .NET and ran into the current webengine semantic conventions. We noticed some concepts that exist in IIS (and that might have analogues in other engines) are not currently captured in those conventions, in particular the "application pool" abstraction.

Describe the solution you'd like

The current webengine attributes are purely informational, very high-level metadata: the name of the engine, its version, and a description.

I'd imagine web engines other than IIS also have concepts similar to IIS' Application Pool, which provides a level of isolation inside the engine for running a process or a set of processes.

My question then would be if the team has considered adding such isolation elements as native attributes in the webengine resource, or if clients should just use custom attributes for their own engine.

Describe alternatives you've considered

I've considered the following possibilities to store IIS' AppPool information:

Considering how some of the vendor-specific attributes work in other resources (like cloud stuff), perhaps the third option would be the most appropriate, in case the attribute cannot be generalized as part of webengine?

There is also the question of apppool vs. the more verbose (and probably more correct here) application_pool.

I just want to make sure that our implementation follows the standard as closely as possible and that, in case custom attributes are needed, they too are as close to the standard as they can be.

Additional context

This issue is based on my proposal here:

gregkalapos commented 2 months ago

Application pools and similar concepts ultimately come down to process management. IIS starts and manages the w3wp worker processes and with application pools you can route requests to specific worker processes and also configure those processes (e.g. assign user account, change .NET version, change bitness for a given website).

So I wonder if this is maybe better served at the process level: for Process, there are already attributes where this info could be stored. E.g. from what I know, the command line arguments contain the application pool name, which could be stored in the process.command_args attribute; the user that runs the app pool would go into process.owner; and process.runtime.* could store the .NET version. Bitness would, in my opinion, also be useful to know for an application pool (e.g. to spot 32 vs 64 bit bugs). From what I see, there is no field for such a thing on Process, but I believe it could easily be added.
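The command-line idea can be sketched roughly as follows. This is a minimal illustration in Python, not the .NET detector discussed in this thread. It assumes the worker process receives the pool name as the value of an `-ap` switch, which would need to be verified before relying on it; the argument list and owner value are made up.

```python
from typing import Optional, Sequence


def app_pool_from_args(command_args: Sequence[str]) -> Optional[str]:
    """Return the value that follows an '-ap' switch, or None if there is none.

    The '-ap' switch is an assumption about how w3wp.exe is launched.
    """
    for index, arg in enumerate(command_args[:-1]):
        if arg.lower() == "-ap":
            return command_args[index + 1].strip('"')
    return None


# Hypothetical argument list and owner, for illustration only.
args = ["w3wp.exe", "-ap", "DefaultAppPool", "-v", "v4.0"]
resource = {
    "process.command_args": args,
    "process.owner": "IIS APPPOOL\\DefaultAppPool",
}
pool_name = app_pool_from_args(resource["process.command_args"])
```

Note that the pool name is only recoverable here by knowing the meaning of one specific switch, which is the kind of implicit knowledge a dedicated attribute would make explicit.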

The reason I am exploring whether Process covers this better is that adding something to webengine might just create an abstraction over processes that stores redundant information and is hard to map to multiple web engines with the same semantics.

E.g. php-fpm does something which would fall into this category, but from what I know it doesn't really have a name for the worker processes - it just manages OS processes (which are covered by the Process resource attributes). But even if I'm wrong, I'm sure there are web engines out there that just manage OS processes, and don't really have dedicated names, or ids for those sub-processes like IIS does for application pools.

If we go the other direction and still add this, then a nit on naming: I'd call it just pool (and drop application), since it's more general. E.g. php-fpm also calls this just pool.

I think where adding an iis namespace would be useful is for metrics, especially IIS app pool related metrics. Some things that could go there inspired from here:

But this issue is about a resource detector (and, I feel, about resource attributes rather than metrics), so that list won't really help for this use case.

julealgon commented 2 months ago

> Application pools and similar concepts ultimately come down to process management. IIS starts and manages the w3wp worker processes and with application pools you can route requests to specific worker processes and also configure those processes (e.g. assign user account, change .NET version, change bitness for a given website).

> So I wonder if this is maybe better served at the process level

Interesting thought. I think I'm with you here.

> ... for Process, there are already attributes where this info could be stored. E.g. from what I know, the command line arguments contain the application pool name, which could be stored in the process.command_args attribute

This I did not know: I was picking the application pool name from the environment variable instead. If it's always present as a cmdline arg then yeah, that could be captured there indeed, but it should still appear somewhere else I think since the cmdline arg field would not have any semantics attached to it and would be hard to filter by just a single argument in observability tools etc.
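Reading the pool name from the environment might look like the sketch below (Python for brevity). The variable name `APP_POOL_ID` is an assumption, since the comment does not name the variable it reads; the function name is also hypothetical.

```python
import os
from typing import Mapping, Optional


def app_pool_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the pool name from the environment, or None when it is not set."""
    env = os.environ if env is None else env
    # APP_POOL_ID is an assumed variable name; confirm it on the target host.
    return env.get("APP_POOL_ID") or None
```

Passing the environment in as a mapping keeps the lookup easy to exercise outside an IIS worker process.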

> ... the user that runs the app pool would go into process.owner,

I believe this is covered already by the existing Process resource detector on the .NET repo.

> ... process.runtime.* could store the .NET version.

This one is also already covered by the existing ProcessRuntime resource detector.

> Bitness would, in my opinion, also be useful to know for an application pool (e.g. to spot 32 vs 64 bit bugs). From what I see, there is no field for such a thing on Process, but I believe it could easily be added.

Good point, agree it would make a ton of sense to store this as part of the process itself. It is even a bit surprising it is not there already as a bitness attribute.
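As a rough illustration of how cheaply bitness can be detected, here is a Python sketch that derives it from the pointer size of the running process. The attribute key `process.bitness` is hypothetical and not part of any existing convention; in .NET, `Environment.Is64BitProcess` exposes comparable information.

```python
import struct


def process_bitness() -> int:
    """Return the pointer width of the current process in bits (32 or 64)."""
    return struct.calcsize("P") * 8


# 'process.bitness' is a hypothetical attribute name used only for illustration.
resource = {"process.bitness": process_bitness()}
```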

> The reason I am exploring whether Process covers this better is that adding something to webengine might just create an abstraction over processes that stores redundant information and is hard to map to multiple web engines with the same semantics.

I assume you are not challenging the entire notion of having a webengine namespace, and just talking about the attributes that might make more sense in, say, process here? Because I think webengine has its value to store the information it currently captures today: details about the webengine itself, like "what webengine it is" (IIS in this case) "IIS Version", etc. Those cannot be captured anywhere else I think.

> E.g. php-fpm does something which would fall into this category, but from what I know it doesn't really have a name for the worker processes - it just manages OS processes (which are covered by the Process resource attributes). But even if I'm wrong, I'm sure there are web engines out there that just manage OS processes, and don't really have dedicated names, or ids for those sub-processes like IIS does for application pools.

> If we go the other direction and still add this, then a nit on naming: I'd call it just pool (and drop application), since it's more general. E.g. php-fpm also calls this just pool.

Where would you add this one though? Would you add a custom attribute on the process namespace for it? Maybe process.pool or process.pool_name? I'm not a big fan of that because "pool" is not an existing concept for processes, it's a thing that the webengine "adds" to it conceptually. If it's a webengine abstraction over processes, it should live in the webengine namespace IMHO.

My next question would then be what happens if we want to capture other information from the IIS application pool, such as the Managed Pipeline Mode (Integrated vs Classic), or even stuff like Queue Length? Those seem very specific to IIS and cannot really be associated directly with a process, at least not in a generic manner. Would attributes such as these warrant a potential custom iis namespace, then?

I do have to say I like your idea of using process for most of this though, thanks for bringing that up.

gregkalapos commented 2 months ago

> If it's always present as a cmdline arg then yeah, that could be captured there indeed,

Yes, before we conclude that, it should be researched whether this is 100% reliable. To me it seems it is, but I haven't looked into it deeply enough to be totally sure.

> but it should still appear somewhere else I think since the cmdline arg field would not have any semantics attached to it and would be hard to filter by just a single argument in observability tools etc.

That's true - I think this is a trade-off. If we still focus on webengine.apppoolname (or similar on webengine) then it's about this: is it worth adding something that may be redundant, but in case of IIS app pool name, it identifies the app pool clearly, or is it ok to rely on the cmd args?

> I assume you are not challenging the entire notion of having a webengine namespace, and just talking about the attributes that might make more sense in, say, process here?

Correct. Specifically, my point is that with the current general attributes, the pool management feature of web servers may be better served by the process.* namespace than by putting it on webengine.*. But you also had a third proposal, iis.apppool.id, and that direction is OK with me - more on that below.

> Where would you add this one though? Would you add a custom attribute on the process namespace for it? Maybe process.pool or process.pool_name? I'm not a big fan of that because "pool" is not an existing concept for processes, it's a thing that the webengine "adds" to it conceptually. If it's a webengine abstraction over processes, it should live in the webengine namespace IMHO.

Oh yeah, maybe I wasn't clear. So what I suggested is, if this goes to webengine.*, then I'd call it webengine.pool. Definitely not on process.*. The name pool is really just a nitpick - I feel that's general, and application pool is too IIS specific.

> My next question would then be what happens if we want to capture other information from the IIS application pool, such as the Managed Pipeline Mode (Integrated vs Classic), or even stuff like Queue Length? Those seem very specific to IIS and cannot really be associated directly with a process, at least not in a generic manner. Would attributes such as these warrant a potential custom iis namespace, then?

Yes, I think those should definitely go into an iis namespace - similarly to the metric attributes I listed. And if there is an iis namespace, then naturally we could also add one attribute for the app pool name. So once this goes beyond the app pool name, I think a dedicated namespace is fine and if there is a dedicated namespace, then having iis.apppool.id (or similar) is better than relying on process.command_args.

After looking into this a bit I believe it's not possible to generalize the name of the app pool on webengine without introducing ambiguity with the process.* attributes listed above - in case of IIS it would be ok, but in case of other web engines I think it's not. But once there is an IIS specific namespace, then storing the app pool name there is fine in my opinion.
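The split between generic and IIS-specific attributes could look like the sketch below. It is written in Python purely for illustration: `webengine.name` and `webengine.version` are among the existing generic attributes, `iis.apppool.id` is the name floated in this thread, and `iis.apppool.pipeline_mode` is a hypothetical key invented here to stand in for the Managed Pipeline Mode.

```python
from typing import Dict, Optional


def iis_resource(version: str, pool_id: str,
                 pipeline_mode: Optional[str] = None) -> Dict[str, str]:
    """Combine generic webengine.* attributes with IIS-specific ones."""
    attributes: Dict[str, str] = {
        # Generic: which engine this is.
        "webengine.name": "IIS",
        "webengine.version": version,
        # IIS-specific: name proposed in this thread, not a finalized convention.
        "iis.apppool.id": pool_id,
    }
    if pipeline_mode is not None:
        # Hypothetical key for the Managed Pipeline Mode (Integrated vs Classic).
        attributes["iis.apppool.pipeline_mode"] = pipeline_mode
    return attributes
```

Keeping the engine-specific keys under their own prefix means the generic webengine.* attributes keep a single meaning across engines.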

julealgon commented 2 months ago

> > If it's always present as a cmdline arg then yeah, that could be captured there indeed,

> Yes, before we conclude that, it should be researched whether this is 100% reliable. To me it seems it is, but I haven't looked into it deeply enough to be totally sure.

> > but it should still appear somewhere else I think since the cmdline arg field would not have any semantics attached to it and would be hard to filter by just a single argument in observability tools etc.

> That's true - I think this is a trade-off. If we still focus on webengine.apppoolname (or similar on webengine) then it's about this: is it worth adding something that may be redundant, but in case of IIS app pool name, it identifies the app pool clearly, or is it ok to rely on the cmd args?

...

> After looking into this a bit I believe it's not possible to generalize the name of the app pool on webengine without introducing ambiguity with the process.* attributes listed above - in case of IIS it would be ok, but in case of other web engines I think it's not. But once there is an IIS specific namespace, then storing the app pool name there is fine in my opinion.

Just to make my own position on this one clear: I don't think one should preclude the other. It will eventually be very sensible for the process' command line to always be captured (if one opts for using the ProcessResourceDetector, that is) and, at the same time, to consider the IIS app pool as its own separate thing. All that to say, I believe we should have both in place: the cmdline args as generic information, just for double checking, and a separate, more semantic, IIS-specific attribute for capturing the application pool id/name, so people can perform correlations etc. more easily. The cmdline argument could be used temporarily as a workaround while the iis namespace is not defined, but it should not "go away" once it is, IMHO.

This is particularly obvious in cases where we are not dealing with IIS hosting, or even web projects, in the first place. Cmdline arg capturing should not be tied to IIS properties at all, since we still very much want the cmdline args to be captured in those other scenarios too.
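The "have both" position can be illustrated with a small Python sketch, under the assumption that the pool name has already been determined by some other means. `build_resource` and the sample inputs are hypothetical; `iis.apppool.id` is the attribute name discussed above, not an agreed convention.

```python
from typing import Dict, List, Optional, Union

AttributeValue = Union[str, List[str]]


def build_resource(command_args: List[str],
                   app_pool: Optional[str] = None) -> Dict[str, AttributeValue]:
    """Always record the command line; add the pool id only when one is known."""
    attributes: Dict[str, AttributeValue] = {
        "process.command_args": list(command_args),
    }
    if app_pool:
        attributes["iis.apppool.id"] = app_pool
    return attributes


# Hypothetical inputs: an IIS-hosted worker and a plain console application.
iis_hosted = build_resource(["w3wp.exe", "-ap", "DefaultAppPool"], "DefaultAppPool")
console_app = build_resource(["worker.exe", "--queue", "orders"])
```

The console application still gets its command line captured, while only the IIS-hosted process carries the pool attribute.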