Decide if AgentDescription must be included in every StatusReport

tigrannajaryan commented 2 years ago

See the comment here: https://github.com/open-telemetry/opamp-go/pull/63#issuecomment-1098526891

The spec says:

The Agent MUST send a status report:
* First time immediately after connecting to the Server. The status report MUST
  be the first message sent by the Agent.
* Subsequently every time the status of the Agent changes.

Nowhere does it say that the AgentDescription must also be included in the status report. On the contrary, the spec says:

<h4 id="agent_description">agent_description</h4>

The description of the agent, its type, where it runs, etc. See
[AgentDescription](#agentdescription-message) message for details.

This field SHOULD be unset if no Agent description fields have changed since the
last StatusReport was sent.

The question is whether AgentDescription must be included in every StatusReport. We need to consider what happens for both the WebSocket and plain HTTP transports.

tigrannajaryan commented 2 years ago

Some thoughts on this topic:

The Server typically keeps state for each Agent. When the state changes on the Agent side, it needs to be synced from the Agent to the Server.
OpAMP allows the Agent to omit unchanged data in the AgentToServer message, since otherwise every AgentToServer message could potentially be very large.
For a variety of reasons we cannot expect that the Server always has the latest up-to-date state of each Agent (e.g. the Server may keep the state in RAM, and a Server restart may lose the state).

WebSocket specifics:

WebSocket connections are persistent. We can assume that if the connection is open and maintained then all AgentToServer messages sent via that connection were processed by the Server and the cumulative state described by all those AgentToServer messages is correctly stored by the Server.
If the Server loses the state of the Agent it MUST NOT keep the WebSocket connection open, so that the assumption above remains valid.

Plain HTTP specifics:

The connections are not persistent (they may be kept alive, but that is unrelated to this discussion).
We cannot make assumptions about how long the Server keeps the Agent's state and whether the state is lost between AgentToServer messages sent by this Agent using HTTP requests.

Given the above, I believe the following is necessary:

For plain HTTP transport, every AgentToServer message must either include the full state of the Agent or include a hash of that state (the state data may be broken down into smaller parts, and a hash of each part may be included).
We could have a different rule for WebSocket connections because of their persistent nature. However, for simplicity of specification and implementation, I suggest that we follow a uniform approach for plain HTTP and WebSocket and always include either the full state of the Agent or the hash of the state.

Upon receiving an AgentToServer message, one of the following can happen:

The message includes the full Agent state (and not just the hash of the state). In this case the Server updates the Agent's state it has.
The message includes only the hash. In this case the Server can compare the hash (or hashes) in the message to the one it has stored for the Agent, and if there is any difference the Server will know that the Agent's state has changed. The Server then must respond with a request for the Agent to send the full state, using the ServerToAgent.flags field.
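
The two cases above can be sketched as follows. This is only an illustration in Python, not the opamp-go API: `StatePart`, `StoredPart`, `handle_part`, `make_part`, and the `REPORT_FULL_STATE` bit value are hypothetical names, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical flag bit; the real ServerToAgent.flags values may differ.
REPORT_FULL_STATE = 0x01


@dataclass
class StatePart:
    """One part of the Agent's state as carried in an AgentToServer message."""
    state_hash: bytes              # always present
    body: Optional[bytes] = None   # full state; None when only the hash is sent


@dataclass
class StoredPart:
    """What the Server currently remembers about that part."""
    state_hash: bytes = b""
    body: bytes = b""


def handle_part(stored: StoredPart, received: StatePart) -> int:
    """Update the stored state, or return flags asking for the full state."""
    if received.body is not None:
        # Full state included: the Server updates the state it has.
        stored.body = received.body
        stored.state_hash = received.state_hash
        return 0
    if received.state_hash != stored.state_hash:
        # Hash only, and it differs from what is stored (or the Server
        # lost its state): ask the Agent to send the full state.
        return REPORT_FULL_STATE
    return 0


def make_part(body: bytes, include_body: bool) -> StatePart:
    """Helper for the sketch: build a part with or without the full body."""
    digest = hashlib.sha256(body).digest()
    return StatePart(state_hash=digest, body=body if include_body else None)
```

A Server that has just restarted starts with an empty `StoredPart`, so the first hash-only message from any Agent results in a request for the full state.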

More specifically:

The Agent's state data is composed of the following 4 parts: AgentDescription, EffectiveConfig, RemoteConfigStatus, AgentAddonStatuses.
AgentDescription and RemoteConfigStatus are small and are always included in every AgentToServer message.
EffectiveConfig and AgentAddonStatuses state can be potentially large and have a hash field that may be included instead of the full state. When the full state is not included in the AgentToServer message and the Server doesn't have the state, it can request the state using the ReportEffectiveConfig and ReportAddonStatus flags.
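
On the Agent side, this could look roughly like the sketch below (Python, for illustration only). The message is modelled as a plain dict rather than the real protobuf type, the flag bit values are made up, and `AgentState` and `build_agent_to_server` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical bit values for the two request flags named above.
REPORT_EFFECTIVE_CONFIG = 0x01
REPORT_ADDON_STATUS = 0x02


@dataclass
class AgentState:
    agent_description: bytes
    remote_config_status: bytes
    effective_config: bytes
    addon_statuses: bytes


def build_agent_to_server(state: AgentState, server_flags: int = 0) -> dict:
    """Send small parts in full; send large parts as a hash unless requested."""
    msg = {
        # Small parts: always included in full.
        "agent_description": state.agent_description,
        "remote_config_status": state.remote_config_status,
        # Large parts: hash only by default.
        "effective_config": {
            "hash": hashlib.sha256(state.effective_config).digest(),
        },
        "addon_statuses": {
            "hash": hashlib.sha256(state.addon_statuses).digest(),
        },
    }
    # The Server asked for the full state in its previous response.
    if server_flags & REPORT_EFFECTIVE_CONFIG:
        msg["effective_config"]["body"] = state.effective_config
    if server_flags & REPORT_ADDON_STATUS:
        msg["addon_statuses"]["body"] = state.addon_statuses
    return msg
```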

Alternatively, we just make everything uniform: all 4 parts keep hashes, and there are 4 flags for the Server to request them.
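
A rough sketch of the uniform variant, where the Server compares four hashes and sets one request flag per mismatching part. The `ReportFlags` names and bit values, the `PART_FLAGS` mapping, and `flags_to_request` are all hypothetical and chosen only for illustration.

```python
from enum import IntFlag


class ReportFlags(IntFlag):
    # Hypothetical names and bit values, one per state part.
    NONE = 0
    AGENT_DESCRIPTION = 1
    EFFECTIVE_CONFIG = 2
    REMOTE_CONFIG_STATUS = 4
    ADDON_STATUSES = 8


PART_FLAGS = {
    "agent_description": ReportFlags.AGENT_DESCRIPTION,
    "effective_config": ReportFlags.EFFECTIVE_CONFIG,
    "remote_config_status": ReportFlags.REMOTE_CONFIG_STATUS,
    "addon_statuses": ReportFlags.ADDON_STATUSES,
}


def flags_to_request(stored_hashes: dict, received_hashes: dict) -> ReportFlags:
    """Return the flags for every part whose stored hash differs from the received one."""
    flags = ReportFlags.NONE
    for part, flag in PART_FLAGS.items():
        if stored_hashes.get(part) != received_hashes.get(part):
            flags |= flag
    return flags
```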

@open-telemetry/opamp-spec-approvers @andykellr @dsvanlani what do you think?

andykellr commented 2 years ago

I think your summary makes sense. In general, either the state needs to be supplied with every message or a hash needs to be supplied and the server needs a flag to be able to request the entire state.

At the protocol level, I wouldn't be opposed to some flexibility in the agent implementation. There could be a flag for each of the 4 parts and for each of the 4 components we could allow the agent to either send the entire state with each message or to send a hash.

In the reference implementation it would send the entire AgentDescription and RemoteConfigStatus and hashes of the EffectiveConfig and AgentAddonStatuses.

andykellr commented 2 years ago

After thinking about this some more, I think having all 4 flags is useful, even if most implementations pass smaller components on every request.

open-telemetry / opamp-spec

Decide if AgentDescription must be included in every StatusReport #76