temporalio / sdk-go

Temporal Go SDK
https://docs.temporal.io/application-development?lang=go
MIT License
481 stars 197 forks source link

Go SDK 1.26.0+ cannot deserialize `go.temporal.io/api` proto messages originally serialized as `gogoproto` messages #1525

Closed Quinn-With-Two-Ns closed 5 days ago

Quinn-With-Two-Ns commented 2 weeks ago

Users using our api protos may hit issues upgrading to 1.26.1 since our proto message will change type from gogoproto.message to proto.Message. For example, if ListWorkflowExecutionsResponse  is the response to an activity and it was serialized with gogoproto  as it was a gogoproto.message, by updating go.temporal.io  ListWorkflowExecutionsResponse would now be a normal proto.Message  so it would not use the gogoproto deserializer and fail.

cretz commented 2 weeks ago

:+1: I wonder if there is somewhere we can discourage assuming Temporal models have stable JSON

afallah-sc commented 2 weeks ago

It's not about stable JSON, if you query a workflow, this results in non-determinism. This is not regarding replay tests from serialized JSON histories.

Quinn-With-Two-Ns commented 2 weeks ago

The non determinism is because the payload in history was JSON encoded (specifically proto/json) and that encoding changed due to moving off gogoproto

afallah-sc commented 2 weeks ago

I understand, my point was more in regards to @cretz and to "discourage assuming Temporal models have stable JSON".

If an end-user is storing and serializing the state (like in a replay test), I think it absolutely makes sense to make that opaque to the end-user and tell them they cannot depend on stable JSON state...that upgrades might break things...or require changes to your tests.

However, if the actual history itself is serialized as JSON and Temporal uses that itself during queries, replay, etc...it has to either be stable or the team should make it so that changes are backwards and forwards compatible. Breaking this compatibility between different versions for serialized state, that is used internally, not just externally is dangerous.

afallah-sc commented 2 weeks ago

I may be particularly sensitive to breaking changes in clients, but we support about 50 different teams...and there will be a drift in client versions over time. While there may be breaking changes between those clients, something as fundamental as the history itself...even if it is not stable...should be backwards compatible. If changing proto serializers without any underlying schema changes breaks clients, then that is an impl detail that leaks to clients that really should not.

afallah-sc commented 2 weeks ago

It's one thing to say we added this field in Temporal, it is fundamental to X, Y, Z feature and we consider it a core part of our offering, as a result...we added the following enum, attribute, or field...and as a result, the history will change and not be backwards compatible.

It is another to say, we moved from one proto serializer to another, the enum serialization and casing is different. As a result, it will break. History is a core, central, critical component of Temporal. If it is unstable, it should be backwards compatible unless it is intentional.

I understand this is a likely bug, no one was deliberately breaking anyone. From a end-user standpoint, it shouldn't matter if the internal representations of histories (JSON or otherwise) are stable as long as the impl detail does not leak.

cretz commented 2 weeks ago

if the actual history itself is serialized as JSON and Temporal uses that itself during queries, replay, etc

This is why where history is used in JSON form we had to make special code so that it could work with old-form JSON and new-form JSON. So history JSON is compatible in both ways with the history JSON readers because that's the only place we expect stable JSON for Temporal models. We ensure history JSON format stability with the readers. Other use of Temporal models as JSON we did not expect to have to be stable. In fact, their JSON format violated proto specification. ListWorkflowExecutionsResponse is unrelated to history.

Having said that, the converter can be altered to use our accept-both-forms proto marshaler. That's probably the easiest solution for those using Temporal API models in their params/returns.