uber / cadence

Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way.
https://cadenceworkflow.io
MIT License
8.2k stars 791 forks source link

[Interpreter]Proposal]New way to write workflow using Cadence #4785

Open longquanzheng opened 2 years ago

longquanzheng commented 2 years ago

https://docs.google.com/document/d/1BpJuHf67ibaOWmN_uWw_pbrBVyb6U1PILXyzohxA5Ms

See also https://github.com/longquanzheng/iwf

slater-ben commented 2 years ago

Hi Long, can you give a pseudcode example of what a basic workflow implemented with this approach would look like?

longquanzheng commented 2 years ago

@slater-ben See this project https://github.com/longquanzheng/iwf and prototype example as a translation of the Example Subscription workflow

slater-ben commented 2 years ago

Hi @longquanzheng a few thoughts I had on this:

longquanzheng commented 2 years ago

Hey @slater-ben Super appreciate your comments and your time looking into my in-matured idea.

slater-ben commented 2 years ago

Makes sense, and you definitely have a lot more experience building real world workflows than me to understand the problem you are aiming to address. I think if this is an additional option rather than a replacement then it can really only be a good thing.

I hadn't picked up before that this completely removes the need for determinism in the workflow code. That certainly would make it easier to avoid tripping over hard to understand restrictions. I wonder however if it makes it easier to get working code at the expense of making it more likely to introduce subtle business logic bugs that are harder to track down? (Genuinely wondering - it feels to me like it probably does but I can't exactly pin down how.)

longquanzheng commented 2 years ago

@slater-ben to be more readable, I refactor the code a little bit, and also for this state machine fashion, we should be able to build a tool to generate the flow diagram like this: https://github.com/longquanzheng/iwf/tree/main/src/com/indeed/iwf/demo/subscription

slater-ben commented 2 years ago

I like the diagram and I definitely think the class per state is more readable than your previous example. One thing I can work out looking at the sample is how you specify what the starting state is? Is it just because it's the first state in the workflow getState() list or would it be specified when invoking the workflow?

longquanzheng commented 2 years ago

@slater-ben good question. It's TBD in this prototype. There are different ways to do it including what you mentioned.

I would like to hear from you and other people to see which way they would love the most.

Initially I thought we can allow letting users to start from any state, to be super flexible. But this could cause confusion.

So we could implicitly make the first state in the list as starting state, or we can add use an annotation to mark certain states as starting states.

I think allowing multiple states to be starting would be nice. This will make things a lot easier to change the workflow. E.g. ppl run into lots of problem when they wanted to change the Cadence workflow method to add a new input parameter. It's not allowed in Cadence because of backward compatibility.

slater-ben commented 2 years ago

I agree with your thought that allowing multiple, but restricted, starting states feels like the right approach. My inclination would be to have the allowed starting states defined in the workflow class (as I think you could potentially reuse state classes across multiple workflows and allowed starting states is logically a property of the workflow). I can think of a couple of alternative approaches for this:

Another thought I had looking at this is I wonder if what you're calling states would better called steps? Looking through your example code, I was confused about how decide() could return multiple potentially states (given I'd normally only expect one state at a time) but I think it's actually returning next steps with the step being complete once the prepare() conditions are met. If you changed to "step" terminology then you might also change prepare() to execute() (as I found it a bit weird when I thought about it that prepare() was where the actual activities get kicked off)

longquanzheng commented 2 years ago

a getAllowedStartingStates() method that returns a list of states yeah this should work and a workflow.start() method I am not sure I follow this idea.

Or I think a better way is to define if it's startable with the states, like below:

        return Arrays.asList(
                new StateDef( new WelcomeEmailState(), true),
                new StateDef( new CancelSubscriptionState(), false), 
                new StateDef( new WaitForPeriodState(), false), 
                new StateDef( new ChargeCurrentPeriodState(), false), 
                new StateDef( new SubscriptionOverState(), false), 
                new StateDef( new UpdateChargeAmountState(), false)
        );

Having them in the same place when defining states make it clear and easy to lookup.

longquanzheng commented 2 years ago

About steps vs states --

I do see that some people may get confused if they thought a workflow can only have one state at any time. Using steps can avoid this confusion. However, state is also standard term for this purpose, AWS step function and Cadence SDK all call it State and a workflow execution can have multiple in-parallel states.

So I want to reuse the same term and I think people will get used to it.

longquanzheng commented 2 years ago

about execute vs prepare --

I think execute may only fit for activity case. The other two condition types -- signal and timers are not quite fit as saying "executing a timer" or "executing a signal". So it's better to call it "requesting activity/timer/signal as conditions".

And in fact, the activity is not really executed here. It's executed in activity worker. WF is only scheduling it and waiting for the results. Though I agreed saying "executing activity"is also correct.

slater-ben commented 2 years ago

Those are all valid and logical reasons and I think the naming you've done will work perfectly well. Thinking about it a bit more though I realised the root of all three suggestions I made is that the point of a workflow is to do stuff (or at least cause stuff to be done) and there is nothing with a name that implies that it is doing things (with the possible exception of "prepare" although even that sounds more like getting ready to do something). I think in trying to understand the code I was looking for something that was clearly the doing stage to anchor my understanding. (For example, look at the AWS step function doco you fairly quickly see there is a State type of Task.)

So, up to you if you can find a way to accomodate that but thought it was worth giving you perspective as someone coming fresh to the code and trying to understand it.

longquanzheng commented 2 years ago

Yeah I agreed that “prepare” is not perfect. Though I feel like execute is also sorta misleading.

I will think more about it and open to more ideas from you and anyone.

I’m now thinking about “command” or “conduct”?

longquanzheng commented 2 years ago

Just got an idea that your suggestion of “execute “ will work perfectly if I changed “condition” to “command”.

The method will execute some commands of activity, signal or timers — that sounds fluent enough 😃 @slater-ben

Just changed it to execute. Super appreciate your input and help.

slater-ben commented 2 years ago

I think that works well. Feels to me like the naming now fits well with what is actually going on.

slater-ben commented 2 years ago

Next topic: it feels a bit weird to me that the State IDs are defined in the workflow rather than the State classes. I can see why you've done it that way but I think that will be a pain if you are reusing State classes across multiple workflow definitions (which feels like it could be useful).

Related, could it be possible/useful to have multiple State IDs map to one implementation class? I think that could make it really easy to use the model to write a DSL interpreter without having to do code generation.

longquanzheng commented 2 years ago

it feels a bit weird to me that the State IDs are defined in the workflow rather than the State classes

Yeah state IDs don't have to be in workflow classes. In fact the sdk doesn't require it, I put them into the wf classes as public class constants and referred in state classes. I think putting them into state classes makes sense. And it makes wf code clean and tidy.

There was only one reason that I put them in the WF class -- to avoid potentially stateID collision so putting them into one class makes it easy to find. However, this could be discovered during the init of the WF class so I think it's fine to not worry about this.

Just updated the prototype.

could it be possible/useful to have multiple State IDs map to one implementation class

Not sure I fully follow this. do you mean multiple stateIDs map to one state class? Why is that needed?

longquanzheng commented 1 year ago

Some updates Here is the slides of a comparison

A design doc.

We have implemented the first version with Temporal: https://github.com/cadence-oss/iwf

And a Java SDK: It could be easily extended to support Cadence and any other languages as the API schema is very lightweight

longquanzheng commented 1 year ago

implemented in https://github.com/indeedeng/iwf with Cadence