[Interpreter]Proposal]New way to write workflow using Cadence

longquanzheng commented 2 years ago

https://docs.google.com/document/d/1BpJuHf67ibaOWmN_uWw_pbrBVyb6U1PILXyzohxA5Ms

slater-ben commented 2 years ago

Hi Long, can you give a pseudcode example of what a basic workflow implemented with this approach would look like?

longquanzheng commented 2 years ago

@slater-ben See this project https://github.com/longquanzheng/iwf and prototype example as a translation of the Example Subscription workflow

slater-ben commented 2 years ago

Hi @longquanzheng a few thoughts I had on this:

For me at least, it was a lot harder to follow the workflow business logic in your prototype than in the standard example. For me, making it easy to follow the business logic is one of the key benefits of Cadence. So it would seem like a hard sell to make this the primary way of developing workflows in Cadence.
I do feel like (without really having worked it through) that your new approach might be easier to deal with you if you were looking to build a DSL or visual workflow editor on top of Cadence
It did get me thinking on the original goal of making it easier to develop repayable workflows: I feel like it might be possible to develop an automated framework for testing that workflow code is deterministic by simulating activity results and other Cadence interactions (might require a few more annotations on activities to give some hints on valid values). While this might not provide a 100% guarantee of determinism it would probably help most devs in understanding the concept and provide a useful testing tool even for experienced devs.

longquanzheng commented 2 years ago

Hey @slater-ben Super appreciate your comments and your time looking into my in-matured idea.

Yes, I do see that the business flow is getting less fluent in this format compared to original code. This is the trade off that I have put as "Limitation".
I don't mean to abandon Cadence. I am just hoping this will be a good supplement to the technology.
AWS Step Function, Airflow, Netflix Conductor are all letting developers to model business into this state machine fashion. In fact, those technologies are all more widely used than Cadence in the industry. It's not perfect to model business in state machine, but based on the facts, I believe it's not a terrible idea either. With this proposal, Cadence can do things in a similar way but is way more powerful than Step Function, Air flow and Conductor etc.
I think choosing this or native Cadence can be based on use cases. This idea should be good enough for many real world use cases.
It's not only determinism. There are a lot more work than that when using Cadence workflow especially for operation & monitoring & debugging. Things are significantly improved with this proposal.
As a separate topic, yes, we should improve Cadence itself for determinism. I can talk to you offline for other ideas that I have.

slater-ben commented 2 years ago

Makes sense, and you definitely have a lot more experience building real world workflows than me to understand the problem you are aiming to address. I think if this is an additional option rather than a replacement then it can really only be a good thing.

I hadn't picked up before that this completely removes the need for determinism in the workflow code. That certainly would make it easier to avoid tripping over hard to understand restrictions. I wonder however if it makes it easier to get working code at the expense of making it more likely to introduce subtle business logic bugs that are harder to track down? (Genuinely wondering - it feels to me like it probably does but I can't exactly pin down how.)

longquanzheng commented 2 years ago

@slater-ben to be more readable, I refactor the code a little bit, and also for this state machine fashion, we should be able to build a tool to generate the flow diagram like this: https://github.com/longquanzheng/iwf/tree/main/src/com/indeed/iwf/demo/subscription

slater-ben commented 2 years ago

I like the diagram and I definitely think the class per state is more readable than your previous example. One thing I can work out looking at the sample is how you specify what the starting state is? Is it just because it's the first state in the workflow getState() list or would it be specified when invoking the workflow?

longquanzheng commented 2 years ago

@slater-ben good question. It's TBD in this prototype. There are different ways to do it including what you mentioned.

I would like to hear from you and other people to see which way they would love the most.

Initially I thought we can allow letting users to start from any state, to be super flexible. But this could cause confusion.

So we could implicitly make the first state in the list as starting state, or we can add use an annotation to mark certain states as starting states.

I think allowing multiple states to be starting would be nice. This will make things a lot easier to change the workflow. E.g. ppl run into lots of problem when they wanted to change the Cadence workflow method to add a new input parameter. It's not allowed in Cadence because of backward compatibility.

slater-ben commented 2 years ago

I agree with your thought that allowing multiple, but restricted, starting states feels like the right approach. My inclination would be to have the allowed starting states defined in the workflow class (as I think you could potentially reuse state classes across multiple workflows and allowed starting states is logically a property of the workflow). I can think of a couple of alternative approaches for this:

a getAllowedStartingStates() method that returns a list of states
a workflow.start() method (or maybe multiple annotated methods) which return a decision about the first state (I think I prefer this as it gives you an obvious place to start if you're trying to work through the logic of a workflow).

Another thought I had looking at this is I wonder if what you're calling states would better called steps? Looking through your example code, I was confused about how decide() could return multiple potentially states (given I'd normally only expect one state at a time) but I think it's actually returning next steps with the step being complete once the prepare() conditions are met. If you changed to "step" terminology then you might also change prepare() to execute() (as I found it a bit weird when I thought about it that prepare() was where the actual activities get kicked off)

longquanzheng commented 2 years ago

a getAllowedStartingStates() method that returns a list of states yeah this should work and a workflow.start() method I am not sure I follow this idea.

Or I think a better way is to define if it's startable with the states, like below:

        return Arrays.asList(
                new StateDef( new WelcomeEmailState(), true),
                new StateDef( new CancelSubscriptionState(), false), 
                new StateDef( new WaitForPeriodState(), false), 
                new StateDef( new ChargeCurrentPeriodState(), false), 
                new StateDef( new SubscriptionOverState(), false), 
                new StateDef( new UpdateChargeAmountState(), false)
        );

Having them in the same place when defining states make it clear and easy to lookup.

longquanzheng commented 2 years ago

About steps vs states --

I do see that some people may get confused if they thought a workflow can only have one state at any time. Using steps can avoid this confusion. However, state is also standard term for this purpose, AWS step function and Cadence SDK all call it State and a workflow execution can have multiple in-parallel states.

So I want to reuse the same term and I think people will get used to it.

longquanzheng commented 2 years ago

about execute vs prepare --

I think execute may only fit for activity case. The other two condition types -- signal and timers are not quite fit as saying "executing a timer" or "executing a signal". So it's better to call it "requesting activity/timer/signal as conditions".

And in fact, the activity is not really executed here. It's executed in activity worker. WF is only scheduling it and waiting for the results. Though I agreed saying "executing activity"is also correct.

slater-ben commented 2 years ago

Those are all valid and logical reasons and I think the naming you've done will work perfectly well. Thinking about it a bit more though I realised the root of all three suggestions I made is that the point of a workflow is to do stuff (or at least cause stuff to be done) and there is nothing with a name that implies that it is doing things (with the possible exception of "prepare" although even that sounds more like getting ready to do something). I think in trying to understand the code I was looking for something that was clearly the doing stage to anchor my understanding. (For example, look at the AWS step function doco you fairly quickly see there is a State type of Task.)

So, up to you if you can find a way to accomodate that but thought it was worth giving you perspective as someone coming fresh to the code and trying to understand it.

longquanzheng commented 2 years ago

Yeah I agreed that “prepare” is not perfect. Though I feel like execute is also sorta misleading.

I will think more about it and open to more ideas from you and anyone.

I’m now thinking about “command” or “conduct”?

longquanzheng commented 2 years ago

Just got an idea that your suggestion of “execute “ will work perfectly if I changed “condition” to “command”.

The method will execute some commands of activity, signal or timers — that sounds fluent enough 😃 @slater-ben

Just changed it to execute. Super appreciate your input and help.

slater-ben commented 2 years ago

I think that works well. Feels to me like the naming now fits well with what is actually going on.

slater-ben commented 2 years ago

Next topic: it feels a bit weird to me that the State IDs are defined in the workflow rather than the State classes. I can see why you've done it that way but I think that will be a pain if you are reusing State classes across multiple workflow definitions (which feels like it could be useful).

Related, could it be possible/useful to have multiple State IDs map to one implementation class? I think that could make it really easy to use the model to write a DSL interpreter without having to do code generation.

longquanzheng commented 2 years ago

it feels a bit weird to me that the State IDs are defined in the workflow rather than the State classes

Yeah state IDs don't have to be in workflow classes. In fact the sdk doesn't require it, I put them into the wf classes as public class constants and referred in state classes. I think putting them into state classes makes sense. And it makes wf code clean and tidy.

There was only one reason that I put them in the WF class -- to avoid potentially stateID collision so putting them into one class makes it easy to find. However, this could be discovered during the init of the WF class so I think it's fine to not worry about this.

Just updated the prototype.

could it be possible/useful to have multiple State IDs map to one implementation class

Not sure I fully follow this. do you mean multiple stateIDs map to one state class? Why is that needed?

longquanzheng commented 1 year ago

Some updates Here is the slides of a comparison

A design doc.

We have implemented the first version with Temporal: https://github.com/cadence-oss/iwf

And a Java SDK: It could be easily extended to support Cadence and any other languages as the API schema is very lightweight

longquanzheng commented 1 year ago

implemented in https://github.com/indeedeng/iwf with Cadence

uber / cadence

[Interpreter]Proposal]New way to write workflow using Cadence #4785