mediative / eigenflow

ETL orchestration platform with recoverability and process monitoring features
https://mediative.github.io/eigenflow/
Apache License 2.0

EigenFlow should support a feature to terminate a job early #6

Closed WadeWaldron closed 8 years ago

WadeWaldron commented 8 years ago

EigenFlow should have a feature that allows a job to terminate early. This feature would allow a job to determine that it has no need to continue through the later phases (for example in a double load scenario). In this case the job should be able to specify that rather than failing we just want to terminate the run early, and then the next run can continue as normal.

suhailshergill commented 8 years ago

cross-posting exchange from slack


shergill [10:47 AM] 
i have a few thoughts. my first is that the fact it tried to resume from a failed state tells me we didn’t do appropriate cleanup after the double load failure in the second run yesterday.

[10:47]
that would be the first step, imo

wade.waldron [10:48 AM] 
@shergill: The Double Load scenario occurred because we were testing the deploy of EigenFlow.  I deliberately did the double load expecting that it would encounter this scenario.

shergill [10:49 AM] 
wade.waldron: agreed. but that was yesterday. after DL (double load) was encountered, the state should have been restored to where it was before DL was encountered

[10:49]
so that if you tried to run a third time yesterday you’d run into DL again, but if you tried to run the job today, it would go ahead

[10:50]
now we could do it by “resetting” state, or by tweaking the interpreter which interprets the state information regarding job runs etc

wade.waldron [10:52 AM] 
@shergill: I think I follow.  You are saying that after a Double Load is encountered, something (person or machine) needs to basically reset the state to say that the double load run should be disregarded.  Or something to that effect.

[10:53]
Right now I am working on clearing that state manually.  However, that's not a good long-term solution (hence the discussion).

shergill [10:54 AM] 
wade.waldron: yes. one way to do so would be to update the log/state journal we have with a “disregard up until” message. the interpreter which parses and interprets the journal would then do the right thing (obey the directive)
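
The "disregard up until" idea can be sketched as a small journal interpreter. This is only an illustration under stated assumptions: the entry types, the `runId` field, and `resumePoint` are hypothetical names, not EigenFlow's actual journal format, and the sketch assumes the directive masks failed runs only, leaving the record of successful runs intact.

```scala
// Hypothetical journal model; none of these names come from EigenFlow itself.
object JournalSketch {
  sealed trait Entry
  final case class RunSucceeded(runId: Int) extends Entry
  final case class RunFailed(runId: Int, stage: String) extends Entry
  // Directive (assumed shape): ignore failures of every run with an id <= runId.
  final case class DisregardUntil(runId: Int) extends Entry

  // Returns the failed run to resume from, or None if the next run should start fresh.
  def resumePoint(journal: List[Entry]): Option[RunFailed] = {
    // The highest run id covered by any directive in the journal.
    val cutoff = journal
      .collect { case DisregardUntil(id) => id }
      .foldLeft(Int.MinValue)(_ max _)

    // Look at the outcome of the most recent run; resume only if it failed
    // and no directive tells the interpreter to disregard it.
    journal
      .collect {
        case f: RunFailed    => Some(f)
        case _: RunSucceeded => None
      }
      .lastOption
      .flatten
      .filter(_.runId > cutoff)
  }
}
```

With this shape, appending `DisregardUntil(2)` after a failed run 2 makes the next run start normally, while a later failure in run 3 would still be resumed.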

WadeWaldron commented 8 years ago

Example:

val download = Downloading {
  ...
} onFailure {
  case NetworkIssue => Retry(3.seconds, 10)
  case DoubleLoad   => SkipRun
}
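
One way such an `onFailure` handler could be modelled is as a partial function from failures to recovery actions. The sketch below is not EigenFlow's API: `NetworkIssue`, `DoubleLoad`, `Retry`, and `SkipRun` are taken from the example above, while `DiskFull`, `FailRun`, `Policy`, and `decide` are hypothetical, and reading `Retry(3.seconds, 10)` as "wait 3 seconds, at most 10 attempts" is an assumption.

```scala
import scala.concurrent.duration._

// Hypothetical model of the proposed failure policy; not EigenFlow's real types.
object FailurePolicySketch {
  sealed trait Failure
  case object NetworkIssue extends Failure
  case object DoubleLoad extends Failure
  case object DiskFull extends Failure // hypothetical failure with no handler

  sealed trait Recovery
  // Assumed meaning of the arguments: delay between attempts, maximum attempts.
  final case class Retry(delay: FiniteDuration, maxAttempts: Int) extends Recovery
  case object SkipRun extends Recovery // end this run early; the next run proceeds normally
  case object FailRun extends Recovery // hypothetical default for unhandled failures

  type Policy = PartialFunction[Failure, Recovery]

  val downloadPolicy: Policy = {
    case NetworkIssue => Retry(3.seconds, 10)
    case DoubleLoad   => SkipRun
  }

  // Failures the policy does not mention fall back to failing the run.
  def decide(policy: Policy, failure: Failure): Recovery =
    policy.lift(failure).getOrElse(FailRun)
}
```

Keeping `SkipRun` distinct from `FailRun` is the point of the proposal: a skipped run leaves no failed state behind for the next run to resume from.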

yawaramin commented 8 years ago

How do we define a double load? Is it when we see that a job is running today and we know that the next processing date is tomorrow? In that case, shouldn't we automatically exit because we know that we're running before the next processing date?

dmitri-carpov commented 8 years ago

@yawaramin that is correct, but the system may be forced to re-run starting from a specific day/time. In this case it will ignore the fact that it already ran that period. When a process has double load protection, the process may fail, and the problem is that it will be stuck there until it is manually forced to process another day.
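
The scheduling check discussed here, together with the forced re-run override, could look roughly like the sketch below. All names (`Decision`, `decide`, `forcedFrom`) are hypothetical, and it covers only the date comparison; a forced run can still hit a process's own double load protection downstream, which is where an early-termination outcome would be needed.

```scala
import java.time.LocalDate

// Hypothetical scheduling check; not taken from EigenFlow's code.
object DoubleLoadSketch {
  sealed trait Decision
  case object Proceed extends Decision
  case object SkipRun extends Decision

  // forcedFrom: set when an operator forces a re-run starting from a given date.
  def decide(today: LocalDate,
             nextProcessingDate: LocalDate,
             forcedFrom: Option[LocalDate]): Decision =
    forcedFrom match {
      // A forced re-run ignores the fact that the period was already processed.
      case Some(_) => Proceed
      // Running before the next processing date would be a double load.
      case None if today.isBefore(nextProcessingDate) => SkipRun
      case None => Proceed
    }
}
```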