mediative / eigenflow

ETL orchestration platform with recoverability and process monitoring features
https://mediative.github.io/eigenflow/
Apache License 2.0

`StagedProcess#nextProcessingDate` does not currently have a sensible default #36

Closed: yawaramin closed this issue 6 years ago

yawaramin commented 7 years ago

In the majority of cases, we just want it to be the start of the next day. Having that as the default should save a lot of boilerplate in jobs derived from `StagedProcess`. One way to do this using only the Java standard date/time APIs (extract from REPL):

@ import java.util._ 
import java.util._

@ new Date() 
res1: Date = Thu Dec 08 14:12:54 EST 2016

@ Calendar.getInstance 
res3: Calendar = java.util.GregorianCalendar[time=1481224422011,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="America/Toronto",offset=-18000000,dstSavings=3600000,useDaylight=true,transitions=231,lastRule=java.util.SimpleTimeZone[id=America/Toronto,offset=-18000000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2016,MONTH=11,WEEK_OF_YEAR=50,WEEK_OF_MONTH=2,DAY_OF_MONTH=8,DAY_OF_YEAR=343,DAY_OF_WEEK=5,DAY_OF_WEEK_IN_MONTH=2,AM_PM=1,HOUR=2,HOUR_OF_DAY=14,MINUTE=13,SECOND=42,MILLISECOND=11,ZONE_OFFSET=-18000000,DST_OFFSET=0]

@ res3 setTime res1 

@ res3.set(Calendar.AM_PM, 0) 

@ res3.set(Calendar.DST_OFFSET, 0) 

@ res3.set(Calendar.HOUR, 0) 

@ res3.set(Calendar.HOUR_OF_DAY, 0) 

@ res3.set(Calendar.MILLISECOND, 0) 

@ res3.set(Calendar.MINUTE, 0) 

@ res3.set(Calendar.MINUTE, 0) 

@ res3.set(Calendar.SECOND, 0) 

@ res3.set(Calendar.ZONE_OFFSET, 0) 

@ res3.add(Calendar.DATE, 1) 

@ res3.getTime 
res15: Date = Fri Dec 09 00:00:00 EST 2016
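
The REPL steps above can be condensed into a small helper. This is only a sketch: `CalendarNextDay` and `startOfNextDay` are hypothetical names, not part of Eigenflow, and the time zone is passed in explicitly as an assumption about how a job would want to define "day". Setting `HOUR_OF_DAY`, `MINUTE`, `SECOND` and `MILLISECOND` should be enough to reach local midnight, so the other `set` calls from the session are left out.

```scala
import java.util.{Calendar, Date, TimeZone}

object CalendarNextDay {
  // Hypothetical helper (not an Eigenflow API): midnight at the start of
  // the day after `from`, interpreted in the given time zone.
  def startOfNextDay(from: Date, zone: TimeZone): Date = {
    val cal = Calendar.getInstance(zone)
    cal.setTime(from)
    // Reset the time-of-day fields to local midnight.
    cal.set(Calendar.HOUR_OF_DAY, 0)
    cal.set(Calendar.MINUTE, 0)
    cal.set(Calendar.SECOND, 0)
    cal.set(Calendar.MILLISECOND, 0)
    // Move forward one calendar day.
    cal.add(Calendar.DATE, 1)
    cal.getTime
  }
}
```

A default `nextProcessingDate` could then delegate to something like `CalendarNextDay.startOfNextDay(lastCompleted, TimeZone.getDefault)`, assuming the JVM default zone is the one the job cares about.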

suhailshergill commented 7 years ago

Parametrizing the start on "size of time period" would be a good idea.

If it's daily, then next day. If it's weekly, next week. If it's monthly, next month (with proper handling of edge cases).

Looking at Google/Outlook event recurrence patterns would give you a fairly complete picture of the cases to handle, as well as possible strategies for handling those edge cases.
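
To make the daily/weekly/monthly idea concrete, here is a minimal sketch using `java.time`. The `Recurrence` object and `startOfNextPeriod` are hypothetical names, and two choices in it are assumptions rather than anything decided in this thread: weeks are taken to start on Monday, and "next month" means the first day of the following month.

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.temporal.{ChronoUnit, TemporalAdjusters}

object Recurrence {
  // Hypothetical helper: first date of the period following `last`.
  def startOfNextPeriod(last: LocalDate, unit: ChronoUnit): LocalDate =
    unit match {
      case ChronoUnit.DAYS   => last.plusDays(1)
      // Assumption: weeks start on Monday (ISO convention).
      case ChronoUnit.WEEKS  => last.`with`(TemporalAdjusters.next(DayOfWeek.MONDAY))
      // Assumption: monthly jobs run on the first day of the next month.
      case ChronoUnit.MONTHS => last.`with`(TemporalAdjusters.firstDayOfNextMonth())
      case other =>
        throw new IllegalArgumentException(s"Unsupported recurrence unit: $other")
    }
}
```

Anchoring monthly runs to the first of the month avoids one of the edge cases mentioned above. If a job instead used `plusMonths(1)`, `java.time` clamps to the last valid day, so `LocalDate.of(2017, 1, 31).plusMonths(1)` gives 2017-02-28, and repeated additions would then stay on the 28th.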

yawaramin commented 7 years ago

@suhailshergill good point. The current signature of `nextProcessingDate` is basically telling me, 'give me the next processing date based purely on the last completed date'. In the `StagedProcess` trait, I don't have access to any recurrence information. I may be able to stick closely to the current API and default behaviour by adding a `def recurrence: scala.concurrent.duration.Duration` method to the trait, e.g.

import java.util.Date
import scala.concurrent.duration.{Duration, DurationInt}

trait StagedProcess extends EigenflowDSL {
  def executionPlan: ExecutionPlan[_, _]

  def initialProcessingDate: Date = new Date()
  def recurrence: Duration = 1.day
  def nextProcessingDate(lastCompleted: Date): Date =
    /*
    lastCompleted rounded down to nearest recurrence unit + recurrence
    e.g. if recurrence is 1h and lastCompleted is 16:55,
    nextProcessingDate = 16:55 round down to 16:00 + 1h = 17:00
    */
    ??? // placeholder so the sketch compiles; the rounding itself is not written yet
}
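
One possible reading of the rounding rule in the comment above, as a standalone sketch. `RecurrenceRounding` is a hypothetical name, and the function takes the recurrence as a `FiniteDuration` parameter (an assumption, since an infinite `Duration` cannot be rounded to). Note that the buckets here are aligned to the Unix epoch, so a `1.day` recurrence yields midnight UTC, not local midnight.

```scala
import java.util.Date
import scala.concurrent.duration._

object RecurrenceRounding {
  // Hypothetical helper: round `lastCompleted` down to a multiple of
  // `recurrence` (counted from the Unix epoch, i.e. in UTC), then add
  // one recurrence step.
  def nextProcessingDate(lastCompleted: Date, recurrence: FiniteDuration): Date = {
    val step = recurrence.toMillis
    require(step > 0, "recurrence must be at least one millisecond")
    // floorDiv keeps the rounding direction correct for pre-1970 dates too.
    val floored = Math.floorDiv(lastCompleted.getTime, step) * step
    new Date(floored + step)
  }
}
```

With a `1.hour` recurrence, a `lastCompleted` of 16:55 UTC rounds down to 16:00 and the result is 17:00, matching the example in the comment. Calendar-based units such as months do not have a fixed length in milliseconds, so they would need a different approach.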

jonas commented 7 years ago

java.time FTW if going down the sensible default route:

scala> import java.time.temporal.ChronoUnit.DAYS
import java.time.temporal.ChronoUnit.DAYS

scala> import java.util.Date
import java.util.Date

scala> val d = new Date().toInstant
d: java.time.Instant = 2016-12-09T04:18:05.066Z

scala> d -> d.plus(1, DAYS).truncatedTo(DAYS)
res8: (java.time.Instant, java.time.Instant) = (2016-12-09T04:18:05.066Z,2016-12-10T00:00:00Z)
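
One caveat with the snippet above: `Instant.truncatedTo(DAYS)` truncates in UTC, so the result is midnight UTC rather than midnight in the job's own zone. If local midnight is what is wanted, a zone-aware variant might look like the sketch below. `ZonedNextDay` and `startOfNextDay` are hypothetical names, and passing the zone explicitly is an assumption about how the default would be configured.

```scala
import java.time.ZoneId
import java.util.Date

object ZonedNextDay {
  // Hypothetical helper: start of the calendar day after `lastCompleted`,
  // where "day" is interpreted in the given zone.
  def startOfNextDay(lastCompleted: Date, zone: ZoneId): Date = {
    val nextMidnight = lastCompleted.toInstant
      .atZone(zone)          // view the instant in the target zone
      .toLocalDate           // drop the time-of-day
      .plusDays(1)           // move to the next calendar day
      .atStartOfDay(zone)    // first valid instant of that day in the zone
    Date.from(nextMidnight.toInstant)
  }
}
```

For the instant in the session above (2016-12-09T04:18:05.066Z, which is still 8 December in `America/Toronto`), this returns 2016-12-09T05:00:00Z, i.e. midnight Toronto time on 9 December, whereas the UTC truncation returns 2016-12-10T00:00:00Z.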

Also note that the recurrence is already there in the Chronos job config.

yawaramin commented 7 years ago

@jonas agreed, java.time FTW. I forgot that our minimum JVM version is 1.8.

About the recurrence in Chronos, correct me if I'm wrong, but Eigenflow is not necessarily tied to Chronos. That said, we could try to model recurrence after Chronos' recurrence capabilities to make interop with Chronos easier. I'll have to read up on that.