treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0

for_each task is not skipped with --start. #142

Open toyama0919 opened 8 years ago

toyama0919 commented 8 years ago

Hello.

The subtasks of a for_each task can't be skipped with --start.

+one:
  for_each>:
    count: [for1, for2, for3]
  _do:
    sh>: echo ${count}

+two:
  sh>: echo two

+three:
  sh>: echo three

When I --start from +three, the subtasks generated by +one still run (+two is skipped as expected). The --goal option works correctly.

$ digdag run my_workflow.dig --project tmp --start '+my_workflow+three'
2016-06-23 19:49:55 +0900: Digdag v0.8.2
2016-06-23 19:49:56 +0900 [INFO] (main): Setting workdir to /project/ipros/ipros-embulk/tmp
2016-06-23 19:49:56 +0900 [WARN] (main): Reusing the last session time 2016-06-23T00:00:00+00:00.
2016-06-23 19:49:56 +0900 [INFO] (main): Using session .digdag/status/20160623T000000+0000.
2016-06-23 19:49:57 +0900 [INFO] (main): Starting a new session project id=1 workflow name=my_workflow session_time=2016-06-23T00:00:00+00:00
2016-06-23 19:49:57 +0900 [WARN] (0018@+my_workflow+one): Skipped
2016-06-23 19:49:57 +0900 [INFO] (0018@+my_workflow+one^sub+for-count=for1): sh>: echo for1
for1
2016-06-23 19:49:58 +0900 [INFO] (0018@+my_workflow+one^sub+for-count=for2): sh>: echo for2
for2
2016-06-23 19:49:58 +0900 [INFO] (0018@+my_workflow+one^sub+for-count=for3): sh>: echo for3
for3
2016-06-23 19:49:58 +0900 [WARN] (0018@+my_workflow+two): Skipped
2016-06-23 19:49:58 +0900 [INFO] (0018@+my_workflow+three): sh>: echo three
three
Success. Task state is saved at .digdag/status/20160623T000000+0000 directory.
  * Use --session <daily | hourly | "yyyy-MM-dd[ HH:mm:ss]"> to not reuse the last session time.
  * Use --rerun, --start +NAME, or --goal +NAME argument to rerun skipped tasks.
$ digdag run my_workflow.dig --project tmp -g '+my_workflow+three'
2016-06-23 19:50:05 +0900: Digdag v0.8.2
2016-06-23 19:50:06 +0900 [INFO] (main): Setting workdir to /project/ipros/ipros-embulk/tmp
2016-06-23 19:50:06 +0900 [WARN] (main): Reusing the last session time 2016-06-23T00:00:00+00:00.
2016-06-23 19:50:06 +0900 [INFO] (main): Using session .digdag/status/20160623T000000+0000.
2016-06-23 19:50:06 +0900 [INFO] (main): Starting a new session project id=1 workflow name=my_workflow session_time=2016-06-23T00:00:00+00:00
2016-06-23 19:50:07 +0900 [WARN] (0018@+my_workflow+one): Skipped
2016-06-23 19:50:07 +0900 [WARN] (0018@+my_workflow+one^sub+for-count=for1): Skipped
2016-06-23 19:50:07 +0900 [WARN] (0018@+my_workflow+one^sub+for-count=for2): Skipped
2016-06-23 19:50:07 +0900 [WARN] (0018@+my_workflow+one^sub+for-count=for3): Skipped
2016-06-23 19:50:07 +0900 [WARN] (0018@+my_workflow+two): Skipped
2016-06-23 19:50:08 +0900 [INFO] (0018@+my_workflow+three): sh>: echo three
three
Success. Task state is saved at .digdag/status/20160623T000000+0000 directory.
  * Use --session <daily | hourly | "yyyy-MM-dd[ HH:mm:ss]"> to not reuse the last session time.
  * Use --rerun, --start +NAME, or --goal +NAME argument to rerun skipped tasks.
frsyuki commented 8 years ago

This is because resumeStateFileEnabledTaskIndexList is calculated from the original workflow definition, which doesn't include generated tasks. It needs a fix.
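
To illustrate the explanation above, here is a toy model in Python. It is not digdag's actual code: the function names, the exact-name lookup, and the parent check are all assumptions made for illustration. Only the task names are taken from the logs in this issue. It shows how a skip set built from the static definition can miss generated names such as +my_workflow+one^sub+for-count=for1, and one hypothetical way to cover them.

```python
def skipped_static_tasks(static_tasks, start):
    """Mark every task defined before `start` as skipped (toy model)."""
    return set(static_tasks[:static_tasks.index(start)])


def runs_naive(task, skipped):
    # Exact-name lookup: a generated task name never appears in the static set,
    # so generated subtasks are not recognized as skipped.
    return task not in skipped


def runs_with_parent_check(task, skipped):
    # Hypothetical fix: also skip a generated task whose parent is skipped.
    # "^sub" is the separator seen in the generated task names in the logs.
    parent = task.split("^sub", 1)[0]
    return task not in skipped and parent not in skipped


static_tasks = ["+my_workflow+one", "+my_workflow+two", "+my_workflow+three"]
runtime_tasks = [
    "+my_workflow+one",
    "+my_workflow+one^sub+for-count=for1",
    "+my_workflow+one^sub+for-count=for2",
    "+my_workflow+one^sub+for-count=for3",
    "+my_workflow+two",
    "+my_workflow+three",
]

skipped = skipped_static_tasks(static_tasks, "+my_workflow+three")
naive = [t for t in runtime_tasks if runs_naive(t, skipped)]
fixed = [t for t in runtime_tasks if runs_with_parent_check(t, skipped)]

print(naive)  # generated subtasks of +one still run, as in the --start log
print(fixed)  # only +my_workflow+three runs, as in the --goal log
```

In this sketch the naive lookup reproduces the --start behaviour reported above, while the parent check matches the --goal output. The real fix may look quite different.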