treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0

task ordering is not well defined #58

Closed: danielnorberg closed this issue 8 years ago

danielnorberg commented 8 years ago

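Digdag tasks are defined in a key-value map, and digdag executes them in the literal order in which they appear in the YAML file. This is fragile because key-value maps (mappings) in YAML have no well-defined order: the spec describes a mapping as an unordered set of key/value pairs, so a compliant parser or tool is free to reorder the keys.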

http://yaml.org/spec/1.2/spec.html#id2765608
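A minimal sketch of the failure mode, not taken from digdag itself. It uses Python's standard `json` module as a stand-in for any YAML tool that sorts or rehashes mapping keys (for example a formatter or a serializer with key sorting enabled). The task names `prepare` and `analyze` and the helper `task_order_after_round_trip` are hypothetical.

```python
import json

# Hypothetical workflow written as a key-value map, mirroring the current layout.
# The keys are deliberately not in alphabetical order.
workflow_map = {
    "prepare": {"sh>": "echo first task"},
    "analyze": {"td>": "queries/second_task.sql"},
}


def task_order_after_round_trip(workflow, sort_keys):
    """Serialize and reparse the mapping, then return the task names in order."""
    text = json.dumps(workflow, sort_keys=sort_keys)
    return list(json.loads(text).keys())


# Order as written by the author of the workflow.
original_order = list(workflow_map.keys())

# Order after passing through a tool that sorts mapping keys, which is
# legitimate because a mapping carries no ordering guarantee.
sorted_order = task_order_after_round_trip(workflow_map, sort_keys=True)

print(original_order)  # ['prepare', 'analyze']
print(sorted_order)    # ['analyze', 'prepare']
```

If execution order is derived from key order, the second result would run `analyze` before `prepare`, even though the data is the same mapping.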

Some examples of how this can become painful:

Suggested solution alternatives:

  1. Explicitly define digdag workflow files as not being YAML and change the file suffix to e.g. .digdag or .dd. We can still say that the syntax is YAML, and people can configure their editors to apply YAML syntax highlighting to these files.
  2. Change the digdag workflow structure so that it has a well-defined task order, e.g.:
tasks:
- name: task1
  sh>: echo first task
- name: task2
  td>: queries/second_task.sql
  tasks: 
    - name: subtask1
      sh>: echo sub
    - name: subtask2
      sh>: echo tasks
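As a rough illustration of why the list-based layout pins down ordering, the sketch below walks the same structure as plain Python data. YAML sequences are ordered, so list position alone determines the result. The helper `iter_tasks`, the dotted naming of subtasks, and the choice to visit a parent before its subtasks are assumptions made for this example, not digdag behavior.

```python
# The list-based layout from the example above, written as plain Python data.
workflow = {
    "tasks": [
        {"name": "task1", "sh>": "echo first task"},
        {
            "name": "task2",
            "td>": "queries/second_task.sql",
            "tasks": [
                {"name": "subtask1", "sh>": "echo sub"},
                {"name": "subtask2", "sh>": "echo tasks"},
            ],
        },
    ]
}


def iter_tasks(tasks, prefix=""):
    """Yield full task names depth-first, in list order (hypothetical helper)."""
    for task in tasks:
        full_name = prefix + task["name"]
        yield full_name
        # Recurse into nested tasks, if any, keeping their list order.
        yield from iter_tasks(task.get("tasks", []), prefix=full_name + ".")


execution_order = list(iter_tasks(workflow["tasks"]))
print(execution_order)
# ['task1', 'task2', 'task2.subtask1', 'task2.subtask2']
```

Because the order comes from sequence position rather than key order, sorting or rehashing the keys inside each task entry would not change the result.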

Some benefits of this approach are:

danielnorberg commented 8 years ago

Resolving this per https://github.com/treasure-data/digdag/issues/70, i.e. solution alternative 1.