jhclark / ducttape

A workflow management system for researchers who heart Unix.
http://jhclark.github.com/ducttape

Switch-case statement for branching #44

Open jhclark opened 12 years ago

jhclark commented 12 years ago

The following is a proposed syntax for switch-case statements in ducttape. It allows pattern matching on branch points that have already been defined by an upstream task:

switch switch_task_name on WhichThing < in=$out@prev_task > out {
  # Handle a special case (e.g. segment Japanese)
  case thing_one : juman {
    echo "hello $in"
  }
  # Can handle multiple branches at once (e.g. Segment various Arabic dialects)
  case thing_two, thing_three : ar_seg < ar_model=/path {
    echo "$hello $in"
  }
  # Handle all other cases not previously mentioned (e.g. tokenize Western languages)
  default : moses {
    echo "$hello $in"
  }
}
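
To make the intended semantics concrete, here is a rough sketch in Python rather than ducttape syntax. The function `select_case` and the `CASES` table are hypothetical, and the name after the colon (juman, ar_seg, moses) is assumed to be a package name. Given the branch chosen for WhichThing, the first case that lists it wins, and anything else falls through to the default:

```python
from typing import Optional, Sequence, Tuple

# Hypothetical model: each case is (branch names it matches, package name).
Case = Tuple[Sequence[str], str]


def select_case(branch: str, cases: Sequence[Case],
                default: Optional[str] = None) -> str:
    """Return the package of the first case listing `branch`, else the default."""
    for branches, package in cases:
        if branch in branches:
            return package
    if default is None:
        raise KeyError(f"no case matches branch {branch!r} and no default given")
    return default


# Mirrors the proposal above: two explicit cases plus a default.
CASES = [
    (("thing_one",), "juman"),
    (("thing_two", "thing_three"), "ar_seg"),
]

if __name__ == "__main__":
    for b in ("thing_one", "thing_three", "thing_four"):
        print(b, "->", select_case(b, CASES, default="moses"))
```

Whether a missing default should be an error or simply mean "no task for this branch" is left open here; the sketch raises only to make the gap visible.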
jhclark commented 12 years ago

case/default blocks may not introduce additional outputs since each task must have a single, unique set of outputs.
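
A check for this constraint could look roughly like the following Python sketch. The helper name `extra_outputs` and the idea that each case carries its own output list are assumptions made for illustration only, since the proposed syntax declares outputs only on the switch itself:

```python
from typing import Dict, Iterable, Set


def extra_outputs(declared: Iterable[str],
                  per_case: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each case name to the outputs it names that the switch did not declare."""
    declared_set = set(declared)
    extras: Dict[str, Set[str]] = {}
    for case_name, outputs in per_case.items():
        unexpected = set(outputs) - declared_set
        if unexpected:
            extras[case_name] = unexpected
    return extras


if __name__ == "__main__":
    # The switch declares a single output "out"; the default case tries to add "log".
    print(extra_outputs(["out"], {"thing_one": ["out"], "default": ["out", "log"]}))
```

An empty result means every case stays within the switch's single, shared set of outputs.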

jhclark commented 12 years ago

A variant on Lane's proposal for multiple branch point matching:

switch task_name {
  case (X: x1 x2) * (Y: y1) < in {
    bash
  }
}
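
One possible reading of the `(X: x1 x2) * (Y: y1)` pattern, sketched in Python with hypothetical names: a realization (one chosen branch per branch point) matches when, for every branch point named in the pattern, the chosen branch is among those listed. Treating a realization that lacks a named branch point as a non-match is an assumption, not something decided above:

```python
from typing import Dict, Set

Pattern = Dict[str, Set[str]]   # branch point -> branches the case accepts
Realization = Dict[str, str]    # branch point -> branch chosen in this run


def matches(pattern: Pattern, realization: Realization) -> bool:
    """True if every branch point in the pattern has an accepted branch chosen."""
    return all(realization.get(bp) in allowed for bp, allowed in pattern.items())


# case (X: x1 x2) * (Y: y1)
PATTERN: Pattern = {"X": {"x1", "x2"}, "Y": {"y1"}}

if __name__ == "__main__":
    print(matches(PATTERN, {"X": "x2", "Y": "y1"}))
    print(matches(PATTERN, {"X": "x1", "Y": "y2"}))
```

Branch points that the pattern does not mention are ignored, so a case stays valid when unrelated branch points are added upstream.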
jhclark commented 12 years ago

Use case: How would we allow multiple Chinese segmenters if and only if the case matches Chinese?

dowobeha commented 12 years ago

What about this?

switch switch_task_name on WhichThing < in=$out@prev_task > out {
  # Handle a special case (e.g. segment Japanese)
  case thing_one : juman {
    echo "hello $in"
  }
  # Can handle multiple branches at once (e.g. Segment various Arabic dialects)
  case thing_two, thing_three switch arabic_parser on AR_parser < ar_model=/path {
    case ar_seg {
      echo "$hello $in"
    }
    case other_ar_seg {
      echo "$hello $in"
    }
  }
  # Handle all other cases not previously mentioned (e.g. tokenize Western languages)
  default : moses {
    echo "$hello $in"
  }
}
jhclark commented 12 years ago

Potential solution for having multiple Chinese segmenters:

switch switch_task_name on WhichThing < in=$out@prev_task > out {
  # Handle a special case (e.g. segment Japanese)
  case thing_one : juman {
    echo "hello $in"
  }
  # Can handle multiple branches at once (e.g. Segment various Arabic dialects)
  case thing_two, thing_three : ar_seg < ar_model=/path {
    echo "$hello $in"
  }
  # Try multiple segmenters, but only for Chinese
  case zh => branchpoint WhichSeg {
    branch zh_seg : zhseg {
      $zhseg
    }
    branch cool_seg : coolseg {
      $coolseg
    }
  }
  # Handle all other cases not previously mentioned (e.g. tokenize Western languages)
  default : moses {
    echo "$hello $in"
  }
}
jhclark commented 12 years ago

Lane suggested combining switch-case (which requires the branch point to be defined already) with the "branchpoint" keyword (which introduces a new branch point). With that approach we can still handle the use case of "try several segmenters if the language is Chinese":

switch tokenize < in > out {
  case (Lang: zh) * (Segmenter: stanford) : stanford_seg {
    $stanford_seg < $in > $out
  }
  case (Lang: zh) * (Segmenter: berkeley) : berkeley_seg {
    $berkeley_seg < $in > $out
  }
  default : moses {
    $moses/tokenizer.pl < $in > $out
  }
}

Optionally, we could allow a special character before the branch point name if we want to require the user to say explicitly when they are adding a new branch point instead of matching an existing one:

switch tokenize < in > out {
  case (Lang: zh) * (+Segmenter: stanford) : stanford_seg {
    $stanford_seg < $in > $out
  }
  case (Lang: zh) * (+Segmenter: berkeley) : berkeley_seg {
    $berkeley_seg < $in > $out
  }
  default : moses {
    $moses/tokenizer.pl < $in > $out
  }
}
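
The effect of the `+` marker could be checked along these lines. This is a Python sketch with a hypothetical helper `classify_branch_point`: an unmarked name must already be a known branch point, while a `+` name is treated as introducing one. Allowing `+Segmenter` to appear in more than one case, as in the example above, is an assumption:

```python
from typing import Set, Tuple


def classify_branch_point(spec: str, known: Set[str]) -> Tuple[str, bool]:
    """Return (name, introduces_new) for a spec such as 'Lang' or '+Segmenter'.

    Raises ValueError when an unmarked name is not an existing branch point.
    """
    introduces_new = spec.startswith("+")
    name = spec[1:] if introduces_new else spec
    if not introduces_new and name not in known:
        raise ValueError(f"branch point {name!r} is not defined upstream; "
                         f"write '+{name}' to introduce it")
    return name, introduces_new


if __name__ == "__main__":
    known = {"Lang"}  # branch points defined by upstream tasks (assumed)
    print(classify_branch_point("Lang", known))
    print(classify_branch_point("+Segmenter", known))
```

The point of the explicit marker is that a typo in an existing branch point name becomes an error instead of silently creating a new branch point.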
jhclark commented 11 years ago

This would involve changes to the AST parser and the WorkflowBuilder.