terrier-org / pyterrier

A Python framework for performing information retrieval experiments, building on http://terrier.org/
https://pyterrier.readthedocs.io/
Mozilla Public License 2.0

Transformer.compile improvements #480

Open seanmacavaney opened 2 months ago

seanmacavaney commented 2 months ago

wip

fixes #451

cmacdonald commented 2 months ago

The following is an extract of a conversation between Sean and Craig:

What about transformer reuse? br >> ((a >> b) + (a >> c)) could be rewritten as br >> a >> (b + c)

That seems like an easy rule to add to CombSumTransformer's compile method (as long as we first convince ourselves that it's universally applicable):

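class CombSumTransformer:
  def compile(self):
    # Both operands are two-stage pipelines that start with the same transformer
    if (isinstance(self.left, ComposedPipeline) and len(self.left) == 2
        and isinstance(self.right, ComposedPipeline) and len(self.right) == 2
        and self.left[0] == self.right[0]):
      return self.left[0] >> (self.left[1] + self.right[1])
    return self  # rule does not apply: leave the pipeline unchanged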

In fact, it could be pretty easily generalized to a shared prefix of any length, which is impossible to express with matchpy afaik. I.e.:

br >> ((a>>b>>c)+(a>>b>>d)) -> br >> a >> b >> (c+d)

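from itertools import takewhile

class CombSumTransformer:
  def compile(self):
    if isinstance(self.left, ComposedPipeline) and isinstance(self.right, ComposedPipeline) and len(self.left) == len(self.right):
      # takewhile takes the predicate first and returns an iterator, so collect the shared stages into a list
      prefix = [l for l, r in takewhile(lambda x: x[0] == x[1], zip(self.left, self.right))]
      if len(prefix) == len(self.left) - 1:
        return ComposedPipeline(prefix) >> (self.left[-1] + self.right[-1])
    return self  # rule does not apply: leave the pipeline unchanged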
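
To make the prefix rule easy to try outside PyTerrier, here is a minimal self-contained sketch. Node, Leaf, Compose and Sum are hypothetical stand-ins, not PyTerrier's real classes, and equality between stages is plain dataclass equality, which is an assumption about how `==` would behave for real transformers. It is also slightly more general than the snippet above: it factors out the longest common prefix whenever both sides still have at least one stage left, rather than requiring equal lengths.

```python
from dataclasses import dataclass
from itertools import takewhile
from typing import Tuple


class Node:
    """Hypothetical stand-in for a transformer (not PyTerrier's base class)."""

    def __rshift__(self, other: "Node") -> "Compose":
        # Flatten nested compositions so a >> b >> c holds three stages.
        return Compose(_stages(self) + _stages(other))

    def __add__(self, other: "Node") -> "Sum":
        return Sum(self, other)

    def compile(self) -> "Node":
        return self


@dataclass(frozen=True)
class Leaf(Node):
    name: str


@dataclass(frozen=True)
class Compose(Node):
    stages: Tuple[Node, ...]


@dataclass(frozen=True)
class Sum(Node):
    left: Node
    right: Node

    def compile(self) -> Node:
        left, right = _stages(self.left), _stages(self.right)
        # Stages shared by both operands, read from the front.
        pairs = takewhile(lambda p: p[0] == p[1], zip(left, right))
        prefix = tuple(l for l, _ in pairs)
        n = len(prefix)
        # Rewrite only if something is shared and both sides keep a tail.
        if n == 0 or n == len(left) or n == len(right):
            return self
        return Compose(prefix) >> (_tail(left[n:]) + _tail(right[n:]))


def _stages(node: Node) -> Tuple[Node, ...]:
    """Return the stages of a composition, or the node itself as one stage."""
    return node.stages if isinstance(node, Compose) else (node,)


def _tail(stages: Tuple[Node, ...]) -> Node:
    """Wrap the remaining stages, avoiding a one-element Compose."""
    return stages[0] if len(stages) == 1 else Compose(stages)
```

With a, b, c, d as Leaf instances, ((a >> b >> c) + (a >> b >> d)).compile() yields the same structure as a >> b >> (c + d). Note that Python's + binds tighter than >>, so the parentheses around each composed operand are required. Whether the rewrite is safe for every transformer is still the open question raised above; the sketch only checks the structural part.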