Dspy optimizer investigation

haidark commented 2 months ago

@HishamYahya to experiment with dspy optimization using the chess game to gain insights into the best configuration for automatic prompt optimization.

In particular, we need to understand:

best choices for the dspy optimizer
number of examples to use
how examples should be built and provided to each player
the effect of the different LMs used in each optimizer

Also useful would be config-based customization of the dspy optimizer with sensible configurations exposed.

HishamYahya commented 2 months ago

I'm thinking we could abstract the optimization process into the Player class and add 4 more registries:

Module Registry: initially registers all default modules in DSPy but allows the user to register their own modules to use
Optimizer Registry: registers all DSPy optimizer classes
Dataset Registry: add a new abstract class Dataset that must have an iterator of dspy.Example objects
Metric Registry: the metric to use for optimization, will initially contain the default metrics in DSPy. This is where I will register the stockfish scoring implementation

The idea here is that optimization is now done the same way for all Player objects, and the user only has to write the logic for processing the module's output to make the next move (make_move implementation), the data iterator, and whatever custom metric they want to use for the optimization signal. I think this modularizes the process quite nicely. Thoughts?
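
A minimal sketch of how such registries could look, assuming a simple name-to-object mapping with a decorator for registration. `Registry`, `DATASET_REGISTRY`, `METRIC_REGISTRY`, `Dataset`, `ToyChessDataset`, and `exact_match` are hypothetical names for illustration, not the ZeroSumEval API, and plain dicts stand in for `dspy.Example` objects so the snippet runs without DSPy installed.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterator


class Registry:
    """Maps string names (as they would appear in a config) to objects."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._items: Dict[str, Any] = {}

    def register(self, name: str) -> Callable[[Any], Any]:
        """Decorator that stores the decorated object under `name`."""
        def decorator(obj: Any) -> Any:
            if name in self._items:
                raise KeyError(f"{self.kind} '{name}' is already registered")
            self._items[name] = obj
            return obj
        return decorator

    def get(self, name: str) -> Any:
        try:
            return self._items[name]
        except KeyError:
            raise KeyError(f"unknown {self.kind} '{name}'") from None


DATASET_REGISTRY = Registry("dataset")
METRIC_REGISTRY = Registry("metric")


class Dataset(ABC):
    """Hypothetical abstract dataset: subclasses yield training examples."""

    @abstractmethod
    def __iter__(self) -> Iterator[Dict[str, str]]:
        ...


@DATASET_REGISTRY.register("toy_chess")
class ToyChessDataset(Dataset):
    def __iter__(self) -> Iterator[Dict[str, str]]:
        # Dicts stand in for dspy.Example objects here.
        yield {"board_state": "startpos", "role": "White", "next_move": "e4"}


@METRIC_REGISTRY.register("exact_match")
def exact_match(example: Dict[str, str], prediction: Dict[str, str]) -> bool:
    """Toy metric: the predicted move must equal the labelled move."""
    return example["next_move"] == prediction["next_move"]
```

With this shape, a config would only need to name the dataset and metric (for example `toy_chess` and `exact_match`), and the shared optimization code would look both up by name.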

haidark commented 2 months ago

Moving our conversation here (this is me):

I think the optimizer and module registries make a lot of sense. I'm not sure how the dataset and metric registries will help with the different games, since the metric and dataset will be very different from game to game.

HishamYahya commented 2 months ago

I've just opened a PR (#11) that implements these 4 registries. Take a look at the chess.yaml config for a working example and let me know if you have any thoughts.

HishamYahya commented 2 months ago

Responding to @haidark's comments about the Module registry on the PR here.

https://github.com/haidark/ZeroSumEval/pull/11#discussion_r1677278825

why do we need to specify this module in the config? I don't think modules for each game should be registered.

The idea behind a registry for modules is that prompting strategies are often shared from one game to another. I think in the majority of cases users won't need to define their own modules for their games; they will use the tried-and-tested prompting strategies that DSPy already implements out of the box. For instance, if you wanted to use ChainOfThought, you would just define it in the YAML config like so:

module: ChainOfThought
module_args:
  signature: board_state, role, history -> next_move
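
To illustrate how a config like this could be turned into a module instance, here is a minimal sketch. It assumes the YAML has already been parsed into a dict. `MODULE_REGISTRY`, `build_module`, and the stand-in `ChainOfThought` class are hypothetical and only mimic the idea of a module constructed from a signature string, so the snippet runs without DSPy installed.

```python
from typing import Any, Dict, List


class ChainOfThought:
    """Stand-in for a registered module; NOT dspy.ChainOfThought."""

    def __init__(self, signature: str) -> None:
        # Split "a, b -> c" into input and output field names.
        inputs, outputs = signature.split("->")
        self.input_fields: List[str] = [f.strip() for f in inputs.split(",")]
        self.output_fields: List[str] = [f.strip() for f in outputs.split(",")]


# Hypothetical registry: module name in the config -> class to instantiate.
MODULE_REGISTRY: Dict[str, type] = {"ChainOfThought": ChainOfThought}


def build_module(config: Dict[str, Any]) -> Any:
    """Look up `module` in the registry and pass `module_args` as kwargs."""
    name = config["module"]
    if name not in MODULE_REGISTRY:
        raise ValueError(f"module '{name}' is not registered")
    return MODULE_REGISTRY[name](**config.get("module_args", {}))


# The dict a YAML loader would produce for the config above.
config = {
    "module": "ChainOfThought",
    "module_args": {"signature": "board_state, role, history -> next_move"},
}
module = build_module(config)
print(module.input_fields)   # ['board_state', 'role', 'history']
print(module.output_fields)  # ['next_move']
```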

In cases where the user needs to add some extra logic in the prompting flow, they're given the option to implement it and register it within the framework without modifying any part of the codebase. In our case, we only needed to add some custom assertions, which is why we needed to implement ChessCoT to make sure the rules are followed.

In the future, we could even abstract away the assertions as well and have the user just write validator functions on the output of the module if they want. For example, in the chess case, they would write a function like:

import chess  # python-chess

def validate_move(move, board_state):
    # str.replace returns a new string, so keep the result
    move = move.replace(".", "")
    try:
        # board_state is assumed to be a FEN string
        board = chess.Board(board_state)
        # parse_san raises if the move cannot be played on this board
        board.parse_san(move)
    except chess.IllegalMoveError:
        return False, f"{move} is an illegal move, choose a different move."
    except chess.InvalidMoveError:
        return False, f"{move} is an invalid move, choose a different move."
    except chess.AmbiguousMoveError:
        return False, f"{move} is an ambiguous move, choose a different move."

    return True, ""

And then it could be specified in the config in a similar way to lm-harness:

module: ChainOfThought
module_args:
  signature: board_state, role, history -> next_move
validator: !function utils.validate_move
max_retries: 10

I think this would capture the vast majority of cases and relieve the user of the need to have a good understanding of DSPy. This is pretty important if we want this to reach a wider audience: DSPy's learning curve is quite steep, and the more we remove that overhead the better.
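
A minimal sketch of how `validator` and `max_retries` could drive a retry loop. `run_with_validation` and the `feedback` keyword are hypothetical. The module is any callable returning a move, and the validator is any callable returning an `(is_valid, message)` pair. A scripted fake module and a toy validator stand in for the LM and the chess rules here.

```python
from typing import Callable, List, Optional, Tuple

Validator = Callable[[str], Tuple[bool, str]]


def run_with_validation(
    module: Callable[..., str],
    validator: Validator,
    max_retries: int,
    **inputs: str,
) -> str:
    """Call `module` until `validator` accepts its output.

    Makes one initial attempt plus up to `max_retries` retries. The
    validator's message is passed back as `feedback` on the next call.
    """
    feedback: Optional[str] = None
    for _ in range(max_retries + 1):
        output = module(feedback=feedback, **inputs)
        is_valid, message = validator(output)
        if is_valid:
            return output
        feedback = message
    raise RuntimeError(f"no valid output after {max_retries} retries: {feedback}")


# Toy stand-ins: a scripted "model" and a validator with a fixed legal set.
scripted_moves: List[str] = ["Ke9", "Qxz1", "e4"]
feedback_seen: List[Optional[str]] = []


def fake_module(feedback: Optional[str] = None, **inputs: str) -> str:
    # Record the feedback received, then return the next scripted move.
    feedback_seen.append(feedback)
    return scripted_moves[len(feedback_seen) - 1]


def toy_validator(move: str) -> Tuple[bool, str]:
    if move in {"e4", "d4", "Nf3"}:
        return True, ""
    return False, f"{move} is an invalid move, choose a different move."


result = run_with_validation(
    fake_module, toy_validator, max_retries=10, board_state="start"
)
print(result)  # e4
```

Feeding the validator's message back into the next attempt is one possible design; a real implementation might instead rely on DSPy's own assertion mechanism.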

haidark commented 1 month ago

Thanks Hisham. I think the module needs to be tied to the player; in essence, the player is just a wrapper around the module. I don't see users mixing and matching DSPy modules through the config alone (this would require in-depth knowledge of DSPy and the mechanics of the game). Instead, the game implementer (a developer contributing to ZSE) will provide a set of signatures and modules that the user (someone who wants to test a model or approach) can choose from. Users shouldn't be exposed to the mechanics of the game or the DSPy modules via the config for two reasons:

(importantly) If we allow arbitrary customization of the game mechanics or player modules, it will be impossible to compare the performance of models via the game.
Allowing users to configure signatures on the fly could expose inputs that should be hidden from players.

Re: the validator functions: I like the idea. I think this should be part of the game_state, with the type of validation also determined by the player's current role. Also keep in mind that there are two types of validation:

validation of game state by the game manager (this is the highest level of validation) that considers ALL game state
validation of game state by the player that considers only the parts of the game state it can access given its role
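
The two levels could be sketched as follows, assuming the game state is a flat dict and each role has a whitelist of visible keys. `VISIBLE_KEYS`, `player_view`, `player_validate`, and `manager_validate` are hypothetical names, and the guessing-game fields are invented purely for illustration.

```python
from typing import Any, Dict, Set, Tuple

# Hypothetical guessing game: the guesser must not see `secret`.
VISIBLE_KEYS: Dict[str, Set[str]] = {
    "guesser": {"history", "max_guess"},
    "setter": {"history", "max_guess", "secret"},
}


def player_view(state: Dict[str, Any], role: str) -> Dict[str, Any]:
    """Return only the parts of the state that `role` may access."""
    return {k: v for k, v in state.items() if k in VISIBLE_KEYS[role]}


def player_validate(view: Dict[str, Any], guess: int) -> Tuple[bool, str]:
    """Player-level check: uses only what the player's role can see."""
    if not 1 <= guess <= view["max_guess"]:
        return False, f"{guess} is out of range"
    if guess in view["history"]:
        return False, f"{guess} was already guessed"
    return True, ""


def manager_validate(state: Dict[str, Any], guess: int) -> Tuple[bool, str]:
    """Manager-level check: sees ALL state, including hidden fields."""
    ok, message = player_validate(state, guess)
    if not ok:
        return ok, message
    if not 1 <= state["secret"] <= state["max_guess"]:
        return False, "game state is inconsistent: secret out of range"
    return True, ""


state = {"history": [3, 7], "max_guess": 10, "secret": 5}
view = player_view(state, "guesser")
print(sorted(view))                # ['history', 'max_guess']
print(player_validate(view, 7))    # (False, '7 was already guessed')
print(manager_validate(state, 4))  # (True, '')
```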

Dspy optimizer investigation #8