stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

dspy.Predict should be a dspy.Module #370

Open thomasahle opened 5 months ago

thomasahle commented 5 months ago

For most purposes dspy.Predict behaves the same way as a dspy.Module. But if you try to pass a Predict directly to an optimizer, you'll notice that it's lacking a lot of the (simple) methods that Module has.

Writing unit tests, I've often found myself writing unnecessary wrapper classes like

import dspy


class SimpleModule(dspy.Module):
    """Thin wrapper whose only job is to expose a single Predict as a Module."""

    def __init__(self, signature):
        super().__init__()
        self.predictor = dspy.Predict(signature)

    def forward(self, **kwargs):
        return self.predictor(**kwargs)
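
For illustration, a minimal sketch of the kind of gap meant here, using the predictors() helper that the optimizers call (see the optimizer excerpt further down):

wrapped = SimpleModule("question -> answer")
print(wrapped.predictors())       # a Module can enumerate its predictors

predictor = dspy.Predict("question -> answer")
# predictor.predictors()          # a bare Predict (at the time of this issue) has no such method
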
okhat commented 5 months ago

Yeah :/ Good point. Same for ChainOfThought.

okhat commented 5 months ago

I wonder if we can actually just resolve this by making a shallow wrapper and renaming the current thing to CorePredict and CoreChainOfThought?
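
Roughly, a sketch of that shape (CorePredict here is hypothetical and just stands in for the current Predict internals):

class CorePredict:
    # Hypothetical: the existing Predict logic, renamed and otherwise untouched.
    def __init__(self, signature, **config):
        self.signature = signature
        self.config = config

    def __call__(self, **kwargs):
        ...  # current completion logic would live here


class Predict(dspy.Module):
    # Thin Module wrapper so Predict picks up all of Module's methods.
    def __init__(self, signature, **config):
        super().__init__()
        self.core = CorePredict(signature, **config)

    def forward(self, **kwargs):
        return self.core(**kwargs)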

jgeldart commented 5 months ago

This is less of an engineering thing and more a programming language theory thing, but I've been thinking about what category Predict, ChainOfThought, etc. fall under. I think there may be a missing category in the metamodel, which I've been calling a 'strategy' in my head: a module that returns a module. Conceptually, this opens the door to strategy optimisation (optimising the module that returns the module separately from the final signature), but the main benefit for me is just allowing us to reason about higher-order functions (important with functorial things like lists). I can imagine strategies for handling mapping on lists, a tree-of-thought one, a graph-of-thought one, or even ones that add MemGPT/Self-RAG support to another strategy.
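
As a toy illustration of that 'strategy' idea (everything below is hypothetical, not an existing DSPy API), a function from a module to a module that maps it over a list:

def map_strategy(inner: dspy.Module) -> dspy.Module:
    # Hypothetical 'strategy': takes a module and returns a new module
    # that applies it element-wise over a list input.
    class Mapped(dspy.Module):
        def __init__(self):
            super().__init__()
            self.inner = inner

        def forward(self, items):
            # items is assumed to be a list of kwargs dicts for the inner module
            return [self.inner(**item) for item in items]

    return Mapped()

# e.g. map_strategy(dspy.ChainOfThought("question -> answer")) would yield a module
# that answers a whole list of {"question": ...} inputs.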

Neoxelox commented 5 months ago

As I understand DSPy, a "program" is a "module" composed of other modules, such as "Predictors" (ChainOfThought/Predict/ReAct...), "Retrievers", or other "Subprograms". But if we are going to categorize "Predictors" differently, I think I would call them prompting "Techniques".

thomasahle commented 5 months ago

I think it's good to just try and follow PyTorch on this. There, an nn.Sequential is still an nn.Module even though it takes a list of modules.
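
For reference, the PyTorch behaviour being pointed to:

import torch.nn as nn

seq = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
assert isinstance(seq, nn.Module)              # a container of modules is itself a Module
assert isinstance(nn.Linear(4, 8), nn.Module)  # and so is a leaf layer
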

Maybe the current Predict code could be moved into a function that the Predict module calls? A bit like your CorePredict idea, @okhat.
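
Something like the following, purely as an illustration (run_predict is a made-up name, and the attribute names assume Predict keeps its signature, demos, and config):

def run_predict(signature, demos, config, **kwargs):
    # Hypothetical free function holding what Predict.forward does today:
    # build the prompt from the signature + demos, call the LM, parse completions.
    ...

class Predict(dspy.Module):
    def forward(self, **kwargs):
        return run_predict(self.signature, self.demos, self.config, **kwargs)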

thomasahle commented 5 months ago

Regarding ChainOfThought, it seems like we could just replace it with

import dspy
from dspy.signatures.signature import ensure_signature


class ChainOfThought(dspy.Module):
    def __init__(self, signature, rationale_type=None, **config):
        super().__init__()

        signature = ensure_signature(signature)
        *_keys, last_key = signature.output_fields.keys()

        # Default rationale field, mirroring the current ChainOfThought prompt.
        rationale_type = rationale_type or dspy.OutputField(
            prefix="Reasoning: Let's think step by step in order to",
            desc="${produce the " + last_key + "}. We ...",
        )

        self.extended_signature = signature.prepend("rationale", rationale_type, type_=str)
        # Route **config to the inner Predict rather than to Module.__init__.
        self.predict = dspy.Predict(self.extended_signature, **config)

    def forward(self, **kwargs):
        return self.predict(**kwargs)

This still passes all my tests, except those for the (Bayesian) signature optimizer, which has some hacks around extended_signature.

thomasahle commented 5 months ago

Or I guess a CorePredict would be nice, as you say, since it serves as a place to "store" signatures so they can be changed, while keeping the Signature class itself immutable. E.g. in the signature optimizer:

# Go through our module's predictors
for p_i, (p_old, p_new) in enumerate(zip(module.predictors(), module_clone.predictors())):
    # Use the most recently generated candidates for evaluation...
    candidates_ = latest_candidates[id(p_old)]
    if len(module.predictors()) > 1:
        # ...unless our program has multiple predictors, in which case we need to reevaluate
        # all prompts with the new prompt(s) for the other predictor(s)
        candidates_ = all_candidates[id(p_old)]

    # For each candidate
    for c_i, c in enumerate(candidates_):
        # Get the candidate instruction and prefix
        instruction, prefix = c.proposed_instruction.strip('"').strip(), c.proposed_prefix_for_output_field.strip('"').strip()

        # Set this new module with our instruction / prefix
        if hasattr(p_new, 'extended_signature'):
            *_, last_key = p_new.extended_signature.fields.keys()
            p_new.extended_signature = p_new.extended_signature \
                .with_instructions(instruction) \
                .with_updated_fields(last_key, prefix=prefix)
        else:
            *_, last_key = p_new.extended_signature1.fields.keys()
            p_new.extended_signature1 = p_new.extended_signature1 \
                .with_instructions(instruction) \
                .with_updated_fields(last_key, prefix=prefix)
            *_, last_key = p_new.extended_signature2.fields.keys()
            p_new.extended_signature2 = p_new.extended_signature2 \
                .with_instructions(instruction) \
                .with_updated_fields(last_key, prefix=prefix)

If we refactor this, we should be sure to find a way to avoid the two cases of extended_signature vs extended_signature1 and extended_signature2.

okhat commented 5 months ago

@thomasahle I think the CorePredict will have self.instructions and self.demos, instead of any kind of changes to self.signature. Once a module is created (including a CorePredict), the signature will never be changed. That's my current thinking at least; I hope it's possible to realize in practice.
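
A minimal sketch of that shape, just to make the proposal concrete (hypothetical, not existing code):

class CorePredict:
    def __init__(self, signature):
        self.signature = signature                   # treated as immutable after construction
        self.instructions = signature.instructions   # optimizers edit this copy instead
        self.demos = []                              # few-shot demos attached by optimizers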

thomasahle commented 4 months ago

Doesn't the signature optimizer also change the field descriptions and prefixes, though?