stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/

MIPROv2: Error getting source code: unhashable type: 'list'. #1503

Closed NumberChiffre closed 1 week ago

NumberChiffre commented 1 week ago

I'm running MIPROv2 on both v2.4.14 and v2.4.16, and both show the same error in the logs below (the run doesn't get stuck). I'm wondering whether this is something to watch out for, since I'm not sure it's safe to ignore:

Sample Code

import os

import dspy
from dspy.teleprompt import MIPROv2

# `system_prompt`, `EntityRelationshipExtractor`, `entity_recall_metric`,
# `trainset`, and `valset` are defined elsewhere in my code.
lm = dspy.OpenAI(
    model="deepseek-chat", 
    model_type="chat", 
    api_key=os.environ["DEEPSEEK_API_KEY"], 
    base_url=os.environ["DEEPSEEK_BASE_URL"], 
    system_prompt=system_prompt, 
    temperature=1.0,
    top_p=1.0,
    max_tokens=4096
)
dspy.settings.configure(lm=lm, experimental=True)

model = EntityRelationshipExtractor()
optimizer = MIPROv2(
    prompt_model=lm,
    task_model=lm,
    metric=entity_recall_metric,
    init_temperature=1.0,
    num_candidates=4,
    verbose=True
)
kwargs = dict(num_threads=os.cpu_count(), display_progress=True, display_table=0)
miprov2_model = optimizer.compile(
    model, 
    trainset=trainset, 
    valset=valset, 
    requires_permission_to_run=False,
    num_batches=10, 
    max_labeled_demos=5, 
    max_bootstrapped_demos=3, 
    eval_kwargs=kwargs
)
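
For reference, a minimal sketch (not part of the failing snippet above) of how the compiled program could be scored afterwards with dspy.evaluate.Evaluate, reusing the same valset and metric:

from dspy.evaluate import Evaluate

# Score the optimized program on the validation set.
evaluate = Evaluate(
    devset=valset,
    metric=entity_recall_metric,
    num_threads=os.cpu_count(),
    display_progress=True,
    display_table=0,
)
evaluate(miprov2_model)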

Logs:

WARNING: Projected Language Model (LM) Calls

Please be advised that based on the parameters you have set, the maximum number of LM calls is projected as follows:

- Prompt Model: 10 data summarizer calls + 4 * 2 lm calls in program + (3) lm calls in program aware proposer = 21 prompt model calls
- Task Model: 25 examples in minibatch * 10 batches + 20 examples in train set * 1 full evals = 270 task model calls

Estimated Cost Calculation:

Total Cost = (Number of calls to task model * (Avg Input Token Length per Call * Task Model Price per Input Token + Avg Output Token Length per Call * Task Model Price per Output Token))
            + (Number of calls to prompt model * (Avg Input Token Length per Call * Prompt Model Price per Input Token + Avg Output Token Length per Call * Prompt Model Price per Output Token)).

For a preliminary estimate of potential costs, we recommend you perform your own calculations based on the task
and prompt models you intend to use. If the projected costs exceed your budget or expectations, you may consider:

- Reducing the number of trials (`num_batches`), the size of the trainset, or the number of LM calls in your program.
- Using a cheaper task model to optimize the prompt.
Error getting source code: unhashable type: 'list'.

Running without program aware proposer.
b: 10
summary: Prediction(
    summary="The dataset consists primarily of news articles from CNN, covering diverse topics such as sports, politics, health, and social issues, with a strong emphasis on entity-rich content and complex relationships. Temporal and geographical contexts are frequently included, and the articles are concise and informative, reflecting CNN's journalistic style."
)
DATA SUMMARY: The dataset consists primarily of news articles from CNN, covering diverse topics such as sports, politics, health, and social issues, with a strong emphasis on entity-rich content and complex relationships. Temporal and geographical contexts are frequently included, and the articles are concise and informative, reflecting CNN's journalistic style.
  0%|          | 0/20 [00:00<?, ?it/s]
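
For context, the `unhashable type: 'list'` message is just a plain Python TypeError raised when something tries to hash a list; judging by the log, MIPROv2 appears to catch it while inspecting the program's source and then falls back ("Running without program aware proposer."). A minimal illustration of the underlying Python behavior (not DSPy code):

seen = set()
key = ["entity", "relationship"]   # lists are mutable, hence unhashable
try:
    seen.add(key)                  # raises TypeError: unhashable type: 'list'
except TypeError as e:
    print(f"Error getting source code: {e}.")  # prints the same message as the log above
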
NumberChiffre commented 1 week ago

This seems to be coming from the dspy.Module (a minimal sketch of the module shape is below): evaluation doesn't raise any issue, but it fails with MIPROv2 once the examples have been augmented. Closing this issue.
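
For anyone hitting the same message, here is a minimal sketch of the general shape of such a dspy.Module; the signature and field names are assumptions for illustration, not the actual EntityRelationshipExtractor:

import dspy

class ExtractEntityRelationships(dspy.Signature):
    """Extract named entities and the relationships between them from a passage."""
    passage = dspy.InputField()
    entities = dspy.OutputField(desc="comma-separated entity names")
    relationships = dspy.OutputField(desc="one (head, relation, tail) triple per line")

class EntityRelationshipExtractor(dspy.Module):
    def __init__(self):
        super().__init__()
        # A single ChainOfThought predictor over the signature above.
        self.extract = dspy.ChainOfThought(ExtractEntityRelationships)

    def forward(self, passage):
        return self.extract(passage=passage)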