stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

[error ] Error for example in dev set: [dspy.evaluate.evaluate] filename=evaluate.py lineno=180 #1243

Closed HolographicX closed 1 month ago

HolographicX commented 4 months ago

I'm trying to compile a zero-shot chain with BootstrapFewShotWithRandomSearch and LangChain. I've followed this and this, with the only difference being that I'm using my own retriever.

optimizer = BootstrapFewShotWithRandomSearch(
    max_bootstrapped_demos=2,
    max_labeled_demos=2,
    num_candidate_programs=2,
    num_threads=8,
    metric=metric,
)

optimized_chain = optimizer.compile(zeroshot_chain, trainset=trainset, valset=valset)

My retriever is defined like so:

from langchain_chroma import Chroma

vectordb = Chroma(collection_name, embeddings, persist_directory=p_dir)
retriever = vectordb.as_retriever()

The collection is built from the HotPotQA dataset, which I also use for my metric.

Anyway, when running optimizer.compile(), the first seed evaluates successfully:

Average Metric: 22.333333333333332 / 50  (44.7): 100%|██████████████████████████████████████████████████████████████████████| 50/50 [00:25<00:00,  1.95it/s]
INFO:dspy.evaluate.evaluate:2024-07-04T05:16:35.308465Z [info     ] Average Metric: 22.333333333333332 / 50 (44.7%) [dspy.evaluate.evaluate] filename=evaluate.py lineno=200
...

However, during the 2nd seed, it fails every time. Here are the full logs of the failure:

Average Metric: 0.0 / 1  (0.0):   2%|█▊                                                                                      | 1/50 [00:00<00:30,  1.59it/s]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
ERROR:dspy.evaluate.evaluate:2024-07-04T05:16:36.509832Z [error    ] Error for example in dev set:                [dspy.evaluate.evaluate] filename=evaluate.py lineno=180
Average Metric: 0.0 / 2  (0.0):   4%|███▌                                                                                    | 2/50 [00:01<00:25,  1.85it/s]ERROR:dspy.evaluate.evaluate:2024-07-04T05:16:36.529766Z [error    ] Error for example in dev set:                [dspy.evaluate.evaluate] filename=evaluate.py lineno=180
Average Metric: 0.0 / 3  (0.0):   4%|███▌                                                                                    | 2/50 [00:01<00:25,  1.85it/s]ERROR:dspy.evaluate.evaluate:2024-07-04T05:16:36.534032Z [error    ] Error for example in dev set:                [dspy.evaluate.evaluate] filename=evaluate.py lineno=180
Average Metric: 0.0 / 4  (0.0):   6%|█████▎                                                                                  | 3/50 [00:01<00:25,  1.85it/s]ERROR:dspy.evaluate.evaluate:2024-07-04T05:16:36.583842Z [error    ] Error for example in dev set:                [dspy.evaluate.evaluate] filename=evaluate.py lineno=180
Average Metric: 0.0 / 5  (0.0):   8%|███████                                                                                 | 4/50 [00:01<00:24,  1.85it/s]ERROR:dspy.evaluate.evaluate:2024-07-04T05:16:36.604728Z [error    ] Error for example in dev set:                [dspy.evaluate.evaluate] filename=evaluate.py lineno=180
Average Metric: 0.0 / 6  (0.0):  12%|██████████▌                                                                             | 6/50 [00:01<00:06,  6.98it/s]ERROR:dspy.evaluate.evaluate:2024-07-04T05:16:36.633670Z [error    ] Error for example in dev set:                [dspy.evaluate.evaluate] filename=evaluate.py lineno=180
Average Metric: 0.0 / 7  (0.0):  12%|██████████▌                                                                             | 6/50 [00:01<00:06,  6.98it/s]ERROR:dspy.evaluate.evaluate:2024-07-04T05:16:36.634493Z [error    ] Error for example in dev set:                [dspy.evaluate.evaluate] filename=evaluate.py lineno=180
Average Metric: 0.0 / 8  (0.0):  14%|████████████▎                                                                           | 7/50 [00:01<00:06,  6.98it/s]ERROR:dspy.evaluate.evaluate:2024-07-04T05:16:36.728874Z [error    ] Error for example in dev set:                [dspy.evaluate.evaluate] filename=evaluate.py lineno=180
Average Metric: 0.0 / 9  (0.0):  18%|███████████████▊                                                                        | 9/50
Traceback (most recent call last):
  File "/home/soham/development/pdf/multi-vector-db/dspy_evaluatev2.py", line 161, in <module>
    optimized_chain = optimizer.compile(zeroshot_chain, trainset=trainset, valset=valset)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/dspy/teleprompt/random_search.py", line 124, in compile
    score, subscores = evaluate(program2, return_all_scores=True)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/dspy/evaluate/evaluate.py", line 193, in __call__
    reordered_devset, ncorrect, ntotal = self._execute_multi_thread(
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/dspy/evaluate/evaluate.py", line 110, in _execute_multi_thread
    example_idx, example, prediction, score = future.result()
                                              ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/dspy/evaluate/evaluate.py", line 103, in cancellable_wrapped_program
    return wrapped_program(idx, arg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/dspy/evaluate/evaluate.py", line 178, in wrapped_program
    raise e
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/dspy/evaluate/evaluate.py", line 160, in wrapped_program
    prediction = program(**example.inputs())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/dspy/primitives/program.py", line 26, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/dspy/predict/langchain.py", line 152, in forward
    output = self.chain.invoke(dict(**kwargs))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2393, in invoke
    input = step.invoke(
            ^^^^^^^^^^^^
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/dspy/predict/langchain.py", line 115, in invoke
    return self.forward(**d)
           ^^^^^^^^^^^^^^^^^
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/dspy/predict/langchain.py", line 94, in forward
    prompt = signature(dsp.Example(demos=demos, **kwargs))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/dsp/templates/template_v2.py", line 210, in __call__
    self.query(demo, is_demo=True)
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/dsp/templates/template_v2.py", line 105, in query
    formatted_value = format_handler(example[field.input_variable])
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soham/development/pdf/multi-vector-db/venv/lib/python3.12/site-packages/dsp/templates/utils.py", line 9, in passages2text
    assert type(passages) in [list, tuple]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
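The failing check can be reproduced in isolation: `passages2text` (in `dsp/templates/utils.py`) rejects anything that is not a list or tuple, so a demo whose context field arrives as a dict trips the assertion. A simplified stand-in for that check (not the full DSPy implementation):

```python
def check_passages(passages):
    # Simplified stand-in for the type check in dsp.templates.utils.passages2text:
    # only list/tuple pass; a dict raises AssertionError, matching the traceback.
    assert type(passages) in [list, tuple]
    return passages

check_passages(["passage one", "passage two"])   # passes
try:
    check_passages({"context": "passage one"})   # dict -> AssertionError
except AssertionError:
    print("AssertionError: passages must be a list or tuple")
```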
HolographicX commented 4 months ago

My guess was that it might be related to using langchain_chroma, so I switched to ChromadbRM from dspy, but I get the exact same error on seed -2.

Specifically, the failure happens in this branch:

.../venv/lib/python3.12/site-packages/dspy/teleprompt/random_search.py
elif seed == -2:
    # labels only
    teleprompter = LabeledFewShot(k=self.max_labeled_demos)
    program2 = teleprompter.compile(student, trainset=trainset2, sample=labeled_sample)
HolographicX commented 4 months ago
assert type(passages) in [list, tuple]
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

type(passages) is dict in my case, but passages2text() only accepts lists and tuples, so the assertion fails.
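One pragmatic workaround is to coerce the retriever output into a list of strings before it reaches the template. A hypothetical helper (not part of DSPy or LangChain, just a sketch):

```python
def normalize_passages(passages):
    """Coerce common retriever outputs into the list-of-strings form
    that passages2text accepts. Hypothetical helper, not DSPy API."""
    if isinstance(passages, (list, tuple)):
        return [str(p) for p in passages]
    if isinstance(passages, dict):
        # Flatten dict values (e.g. {"context": [...]}) into one flat list.
        out = []
        for v in passages.values():
            out.extend(v if isinstance(v, (list, tuple)) else [v])
        return [str(p) for p in out]
    # Fall back to wrapping a single passage (e.g. a bare string).
    return [str(passages)]
```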

kdb-c commented 3 months ago

Hello, may I ask whether (and how) you solved the "[error] Error for example in dev set" problem?

HolographicX commented 3 months ago

Whoops, I actually did not solve this. I closed it because I thought I did (with my PR), but obviously I was missing something as the PR added unintended functionality. I would love to receive help on this.

nr-bailey commented 2 months ago

Hi all, I'm seeing this problem too. Any suggestions on potential fixes?

sarora-roivant commented 2 months ago

Seeing the same issue every time I use evaluate. Any suggestions on fixes?

okhat commented 1 month ago

I think this is fixed if you migrate to DSPy 2.5: https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb