stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

Saving & reading the optimized model #617

Closed: owen-deepskill closed this issue 4 months ago

owen-deepskill commented 6 months ago

Compiling takes some time, so I don't want my users to repeat the process every time they run the product. How can I save the compiled model and reuse it next time without recompiling?

Specifically, in this tutorial, https://drchrislevy.github.io/posts/dspy/dspy.html, the program is compiled like this:

optimized_cot_qa = teleprompter.compile(cot_qa, trainset=trainset, valset=valset)

How would I save optimized_cot_qa so that I can reuse it?

Thanks!

insop commented 6 months ago

You can use save and load, and use dump_state to inspect the compiled state.

optimized_cot_qa.save("file.json")

loaded_optimized_cot_qa = CoT()
loaded_optimized_cot_qa.load("file.json")
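The round trip above can be sketched with a stand-in class, so that it runs without DSPy or a configured LM. `TinyProgram`, its `demos` attribute, and the temporary file path are hypothetical names used only for illustration. The only behaviour mirrored from the snippet above is the pattern itself: save writes JSON, load fills an already-constructed instance, and dump_state exposes the state for comparison.

```python
import json
import os
import tempfile


class TinyProgram:
    """Hypothetical stand-in for a compiled module (not a DSPy class)."""

    def __init__(self):
        self.demos = []  # few-shot examples an optimizer would have selected

    def dump_state(self):
        # Return a JSON-friendly snapshot of everything worth persisting.
        return {"demos": list(self.demos)}

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.dump_state(), f)

    def load(self, path):
        # Mutates this instance in place and returns None.
        with open(path, encoding="utf-8") as f:
            self.demos = json.load(f)["demos"]


compiled = TinyProgram()
compiled.demos = [{"question": "2 + 2?", "answer": "4"}]

path = os.path.join(tempfile.mkdtemp(), "file.json")
compiled.save(path)

restored = TinyProgram()  # construct a fresh, uncompiled instance first
restored.load(path)       # then fill it from disk

states_match = restored.dump_state() == compiled.dump_state()
```

If the two dumped states are equal, the reloaded object carries the same persisted information as the compiled one.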

owen-deepskill commented 6 months ago

Thanks so much @insop !

Sorry for my ignorance but am I correct in using the load function in this way?

optimized_cot_qa.save("optimized_cot_qa.json")

loaded_optimized_cot_qa = CoT()
loaded_optimized_cot_qa.load("optimized_cot_qa.json")

evaluate(loaded_optimized_cot_qa)
loaded_optimized_cot_qa(trainset[0].question)

Neither evaluate nor calling the QA module directly seems to work, unlike with the original optimized_cot_qa.

insop commented 6 months ago

Could you run dump_state() on both the original module and the loaded module and compare the results?

loaded_optimized_cot_qa.dump_state()
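One way to do that comparison is to walk the two states side by side and list where they differ. The helper below is a sketch that assumes dump_state() returns nested dicts and lists of plain values; `diff_states` and the sample states are hypothetical and not part of DSPy.

```python
def diff_states(original, loaded, path=""):
    """Return a list of human-readable differences between two nested states."""
    here = path or "<root>"
    if type(original) is not type(loaded):
        return [f"{here}: type {type(original).__name__} != {type(loaded).__name__}"]
    if isinstance(original, dict):
        diffs = []
        for key in sorted(set(original) | set(loaded), key=str):
            child = f"{path}.{key}" if path else str(key)
            if key not in original:
                diffs.append(f"{child}: only in loaded")
            elif key not in loaded:
                diffs.append(f"{child}: only in original")
            else:
                diffs.extend(diff_states(original[key], loaded[key], child))
        return diffs
    if isinstance(original, list):
        if len(original) != len(loaded):
            return [f"{here}: length {len(original)} != {len(loaded)}"]
        diffs = []
        for i, (a, b) in enumerate(zip(original, loaded)):
            diffs.extend(diff_states(a, b, f"{path}[{i}]"))
        return diffs
    return [] if original == loaded else [f"{here}: {original!r} != {loaded!r}"]


# Made-up states standing in for the two dump_state() results.
original_state = {"predictor": {"demos": [{"question": "q1", "augmented": True}]}}
loaded_state = {"predictor": {"demos": [{"question": "q1"}]}}

differences = diff_states(original_state, loaded_state)
```

An empty list means the two states are identical; otherwise each entry names the path where they diverge.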

owen-deepskill commented 6 months ago

I tried, but I am still stuck on the check ("augmented" not in demo or not demo.augmented), which raises AttributeError: 'dict' object has no attribute 'augmented'. Am I missing anything?

>>> loaded_optimized_cot_qa = CoT()
>>> loaded_optimized_cot_qa.load("optimized_cot_qa.json")
>>> loaded_optimized_cot_qa.dump_state()
>>> evaluate(loaded_optimized_cot_qa)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\owenc\anaconda3\Lib\site-packages\dspy\evaluate\evaluate.py", line 128, in __call__
    reordered_devset, ncorrect, ntotal = self._execute_multi_thread(wrapped_program, devset, num_threads, display_progress)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\owenc\anaconda3\Lib\site-packages\dspy\evaluate\evaluate.py", line 63, in _execute_multi_thread
    example_idx, example, prediction, score = future.result()
                                              ^^^^^^^^^^^^^^^
  File "C:\Users\owenc\anaconda3\Lib\concurrent\futures\_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\owenc\anaconda3\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Users\owenc\anaconda3\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\owenc\anaconda3\Lib\site-packages\dspy\evaluate\evaluate.py", line 116, in wrapped_program
    raise e
  File "C:\Users\owenc\anaconda3\Lib\site-packages\dspy\evaluate\evaluate.py", line 101, in wrapped_program
    prediction = program(**example.inputs())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\owenc\anaconda3\Lib\site-packages\dspy\primitives\program.py", line 29, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<stdin>", line 6, in forward
  File "C:\Users\owenc\anaconda3\Lib\site-packages\dspy\predict\predict.py", line 49, in __call__
    return self.forward(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\owenc\anaconda3\Lib\site-packages\dspy\predict\chain_of_thought.py", line 59, in forward
    return super().forward(signature=signature, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\owenc\anaconda3\Lib\site-packages\dspy\predict\predict.py", line 90, in forward
    x, C = dsp.generate(template, **config)(x, stage=self.stage)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\owenc\anaconda3\Lib\site-packages\dsp\primitives\predict.py", line 77, in do_generate
    prompt = template(example)
             ^^^^^^^^^^^^^^^^^
  File "C:\Users\owenc\anaconda3\Lib\site-packages\dsp\templates\template_v2.py", line 206, in __call__
    rdemos = [
             ^
  File "C:\Users\owenc\anaconda3\Lib\site-packages\dsp\templates\template_v2.py", line 210, in <listcomp>
    ("augmented" not in demo or not demo.augmented)
                                    ^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'augmented'

arnavsinghvi11 commented 6 months ago

Hi @owen-deepskill, could you share the code for compiling optimized_cot_qa? The way you save and load looks correct, so I believe this error comes from a discrepancy in the original compiled model.

Feel free to reference this example again for saving/loading compiled models.

insop commented 6 months ago

Hi @owen-deepskill

What version of DSPy are you using? If it is not the latest, you might want to try the latest version.

owen-deepskill commented 6 months ago

Thanks @arnavsinghvi11 and @insop. I'm using 2.3.4, which seems to be the latest. I tried the example and it was working fine until I saved the model. But it didn't work when I tried loading:

cot_fewshot2 = ScoNeCoT()
cot_fewshot2.load("scone-cot_fewshot-turbo-gpt4-demos.json")
evaluator(cot_fewshot2, metric=scone_accuracy)

The result, which is the same error:

File c:\Users\owenc\anaconda3\Lib\site-packages\dsp\templates\template_v2.py:210, in <listcomp>(.0)
    203 if self.fields[-1].input_variable in example:
    204     del example[self.fields[-1].input_variable]
    206 rdemos = [
    207     self.query(demo, is_demo=True)
    208     for demo in example.demos
    209     if (
--> 210         ("augmented" not in demo or not demo.augmented)
    211         and (  # validate that the training example has the same primitive input var as the template
    212             self.fields[-1].input_variable in demo
    213             and demo[self.fields[-1].input_variable] is not None
    214         )
    215     )
    216 ]
    218 ademos = [
    219     self.query(demo, is_demo=True)
    220     for demo in example.demos
    221     if "augmented" in demo and demo.augmented
    222 ]
    224 # Move the rdemos to ademos if rdemo has all the fields filled in

AttributeError: 'dict' object has no attribute 'augmented'
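The message says that `demo` is a plain dict at this point: `"augmented" in demo` works on a dict, but `demo.augmented` needs attribute access. The sketch below reproduces that mechanism in isolation. It is an illustration of the error, not a confirmed diagnosis or the library's fix; `AttrDict` and `is_raw_demo` are hypothetical names.

```python
class AttrDict(dict):
    """Hypothetical dict subclass that also exposes keys as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def is_raw_demo(demo):
    # Same shape as the condition on line 210 of the traceback.
    return "augmented" not in demo or not demo.augmented


plain = {"question": "q", "augmented": True}
try:
    is_raw_demo(plain)
    error = None
except AttributeError as exc:
    # The key exists, so the `or` falls through to demo.augmented and fails.
    error = str(exc)

wrapped = AttrDict(plain)
result = is_raw_demo(wrapped)  # attribute access now resolves to the key
```

Note that the check only fails for dicts that actually contain an "augmented" key; without the key, the `or` short-circuits before the attribute lookup.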

arnavsinghvi11 commented 4 months ago

@owen-deepskill Could you try installing DSPy from source instead of from PyPI? Past errors related to this seem to be fixed there; if not, we can take a closer look!

owen-deepskill commented 4 months ago

@arnavsinghvi11 Apologies for the delayed response, and thanks for the help. Since I encountered this problem, I have changed my IDE, my DSPy installation, etc., and for some reason it is working now. I'm not sure what was wrong in the previous configuration. I will report back if I see this error again. Many thanks!

sartyagi91 commented 3 months ago

Hey, I encountered the same problem when compiling one of the examples from the dspy library. I get an error like this: "TypeError: {'Talk About a Stranger', 'Nancy Reagan'} is not JSON serializable"
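The value in that message is a Python set, which the JSON format cannot represent directly. The sketch below shows the general mechanism with the standard `json` module: converting sets to lists at serialization time. The `gold_titles` field name and the `encode_sets` helper are assumptions for illustration, and where the set originates in the example is not established in this thread.

```python
import json


def encode_sets(obj):
    # json calls this only for objects it cannot serialize on its own.
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)  # sorted for stable output; assumes comparable items
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


# Hypothetical state containing a set like the one in the error message.
state = {"gold_titles": {"Talk About a Stranger", "Nancy Reagan"}}

try:
    json.dumps(state)
    failed = False
except TypeError:
    failed = True  # the default encoder rejects sets

encoded = json.dumps(state, default=encode_sets)
```

Converting such fields to lists before compiling or saving has the same effect, at the cost of losing the set type when the data is read back.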

ajinkyaathlye commented 2 months ago

Hey, the load function does not seem to be working. I have compiled a program using a couple of optimizers (BootstrapFewShot, BootstrapFewShotWithOptuna) and then saved it using the save method. It seems to have saved it fine (although it holds a frustratingly small amount of information after almost 15 hours of training), but when I load it all I get is a None value.

Here is how I have used it:

class MultiHopRAG(dspy.Module):
    def forward(self, **kwargs):
        # Module logic
        return dspy.Prediction()

teleprompter = BootstrapFewShotWithOptuna()
compiled_program = teleprompter.compile(MultiHopRAG)
compiled_program.save('./multi_hop_rag')
loaded_program = MultiHopRAG().load('./multi_hop_rag')

The loaded_program is simply a None value.

wugxxx commented 1 month ago

The load() function has no return value. Try splitting the last step like this:

loaded_program = MultiHopRAG()
loaded_program.load('./multi_hop_rag')
# continue your logic
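The difference between the chained call and the two-step version can be sketched with a stand-in class whose load() mutates the instance and returns nothing. `StubProgram` and `load_program` are hypothetical names, not part of DSPy; the helper simply wraps the two steps so the caller gets the instance back.

```python
class StubProgram:
    """Hypothetical stand-in whose load() works in place."""

    def __init__(self):
        self.state = None

    def load(self, path):
        self.state = f"loaded from {path}"
        # No return statement, so the call itself evaluates to None.


def load_program(program_cls, path):
    """Construct an instance, load into it, and return the instance."""
    program = program_cls()
    program.load(path)
    return program


chained = StubProgram().load("./multi_hop_rag")        # result of load(): None
loaded = load_program(StubProgram, "./multi_hop_rag")  # the instance itself
```

Assigning the result of the chained call keeps the None returned by load() and discards the freshly loaded instance.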

imflash217 commented 1 month ago

@arnavsinghvi11

  1. Why does the saved .json file have empty traces and train keys?
  2. Is there a way to modify the .save() method to store more information, so the program can be reloaded exactly in a different runtime?

https://github.com/stanfordnlp/dspy/blob/51d2d4f9bd5f94ca624a5f0aad18a09d707a91d5/examples/nli/scone/scone-cot_fewshot-turbo-gpt4-demos.json#L1-L8