stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License
16.74k stars 1.29k forks

How to save the traces as a file locally? #1379

Open imflash217 opened 1 month ago

imflash217 commented 1 month ago

Hi, how can I save the traces locally as a file? I am running DSPy and ArizePhoenix on a server, and I don't have the ability to access the hosted Phoenix app. So, I want to save the trace as a file and load it locally for inspection.

I cannot find a way to do it in the docs.

Thanks

d3banjan commented 1 month ago
lm = dspy.OpenAI(...)
...
# run your module here
lm.inspect_history(10) # shows you the last 10 generations 
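Since `inspect_history` prints its report to stdout, one way to persist it is to capture stdout and write it to a file. A minimal sketch of that idea, using a stand-in function in place of a real configured LM (with real DSPy you would call `lm.inspect_history(10)` inside the `redirect_stdout` block instead):

```python
import contextlib
import io
from pathlib import Path

def inspect_history_stub(n):
    # stand-in for lm.inspect_history(n), which prints recent generations to stdout
    print(f"--- last {n} generations ---")

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    inspect_history_stub(10)  # with a real LM: lm.inspect_history(10)

trace_text = buf.getvalue()
Path("dspy_trace.log").write_text(trace_text)  # persist for offline inspection
```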
imflash217 commented 1 month ago

Thanks for the answer @d3banjan. The thing that I am not clear about from the docs is:

lm = dspy.OpenAI(...)
...
# run your module here
lm.inspect_history(10) # shows you the last 10 generations 

This only allows me to inspect history while the program is running. I want to inspect traces later as well offline.

arnavsinghvi11 commented 1 month ago

Hey @imflash217 , you can view DSPy program traces after loading back the saved program and printing program.prog._predict.demos and/or program.prog._predict.traces. (Assume your DSPy program is set to program, and you have configured your LM as dspy.settings.configure(lm = lm)) lmk if that helps (and we'll update the docs accordingly if so!)
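For context on what a save/load round trip preserves, here is a hypothetical sketch that reads demos back out of a saved-program state file. The file name, keys, and JSON shape are assumptions for illustration only, imitating the kind of state a saved DSPy program contains; they are not the library's exact format:

```python
import json
from pathlib import Path

# Hypothetical saved-program state (keys and shape are assumptions
# for illustration, not DSPy's exact on-disk format).
state = {
    "prog": {
        "lm": None,
        "traces": [],
        "train": [],
        "demos": [{"augmented": True, "question": "What is 2 + 2?", "answer": "4"}],
    }
}
Path("optimized_cot_module.json").write_text(json.dumps(state, indent=2))

# After reloading, only what was written to the file is recoverable:
# the bootstrapped demos are there, while traces remain empty.
loaded = json.loads(Path("optimized_cot_module.json").read_text())
demos = loaded["prog"]["demos"]
print(len(demos))  # → 1
```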

imflash217 commented 1 month ago

Thanks for the response @arnavsinghvi11 . I do see the demos in the re-loaded program. But, I still cannot see the traces.

When I run the following code, I get the below response.

import pprint

print("--" * 40)
pprint.pprint(optimized_cot)  # show the compiled program

# To load the module later
loaded_cot = CoT()
loaded_cot.load("optimized_cot_module.json")

demos = loaded_cot.prog._predict.demos
traces = loaded_cot.prog._predict.traces
print("=="*40)
pprint.pprint(demos)
print("_._"*30)
pprint.pprint(traces)
print("++"*40)

Output:

--------------------------------------------------------------------------------
prog = Predict(StringSignature(question -> rationale, answer
    instructions='Given the fields `question`, produce the fields `answer`.'
    question = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Question:', 'desc': '${question}'})
    rationale = Field(annotation=str required=True json_schema_extra={'prefix': "Reasoning: Let's think step by step in order to", 'desc': '${produce the answer}. We ...', '__dspy_field_type': 'output'})
    answer = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'output', 'prefix': 'Answer:', 'desc': '${answer}'})
))
================================================================================
[{'answer': '24',
  'augmented': True,
  'question': 'The result from the 40-item Statistics exam Marion and Ella '
              'took already came out. Ella got 4 incorrect answers while '
              "Marion got 6 more than half the score of Ella. What is Marion's "
              'score?',
  'rationale': 'produce the answer. We know that Marion got 6 more than half '
               "of Ella's score. Since Ella got 4 incorrect answers, her score "
               "is 40 - 4 = 36. Half of Ella's score is 36 / 2 = 18. Adding 6 "
               "to this gives us Marion's score."},
 {'answer': '600,000 feet',
  'augmented': True,
  'question': 'Stephen made 10 round trips up and down a 40,000 foot tall '
              "mountain. If he reached 3/4 of the mountain's height on each of "
              'his trips, calculate the total distance he covered.',
  'rationale': 'produce the answer. We know that Stephen made 10 round trips '
               "up and down the mountain, reaching 3/4 of the mountain's "
               'height each time. This means he covered 3/4 * 40,000 feet = '
               '30,000 feet on each trip. Since he made 10 round trips, the '
               'total distance he covered is 30,000 feet * 10 trips * 2 (round '
               'trip) = 600,000 feet.'},
 {'answer': '2',
  'augmented': True,
  'question': 'Bridget counted 14 shooting stars in the night sky.  Reginald '
              'counted two fewer shooting stars than did Bridget, but Sam '
              'counted four more shooting stars than did Reginald.  How many '
              'more shooting stars did Sam count in the night sky than was the '
              'average number of shooting stars observed for the three of '
              'them?',
  'rationale': 'produce the answer. We know that Reginald counted two fewer '
               'shooting stars than Bridget, so he counted 14 - 2 = 12 '
               'shooting stars. Sam counted four more shooting stars than '
               'Reginald, so he counted 12 + 4 = 16 shooting stars. The '
               'average number of shooting stars observed for the three of '
               'them is (14 + 12 + 16) / 3 = 14 shooting stars. Therefore, Sam '
               'counted 16 - 14 = 2 more shooting stars than the average.'},
 {'answer': '92',
  'augmented': True,
  'question': 'Sarah buys 20 pencils on Monday. Then she buys 18 more pencils '
              'on Tuesday. On Wednesday she buys triple the number of pencils '
              'she did on Tuesday. How many pencils does she have?',
  'rationale': 'produce the answer. We first add the number of pencils she '
               'bought on Monday and Tuesday, which is 20 + 18 = 38 pencils. '
               'Then on Wednesday, she buys triple the number of pencils she '
               'bought on Tuesday, which is 3 * 18 = 54 pencils. Adding this '
               'to the previous total, she has 38 + 54 = 92 pencils.'}]
_.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__.__._
[]
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Is there a way to load the traces as well while re-loading the compiled program? I always see an empty value for the lm, traces, and train keys in the saved JSON (screenshot attached). How can I fix this?

[Screenshot 2024-08-16 at 00 08 01: the saved JSON with empty lm, traces, and train keys]

Thank you

d3banjan commented 1 month ago

@imflash217 In dspy.settings.configure, is trace=True present?

imflash217 commented 1 month ago

@d3banjan , There is no boolean argument called trace in dspy.settings.configure.

https://github.com/stanfordnlp/dspy/blob/db4d9ca51955c4ff0fe540bac34a42a339db97ae/dsp/utils/settings.py#L36-L38

It accepts a list. So, I have set it as dspy.settings.configure(lm=lm, trace=[])
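When `trace=[]` is configured, DSPy appends entries to that list as the program runs; a simple way to persist them is to serialize the list to a JSONL file after execution. A sketch under the assumption that each entry is roughly a `(predictor, inputs, outputs)` tuple (the exact shape may differ by version); the entries here are simulated stand-ins for a real `dspy.settings.trace`:

```python
import json
from pathlib import Path

# Simulated trace entries shaped like (predictor, inputs, outputs) tuples;
# with a real run you would iterate over dspy.settings.trace instead.
trace = [
    ("Predict(question -> answer)",
     {"question": "What is 5 + 7?"},
     {"answer": "12"}),
]

lines = [
    json.dumps({"predictor": str(pred), "inputs": inp, "outputs": out})
    for pred, inp, out in trace
]
Path("settings_trace.jsonl").write_text("\n".join(lines) + "\n")
```

Each line of the resulting file is one self-contained JSON record, so the log can be inspected offline with any JSON tooling.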

okhat commented 1 month ago

Which traces are you trying to save?

  • All prompts executed? Use the history or inspect_history.

  • The few-shot examples traced or "bootstrapped"? They're saved when you save the program.

There's no other tracing I can think of. Is there something else you've been wanting to trace?

d3banjan commented 4 weeks ago

@imflash217 it seems like inspect_history is the only way to see the LLM outputs. You can save the trace output to an append-only log file, and I hope your setup allows you to download custom files.
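The append-only-log idea above can be sketched as a thin wrapper around whatever callable issues the LM requests: every prompt/response pair gets appended to a JSONL file as it happens. The `fake_lm` function is a stand-in for a real client call, and the wrapper shape is an assumption, not a DSPy API:

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("lm_calls.jsonl")

def logged(lm_call):
    # wrap any callable LM so each prompt/response pair is appended to a log
    def wrapper(prompt, **kwargs):
        response = lm_call(prompt, **kwargs)
        entry = {"ts": time.time(), "prompt": prompt, "response": response}
        with LOG_PATH.open("a") as f:  # append-only: earlier runs are preserved
            f.write(json.dumps(entry) + "\n")
        return response
    return wrapper

# stand-in for a real LM client call
def fake_lm(prompt, **kwargs):
    return "stubbed completion"

lm = logged(fake_lm)
lm("What is the capital of France?")
```

Because the file is only ever appended to, traces from the compilation run survive the process and can be downloaded and inspected later.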

I looked through the codebase, especially the predictor.compile method and its base classes, to find an easy workaround to save the traces at runtime.

Sadly, I don't understand the codebase well enough yet to do it in the free time I have, but hopefully you have a better experience doing that.

imflash217 commented 3 weeks ago

Which traces are you trying to save?

I want to save the entire execution of the program compilation during the training stage, so that I can see which prompts were sent to the LLM and which training examples were picked during a particular "epoch" or trial. Currently I use Phoenix-Arize for tracking the training, but I am working on GCP infra and I cannot install phoenix-arize in my GCP workbench due to some package dependency issues. So, I want to save this myself and inspect it without depending on Phoenix-Arize.

  • All prompts executed? Use the history or inspect_history.

Thanks for this. I do use .inspect_history() method to peek into the executed prompts.

  • The few-shot examples traced or "bootstrapped"? They're saved when you save the program.

Yes. I definitely want to save the traced examples. Are these the same as the demos key in the saved JSON file (screenshot above)? If yes, then I always see a very small number of demos in the saved JSON. How can I increase it to make the program more resilient to varying data distributions between the train dataset and the val/test datasets? I can't wrap my head around this from the docs.

There's no other tracing I can think of. Is there something else you've been wanting to trace?

@okhat, I want to save the traces so that I can provide the training traces along with the saved program when I reload it during inference. If I am understanding correctly, the program would need the traces while reloading during inference to get an understanding of the training data. Currently, I have to train the program every time I need to do inference, because the saved JSON, when re-loaded using the .load() method, DOES NOT perform even close to what I observe when I use the compiled program at RUNTIME (doing training and inference in the same runtime, without saving and without reloading anything).

Also, if DSPy uses a local cache to store more relevant metadata about the training/compilation stage, then how can I package it so that I can deploy my solution in production (without having to train again every time, for every inference)?

I can't fully wrap my head around where and how DSPy saves its learned parameters during training. I don't see any relevant metadata, like the temperature for the LLM calls used, or what it should use during inference.

Thanks.