stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

Need help on the observations of dspy experiments #838

Open GaneshSKulkarni opened 3 months ago

GaneshSKulkarni commented 3 months ago

Hi - I am working on a chatbot that answers questions from a document using RAG. I used the DSPy framework for prompt tuning, ran experiments with DSPy for our use case, and measured performance with the RAGAS Answer Correctness metric. I have some observations about the performance of the prompts generated with DSPy.

For this experiment we are using the GPT4-32K LLM. I generated two prompts: one from an uncompiled DSPy signature, and a custom prompt sent directly to the Azure OpenAI API (through LangChain). The same question and context (retrieved earlier) are sent along with both prompts. When we inspect the prompts sent to the model, both are the same except that one is called via the DSPy framework and the other directly via the LangChain APIs. The observation is that the responses to the direct API call are more detailed and perform better than the DSPy prompt almost every time.

Can anyone please share some thoughts on why this might happen?

We also observed that DSPy always preprocesses the input and replaces newline characters with spaces, thereby combining all the chunks provided in the context. Is there a way to avoid this? (For the experiment above, I replicated this behaviour in the direct API prompt by replacing newlines with spaces so the comparison would be fair.)

Thanks in advance.

arnavsinghvi11 commented 3 months ago

Hi @GaneshSKulkarni , you can use inspect_history to get more observability into the prompts and outputs being passed in. If you are looking for more in-depth tracing, feel free to check out how to do so using Arize Phoenix in DSPy.
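
For example (a minimal sketch; the endpoint, deployment name, and signature are placeholders for your GPT4-32K setup, and constructor arguments can differ across DSPy versions):

```python
import dspy

# Placeholder Azure OpenAI setup; argument names may vary by DSPy version.
lm = dspy.AzureOpenAI(
    api_base="https://<your-resource>.openai.azure.com/",
    api_version="2024-02-01",
    model="gpt-4-32k",
)
dspy.settings.configure(lm=lm)

# A toy RAG-style predictor; replace with your own module.
qa = dspy.Predict("context, question -> answer")
qa(context="...", question="What does the document say about X?")

# Print the most recent prompt/completion pair exactly as sent to the model.
lm.inspect_history(n=1)
```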

GaneshSKulkarni commented 2 months ago

Hi @arnavsinghvi11

Thank you for the response and suggestions. Please find my points below.

I would be happy to provide any information needed; please let me know, and please help me address these specific concerns.

Thank you.

imflash217 commented 2 months ago

> Hi @GaneshSKulkarni , you can use inspect_history to get more observability into the prompts and outputs being passed in. If you are looking for more in-depth tracing, feel free to check out how to do so using Arize Phoenix in DSPy.

Hi @arnavsinghvi11 , is there a way to save the trace locally to a file and load it into Phoenix later for inspection? Once the DSPy execution stops, the traces also go away. I would love to load saved traces later for inspection.

Thanks
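
Something like the following is what I'm after (a rough sketch; I'm not sure Phoenix's client API supports exactly this, so treat it as a starting point rather than a verified recipe):

```python
import pandas as pd
import phoenix as px
from phoenix.trace import TraceDataset

# Export the spans collected in the current session to a local file.
df = px.Client().get_spans_dataframe()
df.to_parquet("dspy_traces.parquet")

# In a later session, reload the saved spans and relaunch Phoenix on them.
loaded = TraceDataset(pd.read_parquet("dspy_traces.parquet"))
px.launch_app(trace=loaded)
```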

imflash217 commented 2 months ago

@GaneshSKulkarni , I have also experienced performance degradation (almost always) when using DSPy, while the direct LLM API call via LangChainAI performs noticeably better. In my case, the output from the DSPy program included the whole prompt itself; that issue is being tracked in https://github.com/stanfordnlp/dspy/issues/662, but I am not sure whether you are hitting the same problem or something else.

If you can share some example observations, it would help with debugging.

arnavsinghvi11 commented 2 months ago

> > Hi @GaneshSKulkarni , you can use inspect_history to get more observability into the prompts and outputs being passed in. If you are looking for more in-depth tracing, feel free to check out how to do so using Arize Phoenix in DSPy.
>
> Hi @arnavsinghvi11 , is there a way to save the trace locally to a file and load it into Phoenix later for inspection? Once the DSPy execution stops, the traces also go away. I would love to load saved traces later for inspection.
>
> Thanks

Does saving/loading help with this? DSPy programs and internal traces can be saved/loaded.
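
For example (a sketch; `RAG`, `my_metric`, and `trainset` are placeholders for your own module, metric, and data):

```python
import dspy

# `compiled_rag` is the result of compiling your module with an optimizer
# such as BootstrapFewShot.
compiled_rag = dspy.teleprompt.BootstrapFewShot(metric=my_metric).compile(
    RAG(), trainset=trainset
)

# Persist the compiled program (including its bootstrapped demos) to disk...
compiled_rag.save("compiled_rag.json")

# ...and reload it later into a fresh instance of the same module class.
rag = RAG()
rag.load("compiled_rag.json")
```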

ilpoli commented 1 month ago

@GaneshSKulkarni, I've also faced the issue of new lines ('\n') being removed. It is caused by this piece of code:

https://github.com/stanfordnlp/dspy/blob/8e01bee8e360d0509387ca2b54d296b31e8bebb6/dsp/templates/template_v2.py#L101

As a workaround, you can define your fields the following way:

... = dspy.OutputField(format=lambda s: s)
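
A fuller sketch of the workaround (field names are illustrative; the identity format function skips the default rendering that replaces '\n' with spaces):

```python
import dspy

class GenerateAnswer(dspy.Signature):
    """Answer the question using the provided context."""

    # An identity `format` preserves newlines in the retrieved chunks
    # instead of the default handling, which collapses '\n' into spaces.
    context = dspy.InputField(format=lambda s: s)
    question = dspy.InputField()
    answer = dspy.OutputField(format=lambda s: s)
```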