stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

Feat(dspy): from_pandas support #1176

Closed Anindyadeep closed 4 days ago

Anindyadeep commented 1 week ago

This PR adds support for loading a dspy dataset directly from a pandas DataFrame. I personally found this useful when fetching data from some source, doing some cleanup, and getting a dspy dataset without first saving it as a CSV.

Fixes issue: #1177
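
For readers skimming the thread, the idea described above can be sketched roughly as follows. This is only an illustration and not the code from the PR: `Example` is a hypothetical stand-in for a dspy example object, `examples_from_records` is a hypothetical helper name, and the rows are assumed to be plain dicts such as those returned by pandas' `DataFrame.to_dict(orient="records")`.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping, Optional, Sequence


@dataclass
class Example:
    """Hypothetical stand-in for a dspy example: one record plus its input field names."""

    data: dict
    input_keys: tuple = ()

    def inputs(self) -> dict:
        # Fields that would be fed to the program.
        return {k: v for k, v in self.data.items() if k in self.input_keys}

    def labels(self) -> dict:
        # Everything that is not an input is treated as a label.
        return {k: v for k, v in self.data.items() if k not in self.input_keys}


def examples_from_records(
    records: Iterable[Mapping[str, Any]],
    fields: Optional[Sequence[str]] = None,
    input_keys: Sequence[str] = (),
) -> list:
    """Turn row dicts into Example objects, keeping only `fields` when given."""
    examples = []
    for row in records:
        keys = list(row) if fields is None else list(fields)
        examples.append(Example({k: row[k] for k in keys}, tuple(input_keys)))
    return examples


# With pandas installed, `records` could come from df.to_dict(orient="records").
records = [
    {"question": "What is 2 + 2?", "answer": "4", "source": "demo"},
    {"question": "Capital of France?", "answer": "Paris", "source": "demo"},
]
dataset = examples_from_records(
    records, fields=("question", "answer"), input_keys=("question",)
)
```

The point of the sketch is that no intermediate CSV is needed: the cleaned-up rows are converted in memory, and the `fields` argument drops columns (here `source`) that should not end up in the dataset.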

Anindyadeep commented 1 week ago

PS: Just for the ruff checks, I made some additional changes to types. Let me know if I need to remove them.

Josephrp commented 1 week ago

Very nice update!

krypticmouse commented 6 days ago

I see that one else block was removed in the from_huggingface method. Is that a breaking change? It might be for a few datasets; did you check?

Anindyadeep commented 5 days ago

> I see that one else block was removed in the from_huggingface method. Is that a breaking change? It might be for a few datasets; did you check?

Okay, so I reverted to the original implementation and added the from_pandas implementation on top of it. However, I am not sure how to get rid of the lint problems, because I see these errors even when I am in the container (built from the repo).

Let me know; I will be doing the same in my other PRs too.

krypticmouse commented 4 days ago

Thanks for the contribution!! Did you try running ruff check . --fix?

okhat commented 4 days ago

Let's merge this if @krypticmouse approves

Anindyadeep commented 4 days ago

> Thanks for the contribution!! Did you try running ruff check . --fix?

Hey, thanks for the quick pointer. I had already run this command earlier, and it gives me this output:

.....
testing/tasks/tweet_metric.py:66:5: ANN201 Missing return type annotation for public function `metric`
testing/tasks/tweet_metric.py:66:24: ARG001 Unused function argument: `trace`
testing/tasks/tweet_metric.py:67:5: N806 Variable `gpt3T` in function should be lowercase
testing/tasks/tweet_metric.py:67:12: N806 Variable `gpt4T` in function should be lowercase
testing/tasks/tweet_metric.py:73:121: E501 Line too long (122 > 120)
testing/tasks/tweet_metric.py:82:121: E501 Line too long (122 > 120)
testing/tasks/tweet_metric.py:94:9: ANN201 Missing return type annotation for public function `forward`
testing/tasks/tweet_metric.py:96:121: E501 Line too long (126 > 120)
testing/tasks/tweet_metric.py:104:121: E501 Line too long (126 > 120)
testing/tasks/tweet_metric.py:121:9: N806 Variable `gpt3T` in function should be lowercase
testing/tasks/tweet_metric.py:121:16: N806 Variable `gpt4T` in function should be lowercase
testing/tasks/tweet_metric.py:131:121: E501 Line too long (159 > 120)
testing/tasks/tweet_metric.py:140:121: E501 Line too long (155 > 120)
testing/tasks/tweet_metric.py:146:9: ANN201 Missing return type annotation for public function `get_program`
testing/tasks/tweet_metric.py:149:9: ANN201 Missing return type annotation for public function `get_metric`
Found 2281 errors.
No fixes available (692 hidden fixes can be enabled with the `--unsafe-fixes` option).

As a workaround for now, I applied a small hack: I reverted to the contents of the main branch and then added the from_pandas function. However, I am guessing I might need to set up the environment again so that I don't see logs like the above when running ruff with --fix?
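
Since most of the reported errors sit in files this PR does not touch, one way to triage them is to keep only the diagnostics for the changed files. The snippet below is a minimal sketch of that idea, assuming the `path:line:col: CODE message` layout seen in the output above. The helper name `diagnostics_for` and the path `example/changed_file.py` are hypothetical and used only for illustration.

```python
import re

# Matches lines shaped like: path:line:col: CODE message
DIAGNOSTIC = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?P<col>\d+): (?P<code>[A-Z]+\d+) (?P<message>.*)$"
)


def diagnostics_for(lines, changed_paths):
    """Return parsed diagnostics whose path is one of `changed_paths`."""
    changed = set(changed_paths)
    kept = []
    for line in lines:
        match = DIAGNOSTIC.match(line.strip())
        if match and match.group("path") in changed:
            kept.append(match.groupdict())
    return kept


output = [
    "testing/tasks/tweet_metric.py:66:5: ANN201 Missing return type annotation for public function `metric`",
    "testing/tasks/tweet_metric.py:73:121: E501 Line too long (122 > 120)",
    # Hypothetical line for a file changed in the PR (not from the real log).
    "example/changed_file.py:10:5: ANN201 Missing return type annotation for public function `from_pandas`",
    "Found 2281 errors.",
]
kept = diagnostics_for(output, ["example/changed_file.py"])
```

Summary lines such as `Found 2281 errors.` do not match the pattern and are skipped, so `kept` contains only the entries that the PR author would actually need to fix.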

krypticmouse commented 4 days ago

Yeah, try syncing the repo with the current version and lemme know if you still see the errors.

krypticmouse commented 4 days ago

Thank you for the contribution. Merging it!