instructlab / sdg

Python library for Synthetic Data Generation
https://pypi.org/project/instructlab-sdg/
Apache License 2.0

Create a utility function to convert from Pandas dataframe to Hugging Face dataset #190

Closed: markmc closed this issue 3 months ago

markmc commented 3 months ago

See #142 and https://github.com/instructlab/sdg/pull/163#discussion_r1684206468 and https://github.com/instructlab/sdg/pull/182#discussion_r1688345839

Anywhere we do this conversion, we need to:

df = df.reset_index(drop=True)
ds = Dataset.from_pandas(df)

to avoid introducing a __index_level_0__ column

We're accumulating places where we need to remember to do this, so let's add a utility function.

hickeyma commented 3 months ago

@markmc I can push a PR for this if that's ok?

markmc commented 3 months ago

> @markmc I can push a PR for this if that's ok?

Yep! thanks