wandb / weave

Weave is a toolkit for developing AI-powered applications, built by Weights & Biases.
https://wandb.me/weave
Apache License 2.0

feat(Improve Datasets) #1860

Open tcapelle opened 3 months ago

tcapelle commented 3 months ago

The `Dataset` class is still very lightweight but has a lot of potential, even more so now that we will have feedback and ways to annotate data. Let's aim for some feature parity with hf-datasets.

    len(ds)
    # 10

    for row in ds:
        print(row)
    # {'id': 1, 'text': 'The quick brown fox jumps over the lazy dog.', 'length': 43, 'other_text': 'The '}
    # {'id': 2, 'text': 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.', 'length': 56, 'other_text': 'Lore'}
    # {'id': 3, 'text': 'To be or not to be, that is the question.', 'length': 41, 'other_text': 'To b'}
    # {'id': 4, 'text': 'All that glitters is not gold.', 'length': 30, 'other_text': 'All '}
    # {'id': 5, 'text': 'A journey of a thousand miles begins with a single step.', 'length': 58, 'other_text': 'A jo'}
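A minimal sketch of how `len(ds)` and row iteration could be supported, assuming the rows are held as a plain list of dicts. `MiniDataset` is a hypothetical stand-in used only for illustration; it is not Weave's actual `Dataset` implementation.

```python
from typing import Any, Iterator


class MiniDataset:
    """Hypothetical stand-in for a dataset; not Weave's actual Dataset class."""

    def __init__(self, name: str, rows: list[dict[str, Any]]) -> None:
        self.name = name
        # Copy the list so later changes to the caller's list do not leak in.
        self.rows = list(rows)

    def __len__(self) -> int:
        # Enables len(ds).
        return len(self.rows)

    def __iter__(self) -> Iterator[dict[str, Any]]:
        # Enables `for row in ds`.
        return iter(self.rows)

    def __getitem__(self, index: int) -> dict[str, Any]:
        # Enables ds[0], similar to integer indexing in hf-datasets.
        return self.rows[index]


ds = MiniDataset(
    name="demo",
    rows=[
        {"id": 1, "text": "All that glitters is not gold."},
        {"id": 2, "text": "To be or not to be, that is the question."},
    ],
)

print(len(ds))
for row in ds:
    print(row)
```

Implementing the standard dunder methods keeps the class usable with built-ins such as `len`, `list`, and `enumerate` without any dataset-specific helpers.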

- Adds a `map` method. It bakes the `asyncio.run` call inside; maybe that is not a good idea?
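The `asyncio.run` concern in the bullet above can be made concrete with a small sketch. The helper names `map_rows_async` and `map_rows` are hypothetical and do not reflect how Weave implements `Dataset.map`; the sketch only assumes that a synchronous wrapper calls `asyncio.run` on an async core. Such a wrapper works from plain synchronous code, but `asyncio.run` raises a `RuntimeError` when an event loop is already running, which is the case inside notebook cells and async handlers.

```python
import asyncio
from typing import Any, Callable

Row = dict[str, Any]


async def map_rows_async(rows: list[Row], fn: Callable[[str], Row]) -> list[Row]:
    """Hypothetical async core: merge fn's output into each row."""
    out = []
    for row in rows:
        await asyncio.sleep(0)  # stand-in for real async work
        out.append({**row, **fn(row["text"])})
    return out


def map_rows(rows: list[Row], fn: Callable[[str], Row]) -> list[Row]:
    """Hypothetical sync wrapper that bakes in asyncio.run."""
    coro = map_rows_async(rows, fn)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        # Close the coroutine so it is not left un-awaited, then re-raise.
        coro.close()
        raise


def f(text: str) -> Row:
    return {"other_text": text[0:4], "text_length": len(text)}


rows = [{"id": 4, "text": "All that glitters is not gold."}]

# Works from plain synchronous code, where no event loop is running.
sync_result = map_rows(rows, f)
print(sync_result)


async def caller_with_running_loop() -> tuple[str, list[Row]]:
    # Inside a running loop, asyncio.run refuses to start a second loop.
    try:
        map_rows(rows, f)
        message = "no error"
    except RuntimeError as exc:
        message = str(exc)
    # Awaiting the async core directly works in this context.
    awaited_result = await map_rows_async(rows, f)
    return message, awaited_result


error_message, awaited_result = asyncio.run(caller_with_running_loop())
print(error_message)
```

One possible design, offered only as an option, is to expose the async version publicly alongside the sync wrapper so that callers already inside an event loop can `await` it directly.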
```python
from weave import Dataset

rows = [
    {"id": 1, "text": "The quick brown fox jumps over the lazy dog.", "length": 43},
    {"id": 2, "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", "length": 56},
    {"id": 3, "text": "To be or not to be, that is the question.", "length": 41},
    {"id": 4, "text": "All that glitters is not gold.", "length": 30},
    {"id": 5, "text": "A journey of a thousand miles begins with a single step.", "length": 58},
]

ds = Dataset(name="cape_dev", rows=rows)

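# The argument name `text` presumably selects the column passed to `f`;
# the returned dict supplies the new columns.
def f(text: str) -> dict:
    return {"other_text": text[0:4], "text_length": len(text)}

# Apply `f` to every row and print a summary of the resulting dataset.
mapped_ds = ds.map(f)
print(mapped_ds)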

# Output:
# Mapped 5 of 5 examples in 0.00 seconds
# Dataset({
#     name: 'cape_dev',
#     features: ['id', 'text', 'text_length', 'other_text'],
#     num_rows: 5
# })
```

circle-job-mirror[bot] commented 3 months ago

Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=60a2058b487a874918784a6502fca8d53ce23153