The dataset class is still very lightweight but has a lot of potential, even more so now that we will have feedback and ways to annotate data. Let's try to get some feature parity with hf-datasets.
- Adds `str`, `len`, and `iter` methods:
```python
for row in ds:
    print(row)
```
```
{'id': 1, 'text': 'The quick brown fox jumps over the lazy dog.', 'length': 43, 'other_text': 'The '}
{'id': 2, 'text': 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.', 'length': 56, 'other_text': 'Lore'}
{'id': 3, 'text': 'To be or not to be, that is the question.', 'length': 41, 'other_text': 'To b'}
{'id': 4, 'text': 'All that glitters is not gold.', 'length': 30, 'other_text': 'All '}
{'id': 5, 'text': 'A journey of a thousand miles begins with a single step.', 'length': 58, 'other_text': 'A jo'}
```
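- Adds a `map` method. It bakes the `asyncio.run` call inside; maybe not a good idea, since `asyncio.run` raises `RuntimeError` if an event loop is already running (e.g. in a notebook)?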
```python
from weave import Dataset

rows = [
    {"id": 1, "text": "The quick brown fox jumps over the lazy dog.", "length": 43},
    {"id": 2, "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", "length": 56},
    {"id": 3, "text": "To be or not to be, that is the question.", "length": 41},
    {"id": 4, "text": "All that glitters is not gold.", "length": 30},
    {"id": 5, "text": "A journey of a thousand miles begins with a single step.", "length": 58},
]
ds = Dataset(name="cape_dev", rows=rows)


def f(text: str):
    return {"other_text": text[0:4], "text_length": len(text)}


mapped_ds = ds.map(f)
print(mapped_ds)
```
```
Mapped 5 of 5 examples in 0.00 seconds
Dataset({
    name: 'cape_dev',
    features: ['id', 'text', 'text_length', 'other_text'],
    num_rows: 5
})
```
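
To make the `asyncio.run` question concrete, here is a rough sketch of the pattern under discussion. It is not the code in this PR: `ToyDataset`, `map_async`, and the rule of matching function parameters to column names are assumptions for illustration only. The sync `map` wraps an async implementation with `asyncio.run`, which is handy in scripts but fails when a loop is already running.

```python
import asyncio
import inspect


class ToyDataset:
    """Hypothetical stand-in for the dataset class, not weave's implementation."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)

    def __iter__(self):
        return iter(self.rows)

    def __str__(self):
        return f"Dataset(name={self.name!r}, num_rows={len(self)})"

    async def map_async(self, func):
        # Assumption: func receives only the columns named in its signature.
        params = inspect.signature(func).parameters

        async def apply(row):
            result = func(**{key: row[key] for key in params})
            if inspect.isawaitable(result):
                result = await result  # support async functions as well
            return {**row, **result}  # merge the new columns into the row

        # gather returns results in the same order as the input rows
        new_rows = await asyncio.gather(*(apply(row) for row in self.rows))
        return ToyDataset(self.name, new_rows)

    def map(self, func):
        # Convenient in scripts, but asyncio.run raises RuntimeError when
        # called from code that already runs inside an event loop.
        return asyncio.run(self.map_async(func))


def f(text: str):
    return {"other_text": text[0:4], "text_length": len(text)}


ds = ToyDataset("cape_dev", [{"id": 4, "text": "All that glitters is not gold."}])
mapped = ds.map(f)
print(mapped)
for row in mapped:
    print(row)
```

One possible way out is to expose both entry points, as the sketch does: callers that are already inside an event loop can `await ds.map_async(f)`, while scripts keep using the blocking `map`.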