ml-explore / mlx-data

Efficient framework-agnostic data loading
MIT License

doc - added walkthrough for using huggingface datasets with mlx streams #67

Closed. mwrites closed this 3 months ago

mwrites commented 4 months ago

Hello, I added a step-by-step walkthrough on how to use the datasets library from Hugging Face and turn its datasets into mlx streams.

Although mlx does not depend on Hugging Face, this is such a common workflow in the industry that I thought it might be helpful to share. I am not sure whether this should sit in mlx-data or in the main mlx repo.

I also have a notebook version here: https://github.com/mwrites/apple-mlx-tutorials/blob/main/hf_datasets_mlx_streams.ipynb


Side Note:

I noticed that these two minimal examples

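import mlx.core as mx
import mlx.data as dx

# Example 1: a buffer built from plain Python ints
buff = dx.buffer_from_vector([{"x": i} for i in range(10)])
print(type(buff[0]['x']))

# Example 2: a transform that returns an mx.array, written to a new key "o"
buff = buff.key_transform("x", lambda x: mx.ones(3), output_key="o")
print(type(buff[0]['x']))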

always print <class 'numpy.ndarray'>, hence the need to call mx.array(buff[0]['x']) again later when passing the inputs to a model.
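For illustration, the conversion step mentioned above could be wrapped in a small helper. This is only a minimal sketch, not something from the PR or from mlx-data: the name sample_to_mx and its array_fn parameter are hypothetical. It assumes a sample is a dict whose values are numpy arrays, as the two examples print, and that mx.array accepts a numpy array.

```python
def sample_to_mx(sample, array_fn=None):
    """Return a new dict with every value of `sample` passed through `array_fn`.

    `sample_to_mx` and `array_fn` are hypothetical names used for this sketch.
    By default the values are converted with mlx.core.array.
    """
    if array_fn is None:
        # Imported lazily so the helper can be defined without MLX installed.
        import mlx.core as mx
        array_fn = mx.array
    # Keep the keys, convert only the values.
    return {key: array_fn(value) for key, value in sample.items()}
```

With MLX installed, sample_to_mx(buff[0]) should give a dict of mx.array values that can be passed to a model. The optional array_fn argument is there only so that a different converter can be swapped in.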

angeloskath commented 4 months ago

That looks very cool, thanks for the addition! I'll take a close look shortly.

mwrites commented 3 months ago

@angeloskath is there anything you would like me to change? 🍪

angeloskath commented 3 months ago

Hi @mwrites. No, there wasn't anything to change. Sorry for the delay.