poets-ai / elegy

A High Level API for Deep Learning in JAX
https://poets-ai.github.io/elegy/
MIT License

Support creating Dataloader directly from HuggingFace dataset #221

Closed lkhphuc closed 2 years ago

lkhphuc commented 2 years ago

Currently it's not possible to create an eg.data.DataLoader from a HuggingFace dataset. HF's dataset checks that the index key is one of the types int, slice, range, str, or Iterable; the np.int64 indices the DataLoader passes in fail this check.

    # Check if key is valid
    if not isinstance(key, (int, slice, range, str, Iterable)):
        _raise_bad_key_type(key)

Converting the indices from np.int64 to Python's int fixes this.
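
A minimal sketch of the failure and the workaround, assuming the check behaves like the isinstance test quoted above. The helpers `is_valid_key` and `to_python_index` are hypothetical names used for illustration only; they are not part of Elegy or of HuggingFace datasets.

```python
from collections.abc import Iterable

import numpy as np


def is_valid_key(key):
    # Hypothetical stand-in that mirrors the isinstance check quoted above.
    return isinstance(key, (int, slice, range, str, Iterable))


def to_python_index(idx):
    # Convert a NumPy integer scalar to a built-in int; leave other keys untouched.
    if isinstance(idx, np.integer):
        return int(idx)
    return idx


# Indices drawn from a NumPy array are NumPy scalars, not Python ints.
indices = np.arange(3, dtype=np.int64)
first = indices[0]

print(is_valid_key(first))                   # False: np.int64 is not a subclass of int
print(is_valid_key(to_python_index(first)))  # True after conversion
```

The conversion only touches NumPy integer scalars, so slices, ranges, strings, and lists of indices pass through unchanged.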

cgarciae commented 2 years ago

@lkhphuc I don't know if this is the best way to resolve this issue. I think we need a proper HuggingFaceDatasetAdapter, similar to the TensorFlow adapter, that can handle the general case. Specifically, HuggingFace datasets behave differently depending on whether or not they are in streaming mode.

lkhphuc commented 2 years ago

That would indeed be better.