huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Use `torch.nested_tensor` for arrays of varying length in torch formatter #4490

Open · mariosasko opened this issue 2 years ago

mariosasko commented 2 years ago

Use `torch.nested_tensor` to represent arrays of varying length in `TorchFormatter`.

The PyTorch nested tensor API is still in the prototype stage, so we should wait for it to mature before adopting it.
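For illustration, here is a minimal sketch of what a nested tensor gives you for a ragged column. It assumes a PyTorch version that exposes the constructor as `torch.nested.nested_tensor` (the issue title uses the older `torch.nested_tensor` spelling) along with `torch.nested.to_padded_tensor`. The `rows` data is hypothetical, and this does not show how `TorchFormatter` would actually integrate it.

```python
import torch

# Hypothetical ragged column: two rows of different length.
rows = [torch.tensor([1.0, 2.0]), torch.tensor([3.0, 4.0, 5.0])]

# Wrap the rows in a single nested tensor instead of a plain Python list.
# This is a prototype API; PyTorch emits a UserWarning when it is called.
nt = torch.nested.nested_tensor(rows)

print(nt.is_nested)                    # True
print([t.shape for t in nt.unbind()])  # per-row shapes are preserved

# Convert to a dense, zero-padded tensor when a model needs a regular shape.
padded = torch.nested.to_padded_tensor(nt, 0.0)
print(padded)
```

Unlike a list, the nested tensor is a single `torch.Tensor` object, so downstream code can still check `is_nested` or pad on demand.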

NightMachinery commented 1 year ago

What's the current behavior?

mariosasko commented 1 year ago

Currently, we return a list of Torch tensors if their shapes don't match. If they do, we consolidate them into a single Torch tensor.
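A minimal sketch of the behavior described above, assuming a column arrives as a Python list of tensors. The function name `consolidate_column` is hypothetical, and this is an illustration of the rule ("stack if all shapes match, otherwise keep the list"), not the library's actual code.

```python
import torch

def consolidate_column(column):
    """Hypothetical sketch: stack a non-empty list of tensors into a single
    tensor when every shape matches; otherwise return the list unchanged."""
    if (
        isinstance(column, list)
        and column
        and all(
            isinstance(t, torch.Tensor) and t.shape == column[0].shape
            for t in column
        )
    ):
        return torch.stack(column)
    return column

same = consolidate_column([torch.zeros(3), torch.ones(3)])    # shapes match
ragged = consolidate_column([torch.zeros(2), torch.ones(3)])  # shapes differ
print(type(same).__name__, tuple(same.shape))  # Tensor (2, 3)
print(type(ragged).__name__, len(ragged))      # list 2
```

The ragged case is exactly where the return type switches from a tensor to a list, which is the inconsistency a nested tensor would remove.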