mosaicml / streaming

A Data Streaming Library for Efficient Neural Network Training
https://streaming.docs.mosaicml.com
Apache License 2.0

Use IndexError instead of ValueError in __getitem__ #674

Closed keaganlong closed 2 weeks ago

keaganlong commented 1 month ago

https://github.com/mosaicml/streaming/blob/0b055ffcbce130ea7c5a99cd35fe2ec7702af4ac/streaming/base/spanner.py#L52

Hello, I'm curious whether it would be more customary to raise Python's built-in IndexError instead of ValueError when an index is out of bounds in __getitem__. One consequence of the current setup is that you get a ValueError when you iterate over a Spanner/LocalDataset/etc. Python allows any object that defines __getitem__ to be iterated over: it calls __getitem__ with 0, 1, 2, and so on, and it expects to catch an IndexError as the signal to stop.
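To make the difference concrete, here is a minimal sketch of the behavior described above. The two classes are hypothetical stand-ins written for illustration; they are not the actual Spanner or LocalDataset code, and the error message text is made up.

```python
class RaisesValueError:
    """Hypothetical stand-in: out-of-bounds lookups raise ValueError."""

    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, index):
        if not 0 <= index < len(self.items):
            raise ValueError(f'Invalid index: {index}')
        return self.items[index]


class RaisesIndexError(RaisesValueError):
    """Same hypothetical container, but out-of-bounds lookups raise IndexError."""

    def __getitem__(self, index):
        if not 0 <= index < len(self.items):
            raise IndexError(f'Invalid index: {index}')
        return self.items[index]


def collect(obj):
    """Iterate with a plain for loop and report how the iteration ended."""
    seen = []
    try:
        # Neither class defines __iter__, so Python falls back to calling
        # __getitem__(0), __getitem__(1), ... until IndexError is raised.
        for item in obj:
            seen.append(item)
    except ValueError as err:
        return seen, f'ValueError: {err}'
    return seen, 'stopped cleanly'


index_result = collect(RaisesIndexError([10, 20, 30]))
value_result = collect(RaisesValueError([10, 20, 30]))
```

With IndexError the loop ends normally after the last item. With ValueError the same loop yields every item and then propagates the exception to the caller.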

snarayan21 commented 1 month ago

I see, that makes sense. It wouldn't impact current workflows anyway, since an error is thrown regardless, but raising IndexError would be more Pythonic. If you're up for it, could you submit a small PR? I'd be happy to review, so feel free to tag me.