Closed Dsantra92 closed 1 year ago
Hi! Are the split files that large? They just store the split indices, no?
I was asking whether it is possible or planned to use a language-independent format to store the computed splits.
I see. That'd require all zipped files to be re-created. I do not think we will support this in the immediate future. You can probably consider some workaround on your side.
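For anyone looking for such a workaround, here is a minimal sketch of one option: convert the split indices once on the Python side into gzipped, one-index-per-line text files, which any language can read without a pickle parser. It assumes the splits are already in memory as a dict mapping split names to sequences of integers (for example the result of `torch.load` on a split file, with each tensor converted via `.tolist()`). The function names, file layout, and the sample indices are hypothetical, not part of OGB.

```python
import gzip
import os


def write_split_indices(splits, out_dir):
    """Write each split to <out_dir>/<name>.csv.gz, one integer index per line.

    `splits` is assumed to be a dict like {"train": [...], "valid": [...]}.
    If it came from torch.load, convert tensors with .tolist() first.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for name, indices in splits.items():
        path = os.path.join(out_dir, f"{name}.csv.gz")
        # Text-mode gzip keeps the file readable by any language with gzip support.
        with gzip.open(path, "wt", encoding="utf-8", newline="\n") as f:
            for idx in indices:
                f.write(f"{int(idx)}\n")
        paths[name] = path
    return paths


def read_split_indices(path):
    """Read the indices back as a list of ints (used here as a round-trip check)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [int(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Hypothetical indices, for illustration only.
    example = {"train": [0, 1, 2, 5], "valid": [3], "test": [4, 6]}
    written = write_split_indices(example, "converted_splits")
    for name, path in written.items():
        print(name, read_split_indices(path))
```

On the Julia side the resulting files need only a gzip reader and integer parsing, so the memory overhead of unpickling is avoided. Parquet would serve the same purpose but needs an extra dependency on both sides.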
Makes sense!🙁
Hello devs. I am trying to add support for OGB datasets to MLDatasets.jl. One of the bottlenecks we are facing is loading the .pt files. The implementation here, which relies on a Pickle.jl hack, uses substantially more memory than Python does. With the new TorchArrow support, could you support Parquet files for loading the splits?