mosaicml / streaming

A Data Streaming Library for Efficient Neural Network Training
https://streaming.docs.mosaicml.com
Apache License 2.0

Option to host files via HTTPS (remote being an HTTPS URL) #511

Open felix-red-panda opened 7 months ago

felix-red-panda commented 7 months ago

It'd be great if the dataset files could be hosted over HTTPS. For example, during experimentation the dataset could then be served from localhost or a local network with a command-line HTTP server like caddy file-server.
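For local experimentation along these lines, any static file server will do. As a rough stand-in for caddy file-server, here is a minimal sketch that uses Python's standard-library http.server to serve a dataset directory on localhost and fetch one file back. Note that it serves plain HTTP, not HTTPS (Caddy can add TLS for you). The helper names and the shard file name are hypothetical.

```python
import functools
import threading
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def serve_directory(root: str, host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Serve every file under `root` over plain HTTP from a background thread.

    port=0 asks the OS for any free port; read it back from server.server_address.
    """
    # Bind the handler to `root` instead of the process's working directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=root)
    server = ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def fetch(url: str) -> bytes:
    """Download one file from the server and return its bytes."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as root:
        # Hypothetical file standing in for real dataset shards.
        Path(root, "shard.00000.mds").write_bytes(b"example bytes")
        server = serve_directory(root)
        host, port = server.server_address[:2]
        try:
            print(fetch(f"http://{host}:{port}/shard.00000.mds"))
        finally:
            # Stop the serve_forever loop, then release the socket.
            server.shutdown()
            server.server_close()
```

From the command line, python -m http.server --directory /path/to/dataset does the same job; the open question in this issue is whether the streaming library can consume such a URL as its remote.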

karan6181 commented 7 months ago

Hi @felix-red-panda, how does HTTPS handle a directory of dataset files? StreamingDataset takes a remote directory, which can be a cloud URL or a local directory. Streaming dataset also supports an SFTP server backend if you want to try it out. However, that SFTP backend is somewhat stale at this point, so you might run into some bugs. Please let us know how it goes. Thanks!
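One way to think about the directory question: plain HTTP has no standard directory listing, but a client that already knows each file's relative name (for example, from the index.json file that a streaming dataset directory contains) only needs to join that name onto the base URL and download it. The sketch below shows that idea with the standard library; remote_file_url and download_http are hypothetical helpers, not part of the streaming library.

```python
import os
import shutil
import urllib.request
from urllib.parse import quote, urljoin


def remote_file_url(remote_dir: str, relative_path: str) -> str:
    """Join a file's relative path onto a base directory URL."""
    # urljoin replaces the last path segment unless the base ends with "/".
    base = remote_dir if remote_dir.endswith("/") else remote_dir + "/"
    # quote() escapes unsafe characters but keeps "/" so subdirectories survive.
    return urljoin(base, quote(relative_path))


def download_http(remote_dir: str, relative_path: str, local_dir: str,
                  timeout: float = 60.0) -> str:
    """Hypothetical helper: fetch one file from an http(s) 'directory' into local_dir."""
    url = remote_file_url(remote_dir, relative_path)
    local_path = os.path.join(local_dir, relative_path)
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    tmp_path = local_path + ".tmp"
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    # Rename only after the download finishes so readers never see a partial file.
    os.replace(tmp_path, local_path)
    return local_path
```

A loader would first fetch the index with download_http(remote, "index.json", local), then fetch each shard named inside it the same way.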

Skylion007 commented 6 months ago

@karan6181 I think this feature is another thing we could get for free if we enable support for fsspec backends.
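To illustrate what "for free" could mean here: fsspec maps a URL's scheme (https://, s3://, gs://, file://, ...) to a filesystem implementation, so one download path could cover HTTPS alongside other backends. A rough sketch, assuming fsspec is installed; each protocol needs its own backend package (for example aiohttp for HTTP(S), s3fs for S3, gcsfs for GCS). download_with_fsspec is a hypothetical name, and this is not how the streaming library currently downloads files.

```python
import os
import shutil

import fsspec  # third-party: pip install fsspec (plus aiohttp for http/https URLs)


def download_with_fsspec(remote_url: str, local_path: str) -> str:
    """Hypothetical helper: copy one file from any fsspec-supported URL to local disk."""
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    tmp_path = local_path + ".tmp"
    # fsspec.open chooses the filesystem from the URL scheme and yields a file-like object.
    with fsspec.open(remote_url, "rb") as src, open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Publish the file only once it is complete.
    os.replace(tmp_path, local_path)
    return local_path
```

With aiohttp installed, the same call should accept an https:// URL, such as one served by caddy file-server, without any HTTP-specific code in the caller.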

karan6181 commented 6 months ago

@Skylion007, do you mean using something like this?