mosaicml / streaming

A Data Streaming Library for Efficient Neural Network Training
https://streaming.docs.mosaicml.com
Apache License 2.0

Using mosaicml streaming with accelerate? #722

Open benihime91 opened 2 months ago

benihime91 commented 2 months ago

Hi, how can I integrate mosaicml streaming with huggingface-accelerate?

Normally, with a typical dataset and dataloader, you would need to do:

data_loader = accelerator.prepare(data_loader)

Internally, I think accelerate wraps the loader in a DistributedSampler of sorts. Is this required when using a mosaicml streaming dataset? Or can I skip this step, following this comment: https://github.com/mosaicml/streaming/issues/225#issuecomment-1510478052

My use case is multi-GPU, multi-node training.
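
To make the concern behind this question concrete, here is a toy sketch in plain Python, with no accelerate or streaming imports. It assumes the dataset already partitions samples per rank on its own, and that a wrapper then shards each rank's view a second time. The strided split and the `shard_by_rank` helper are hypothetical and chosen only for illustration; neither library's actual partitioning is claimed to work this way.

```python
# Toy simulation only: nothing here calls accelerate or streaming.
# shard_by_rank is a hypothetical helper, not part of either library.

def shard_by_rank(samples, rank, world_size):
    """Return the strided slice of samples that one rank would read."""
    return samples[rank::world_size]


world_size = 4
samples = list(range(16))

# Assumption: the dataset already hands each rank its own partition.
dataset_view = {
    rank: shard_by_rank(samples, rank, world_size)
    for rank in range(world_size)
}

# Assumption: a wrapper shards each rank's partition a second time.
double_sharded = {
    rank: shard_by_rank(dataset_view[rank], rank, world_size)
    for rank in range(world_size)
}

# Collect everything seen across all ranks in each scenario.
seen_once = sorted(x for rank in range(world_size) for x in dataset_view[rank])
seen_twice = sorted(x for rank in range(world_size) for x in double_sharded[rank])

print(len(seen_once), "samples seen with a single partition")
print(len(seen_twice), "samples seen after sharding twice")
```

Under these assumptions, a single partition covers all 16 samples, while sharding twice leaves only 4, which is why it matters whether the wrapping step can be skipped.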

snarayan21 commented 2 days ago

Hey @benihime91, what was the solution here? We've had some folks ask about using HF accelerate, so it would be good to know so we can add it to the docs.

cc @XiaohanZhangCMU