karamavusibrahim opened this issue 1 year ago
This is going to be hard to debug without the code that produces the error. Would it be possible for you to share it, at least a minimal version to reproduce the error?
import torch
import webdataset as wds

url = ...  # my tar file paths
preprocess_wds = ...  # transform functions

train_dataset = wds.WebDataset(url, resampled=True).shuffle(1000)
train_dataset = train_dataset.decode("pil", handler=wds.warn_and_continue).to_tuple("jpg;png")
train_dataset = train_dataset.map(preprocess_wds)
train_dataset = train_dataset.with_epoch(10000)

# This works:
train_dataloader = torch.utils.data.DataLoader(train_dataset, num_workers=12, batch_size=args.train_batch_size, shuffle=False, persistent_workers=True, collate_fn=collate_fn)
# If I use this WebLoader instead of the DataLoader above, I get the error:
train_dataloader = wds.WebLoader(train_dataset, num_workers=12, batch_size=args.train_batch_size, shuffle=False, persistent_workers=True, collate_fn=collate_fn)
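preprocess_wds and collate_fn are left out above; purely as an illustration (these names and transforms are not from the original post), they could look something like this for image-only samples:

import torch
from torchvision import transforms

# Hypothetical stand-ins for the elided transform and collate functions above.
_to_tensor = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

def preprocess_wds(sample):
    image, = sample  # to_tuple("jpg;png") yields a 1-tuple holding the decoded PIL image
    return _to_tensor(image)

def collate_fn(batch):
    return torch.stack(batch)  # stack per-sample tensors into one batch tensor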
So can you use PyTorch's DataLoader, or do you need to use WebLoader? I checked quickly, and WebLoader doesn't seem to be an instance of DataLoader, which could be problematic.
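For context (my own quick check, not from the original exchange): in webdataset 0.2.x, WebLoader appears to wrap a torch DataLoader rather than subclass it, so an isinstance check like the sketch below should come back False, which would explain why accelerate's prepare() does not treat it the way it treats a plain DataLoader:

import torch
import webdataset as wds

# Assumes webdataset 0.2.x; train_dataset is the pipeline defined earlier.
loader = wds.WebLoader(train_dataset, batch_size=4, num_workers=0)
print(isinstance(loader, torch.utils.data.DataLoader))  # expected: False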
According to my tests, webdataset + WebLoader is faster than webdataset + DataLoader, so I would like to keep using the webdataset + WebLoader configuration.
If possible, could you please test something? Depending on the result, it might give us an idea of how to fix the issue for good. The test would be to define a custom loader class and use it instead:
# Subclassing both means isinstance(loader, torch.utils.data.DataLoader) checks will pass.
class MyLoader(wds.WebLoader, torch.utils.data.DataLoader):
    pass

train_dataloader = MyLoader(train_dataset, ...)  # same arguments as the WebLoader above
If you could report back if this works and if it attains the same speed, that would be awesome.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
CUDA 11.8 My environment is: - accelerate==0.24.0 - braceexpand==0.1.7 - certifi==2022.12.7 - charset-normalizer==2.1.1 - diffusers==0.21.4 - filelock==3.9.0 - fsspec==2023.4.0 - huggingface-hub==0.17.3 - idna==3.4 - imageio==2.31.6 - importlib-metadata==6.8.0 - jinja2==3.1.2 - lazy-loader==0.3 - markupsafe==2.1.2 - mpmath==1.3.0 - natsort==8.4.0 - networkx==3.0 - numpy==1.24.1 - opencv-python==4.8.1.78 - packaging==23.2 - pandas==2.1.1 - pillow==9.3.0 - psutil==5.9.6 - python-dateutil==2.8.2 - pytz==2023.3.post1 - pyyaml==6.0.1 - regex==2023.10.3 - requests==2.28.1 - safetensors==0.4.0 - scikit-image==0.22.0 - scipy==1.11.3 - six==1.16.0 - sympy==1.12 - tifffile==2023.9.26 - tokenizers==0.14.1 - torch==2.1.0+cu118 - torchaudio==2.1.0+cu118 - torchvision==0.16.0+cu118 - tqdm==4.66.1 - transformers==4.34.1 - triton==2.1.0 - typing-extensions==4.4.0 - tzdata==2023.3 - urllib3==1.26.13 - webdataset==0.2.62 - zipp==3.17.0
Information
- [ ] The official example scripts
- [x] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
- [x] My own task or dataset (give details below)
Reproduction
I am trying to use webdataset and WebLoader in my training code. I can run my code without any error if I use webdataset with torch's DataLoader; however, if I use webdataset with WebLoader, I get this error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:1! (when checking argument for argument weight in method wrapper_CUDA___slow_conv2d_forward)
Expected behavior
Works without any error, as in the webdataset + DataLoader configuration.
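Not part of the original report, but one common workaround for this kind of device-mismatch error, assuming the WebLoader is left out of accelerator.prepare(), is to move each batch onto accelerator.device by hand in the training loop. A rough sketch with a placeholder model and loss:

import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Conv2d(3, 8, 3)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

for batch in train_dataloader:  # the WebLoader from above
    # Batches arrive on CPU because the loader was not prepared/wrapped,
    # so move them onto the device accelerate placed the model on.
    batch = batch.to(accelerator.device)
    loss = model(batch).mean()  # placeholder loss
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()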
Hi, I want to know how to use webdataset with torch's DataLoader. I also want to use accelerate with them. Thanks.
@karamavusibrahim any progress here? We are facing similar issues when combining webdataset's WebLoader with Accelerate, and ended up using webdataset + the torch DataLoader instead. @muellerzr any more insights here?
Could you please share the detailed code for how to use WebDataset with the torch DataLoader?
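Not an authoritative answer from this thread, but a minimal sketch of the webdataset + torch DataLoader combination the earlier comments describe, with the loader passed through accelerator.prepare(); the shard pattern, model, batch size, and loss are placeholders, and preprocess_wds / collate_fn are the user-defined functions from above:

import torch
import webdataset as wds
from accelerate import Accelerator

urls = "shards/train-{000000..000099}.tar"  # placeholder shard pattern
dataset = (
    wds.WebDataset(urls, resampled=True)
    .shuffle(1000)
    .decode("pil", handler=wds.warn_and_continue)
    .to_tuple("jpg;png")
    .map(preprocess_wds)  # user-defined transform, as above
    .with_epoch(10000)
)
loader = torch.utils.data.DataLoader(
    dataset, batch_size=32, num_workers=12,
    collate_fn=collate_fn, persistent_workers=True,
)

accelerator = Accelerator()
model = torch.nn.Conv2d(3, 8, 3)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for batch in loader:
    loss = model(batch).mean()  # placeholder loss
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()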