huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Support for non-tensor based datasets in Accelerate #3037

Open xiaom233 opened 1 month ago

xiaom233 commented 1 month ago

System Info

torch==2.2.2+cu118
accelerate==0.33.0
Python==3.8.16

Reproduction

My dataset samples look like this:

{
    'img1': torch.Tensor,
    'img2': torch.Tensor,
    'txt': str,  # a text prompt
}

The prepared dataloader raises an error like the following: `TypeError: Unsupported types (<class 'str'>) passed to _gpu_broadcast_one ...`
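
For reference, a minimal sketch that should reproduce this, assuming the failure comes from the batch-dispatching path (`dispatch_batches=True`), where batches are loaded on the main process and broadcast to the others on a multi-GPU run; the dataset class and field contents are illustrative only:

import torch
from torch.utils.data import Dataset, DataLoader
from accelerate import Accelerator
from accelerate.utils import DataLoaderConfiguration

class PairedImageTextDataset(Dataset):
    """Toy dataset returning two image tensors and a raw text prompt."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return {
            'img1': torch.randn(3, 64, 64),
            'img2': torch.randn(3, 64, 64),
            'txt': f'prompt {idx}',  # the non-tensor field
        }

# dispatch_batches=True forces batches to be broadcast across processes,
# which appears to be where the str type is rejected.
accelerator = Accelerator(
    dataloader_config=DataLoaderConfiguration(dispatch_batches=True)
)
loader = accelerator.prepare(DataLoader(PairedImageTextDataset(), batch_size=2))

for batch in loader:
    pass  # TypeError: Unsupported types (<class 'str'>) passed to _gpu_broadcast_one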

I understand and respect the design. However, work in my research area sometimes involves text prompts, which are incompatible with Accelerate's prepared dataloaders.

Expected behavior

I hope Accelerate can support datasets with non-tensor fields, or at least raise a user warning instead of an error.
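
In the meantime, the workaround I can think of is to keep the non-tensor field out of the batch entirely: the dataset returns an integer index as a tensor, and the prompt string is looked up from a plain Python list after the batch comes out of the dataloader. A minimal sketch (names are hypothetical):

import torch
from torch.utils.data import Dataset

class TensorOnlyDataset(Dataset):
    """Returns only tensors; prompt strings stay outside the batch."""
    def __init__(self, prompts):
        self.prompts = prompts  # list[str]

    def __len__(self):
        return len(self.prompts)

    def __getitem__(self, idx):
        return {
            'img1': torch.randn(3, 64, 64),
            'img2': torch.randn(3, 64, 64),
            'txt_idx': torch.tensor(idx),  # tensor index instead of the raw str
        }

# In the training loop, recover the prompts from the index tensor:
#     prompts = [dataset.prompts[i] for i in batch['txt_idx'].tolist()]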

xiaom233 commented 1 month ago

A related question is the actual batch size when using a standard PyTorch DataLoader with an Accelerate-wrapped model, launched via `accelerate launch train.py` with the default setup: 1 machine, 8 GPUs. Note that the dataloader below is not passed to `accelerator.prepare`:

loader = DataLoader(
    dataset=dataset, batch_size=cfg.train.batch_size,
    num_workers=cfg.train.num_workers, shuffle=False
)
# the dataloader is intentionally left out of prepare()
model, optimizer = accelerator.prepare(model, optimizer)
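
My current understanding, which I would like to confirm: since the dataloader is not prepared, each of the 8 processes iterates the full dataset independently, so every step processes batch_size samples per GPU but the same samples on all GPUs; only a prepared dataloader shards the data so that the effective global batch becomes batch_size * num_processes. A quick sketch to check this (assuming that behavior, to be run with accelerate launch):

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()
dataset = TensorDataset(torch.arange(32))
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# Unprepared: every rank should print the same first batch (duplicated data).
first = next(iter(loader))[0]
print(f"rank {accelerator.process_index} unprepared: {first.tolist()}")

# Prepared: batches are sharded across ranks, so each rank should print
# different samples; global batch = batch_size * accelerator.num_processes.
prepared_loader = accelerator.prepare(loader)
first = next(iter(prepared_loader))[0]
print(f"rank {accelerator.process_index} prepared: {first.tolist()}")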

Appreciate your reply!

github-actions[bot] commented 3 days ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

xiaom233 commented 2 days ago

Appreciate a reply!