huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Title: Support for non-tensor based datasets in Accelerate #1878

Open hzphzp opened 10 months ago

hzphzp commented 10 months ago

System Info

accelerate==0.21.0
python==3.9

Reproduction

I ran into a problem with the Accelerate library: it does not support datasets that contain non-tensor fields. For instance, consider a dataset whose items are returned as follows:

        return {
            "image": tensor,      # a tensor
            "text_input": str,    # a plain Python string
        }

In this case, a TypeError is raised: Can only concatenate tensors but got <class 'str'>. The problem lies in the following line:

raise TypeError(f"Can only concatenate tensors but got {type(data[0])}")

The concat function here handles only tensor data; for any other type, it raises this error.

Expected behavior

However, many practical applications need to work with non-tensor data in I/O operations. I would like to request support for non-tensor datasets in Accelerate.
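
One way the requested behavior could look is sketched below. This is only an illustration under stated assumptions, not Accelerate's implementation: the helper name concatenate_mixed and its parameters tensor_type and cat_fn are hypothetical, and the sketch assumes that non-tensor leaves (strings, lists of strings, custom objects) should simply be collected into a flat Python list rather than raising. The tensor type and the concatenation function are passed in by the caller, so the sketch itself does not depend on any framework.

```python
from collections.abc import Mapping


def concatenate_mixed(data, tensor_type, cat_fn):
    """Merge a list of per-batch outputs that may contain non-tensor leaves.

    Hypothetical helper for illustration; not part of Accelerate.
    tensor_type and cat_fn are supplied by the caller, for example
    torch.Tensor and torch.cat when working with PyTorch.
    """
    first = data[0]
    if isinstance(first, Mapping):
        # Recurse into dict-like batches key by key.
        return {
            key: concatenate_mixed([d[key] for d in data], tensor_type, cat_fn)
            for key in first
        }
    if isinstance(first, tensor_type):
        # Tensor leaves are joined by the caller-supplied function.
        return cat_fn(data)
    # Assumption: any other leaf (str, list of str, custom object) is
    # collected into one flat Python list instead of raising TypeError.
    merged = []
    for item in data:
        if isinstance(item, (list, tuple)):
            merged.extend(item)
        else:
            merged.append(item)
    return merged
```

With PyTorch, a call might look like concatenate_mixed(batches, torch.Tensor, torch.cat), where batches is a list of dicts such as the one in the reproduction above. The "image" entries would be concatenated as tensors, while the "text_input" entries would come back as a single list of strings.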

github-actions[bot] commented 9 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

hzphzp commented 9 months ago

Waiting for a kind reply ...

irowberryFS commented 5 months ago

I'm getting the same issue. I'm working with PyTorch Geometric Data objects. According to various other threads I've read, it was supported in the past.