huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

CLIP loss #22252

Closed hljjjmssyh closed 1 year ago

hljjjmssyh commented 1 year ago

https://github.com/mlfoundations/open_clip/blob/37b729bc69068daa7e860fb7dbcf1ef1d03a4185/src/open_clip/loss.py#L49

In the open_clip implementation, the logits distributed across multiple GPUs are gathered before the loss is computed. However, I cannot find the code for this feature in this repository. I think more negative samples are very important for contrastive learning. @younesbelkada @ydshieh
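For context, here is a minimal sketch of the gather-then-contrast pattern open_clip uses (paraphrased, not the linked code verbatim; it assumes torch.distributed is already initialized, e.g. via torchrun):

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def gather_features(features: torch.Tensor) -> torch.Tensor:
    """Collect the feature matrix from every rank along the batch dim."""
    world_size = dist.get_world_size()
    gathered = [torch.zeros_like(features) for _ in range(world_size)]
    dist.all_gather(gathered, features)
    # Plain all_gather carries no gradients, so put the local tensor back
    # in place to keep this rank's gradients flowing.
    gathered[dist.get_rank()] = features
    return torch.cat(gathered, dim=0)

def clip_loss(image_features, text_features, logit_scale):
    all_image = gather_features(image_features)
    all_text = gather_features(text_features)
    logits_per_image = logit_scale * all_image @ all_text.t()
    logits_per_text = logits_per_image.t()
    labels = torch.arange(logits_per_image.size(0), device=logits_per_image.device)
    return (F.cross_entropy(logits_per_image, labels)
            + F.cross_entropy(logits_per_text, labels)) / 2
```

With world_size processes each holding a batch of N, every sample is contrasted against world_size * N - 1 negatives instead of N - 1.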

ydshieh commented 1 year ago

Hi @hljjjmssyh, loss computation is managed in the Trainer class; see

https://github.com/huggingface/transformers/blob/da005253b82395b6097623bcee44b819bfe72b87/src/transformers/trainer.py#L2649-L2650
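Those lines only average the per-device losses returned by nn.DataParallel; from memory they amount to something like this (paraphrased, not an exact quote of the linked commit):

```python
import torch

# Under nn.DataParallel the wrapped model returns one loss per GPU;
# Trainer reduces that vector to a scalar with a mean.
def reduce_dataparallel_loss(loss: torch.Tensor, n_gpu: int) -> torch.Tensor:
    if n_gpu > 1:
        loss = loss.mean()  # average the per-replica losses
    return loss

# e.g. two replicas returning losses 0.9 and 1.1 reduce to 1.0
print(reduce_dataparallel_loss(torch.tensor([0.9, 1.1]), n_gpu=2))
```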

sgugger commented 1 year ago

That's only for models wrapped in DataParallel, @ydshieh.

@hljjjmssyh We don't include code that requires torch.distributed, as it then fails when the script is run on a single GPU. However, we could use the Accelerate library to get something that works in both situations. If you want to explore this and open a PR, I'll be happy to review!
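A hedged sketch of that direction, assuming Accelerate's Accelerator.gather (a no-op on a single process); the gradient re-insertion trick mirrors open_clip and is an illustration, not existing transformers code:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

def gather_with_grad(features: torch.Tensor) -> torch.Tensor:
    """Pool features from all processes while keeping local gradients."""
    if accelerator.num_processes == 1:
        return features  # single GPU/CPU: nothing to gather
    # gather() concatenates along dim 0 but carries no autograd history,
    # so re-insert this rank's slice to let its gradients flow.
    gathered = accelerator.gather(features)
    batch = features.size(0)
    start = accelerator.process_index * batch
    return torch.cat([gathered[:start], features, gathered[start + batch:]], dim=0)
```

The same training script then runs unchanged on one GPU or several (launched with accelerate launch in both cases), which avoids the single-GPU failure mode described above.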

connor-henderson commented 1 year ago

I think I'm missing something; it looks like this could be done for CLIP today with Accelerate's implementation in examples/pytorch/image-classification/run_image_classification_no_trainer.py, run with the appropriate args? Or would Accelerate nonetheless be a welcome addition somewhere else for the above-mentioned purpose?

It also looks here like the model would in fact be wrapped in DataParallel when training on multiple GPUs.
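For reference, the wrapping being pointed at is roughly this shape (paraphrased from Trainer's model-wrapping logic, not an exact quote):

```python
from torch import nn

# When more than one GPU is visible (args.n_gpu > 1), Trainer wraps the
# model so each forward pass replicates it and splits the batch across GPUs.
def wrap_for_multi_gpu(model: nn.Module, n_gpu: int) -> nn.Module:
    if n_gpu > 1:
        model = nn.DataParallel(model)
    return model
```

Note that each DataParallel replica runs forward() on its own slice of the batch, so a CLIP loss computed inside forward() only sees that slice's negatives; the per-replica losses are then averaged, and no logits are pooled across devices.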

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.