huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

How to use pre-trained BERT or GPT transformers for CNN based video captioning #21279

Closed by adeljalalyousif 1 year ago

adeljalalyousif commented 1 year ago

Hello, how can I use pre-trained BERT or GPT transformers for a video captioning task with CNN features rather than a vision transformer?

NielsRogge commented 1 year ago

Hi,

I'd recommend checking out the GIT model, which was just added to the library. It's the first model in the library that can be used for video captioning. Check out the demo notebook here.

The model is GPT-like: conditioned on both the images (or video frames) and the text so far, it predicts the next text tokens.
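As a rough sketch of what that looks like in code: the snippet below picks evenly spaced frames from a clip and asks a GIT video checkpoint for a caption. The checkpoint name `microsoft/git-base-vatex`, the helper names `sample_frame_indices` and `caption_video`, and the default `max_length` are assumptions made for illustration, not something stated in this thread. Treat it as a starting point rather than the reference implementation.

```python
from typing import List, Sequence


def sample_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Return num_samples frame indices spread evenly over a clip.

    Hypothetical helper: if the clip has fewer frames than num_samples,
    some indices repeat, so the same frame is used more than once.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    if num_samples == 1:
        return [total_frames // 2]
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


def caption_video(
    frames: Sequence,
    checkpoint: str = "microsoft/git-base-vatex",  # assumed checkpoint name
    max_length: int = 50,
) -> str:
    """Generate one caption for a list of RGB frames (PIL images or HxWx3 arrays)."""
    # Imported lazily so the sampling helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    # The processor resizes and normalizes the frames into a tensor.
    pixel_values = processor(images=list(frames), return_tensors="pt").pixel_values

    # The decoder is conditioned on the frames and generates text token by token.
    generated_ids = model.generate(pixel_values=pixel_values, max_length=max_length)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

A typical call would decode the clip with a video reader of your choice, keep the frames at `sample_frame_indices(total_frames, model.config.num_image_with_embedding)`, and pass them to `caption_video`. Note that this feeds raw frames to GIT's own image encoder; it does not plug in precomputed CNN features.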

NielsRogge commented 1 year ago

Closing this as it seems resolved.

adeljalalyousif commented 1 year ago

Thank you so much