huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Distributed inference example for llava_next #3179

Open · VladOS95-cyber opened 1 month ago

VladOS95-cyber commented 1 month ago

Add distributed inference example for LLaVA-NeXT-Video-7B-hf
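The core pattern behind such an example is accelerate's `PartialState.split_between_processes` context manager, which shards a list of prompts or video paths across processes so each rank runs inference on its own slice. The sketch below illustrates the sharding idea in plain Python (an even split where earlier ranks absorb the remainder); it is an illustration of the behavior, not accelerate's actual implementation, and `split_between_processes` here is a local stand-in function.

```python
def split_between_processes(inputs, num_processes, process_index):
    """Illustrative even split of `inputs` across `num_processes`.

    Earlier ranks absorb the remainder, mirroring how a list of
    video prompts can be sharded for distributed inference.
    """
    base, extra = divmod(len(inputs), num_processes)
    start = process_index * base + min(process_index, extra)
    end = start + base + (1 if process_index < extra else 0)
    return inputs[start:end]

# Simulate 7 videos sharded across 3 ranks.
prompts = [f"video_{i}.mp4" for i in range(7)]
shards = [split_between_processes(prompts, 3, rank) for rank in range(3)]
print(shards)  # ranks 0..2 receive 3, 2, and 2 items respectively
```

In a real script, each rank would pass its shard through the LLaVA-NeXT-Video processor and model inside the `with state.split_between_processes(prompts) as shard:` block instead of computing the slice by hand.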


Who can review?

@sayakpaul @a-r-r-o-w @muellerzr

VladOS95-cyber commented 1 month ago

Hi @sayakpaul @a-r-r-o-w! This PR is ready for review; please take a look.

HuggingFaceDocBuilderDev commented 1 month ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

VladOS95-cyber commented 2 weeks ago

Hey @sayakpaul! Please take a look at the recent changes. Do they look OK to you?

sayakpaul commented 1 day ago

@muellerzr I think this is now ready for your review.

sayakpaul commented 1 day ago

Thanks for the updates; I left some minor comments. I think this is close to merge.

Could you additionally show some sample outputs (i.e., video and generated-caption pairs)?

VladOS95-cyber commented 1 day ago

@sayakpaul Thank you for your review! Do you mean providing some examples in the script's description? Just in case, here is an example:

```json
{
  "prompt": "USER: <video>\nGenerate caption ASSISTANT:",
  "video": "/Users/{user_name}/.cache/huggingface/hub/datasets--malterei--LLaVA-Video-small-swift/snapshots/7d712657d4907c4f29c2c6fa17afaa7289a362a7/videos/4255049031.mp4",
  "generated_text": [
    "USER: \nGenerate caption ASSISTANT: \"Practicing his moves: A man in a white shirt and jeans demonstrates his agility and balance on a basketball court, with a water bottle nearby, ready for hydration.\""
  ]
}
```
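In a distributed run, each process generates captions for its shard, and the per-rank results are collected on the main process before being written out as records like the sample above. The sketch below shows that collection step in plain Python; `generate_caption` is a hypothetical stand-in for the model call, and the gather is simulated with a list flatten (in a real script, `accelerate.utils.gather_object` would collect the per-rank lists).

```python
import json

def generate_caption(video_path):
    # Hypothetical stand-in for the LLaVA-NeXT-Video generate() call.
    return f"Caption for {video_path}"

prompt = "USER: <video>\nGenerate caption ASSISTANT:"

# Per-rank shards of video paths; in a real run, accelerate.utils.gather_object
# would collect the per-rank result lists onto the main process.
shards = [["videos/a.mp4", "videos/b.mp4"], ["videos/c.mp4"]]
gathered = [
    {"prompt": prompt, "video": video, "generated_text": [generate_caption(video)]}
    for shard in shards
    for video in shard
]
print(json.dumps(gathered, indent=2))
```

The record layout (`prompt`, `video`, `generated_text`) follows the sample output shown in this thread.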