vikhyat / moondream

tiny vision language model
https://moondream.ai
Apache License 2.0
4.88k stars, 433 forks

Better support for GPU and Flash Attention during inference #15

Open vikhyat opened 7 months ago

vikhyat commented 7 months ago

The inference code provided in this repository forces moondream to run on CPU. We should allow the user to leverage GPUs and Flash Attention for faster inference if they want to.
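
As a rough illustration of the request above, device selection could be exposed as an opt-in command-line flag rather than hard-coded. This is only a sketch: the `--device` flag and the helpers `resolve_device`, `parse_args`, and `detect_device` are hypothetical names, not part of this repository, and Flash Attention is not covered because enabling it depends on the model code itself.

```python
import argparse


def resolve_device(requested: str, cuda_available: bool, mps_available: bool = False) -> str:
    """Map a user request ("auto", "cpu", "cuda", "mps") to a device string.

    Hypothetical helper: availability is passed in as plain booleans so the
    selection logic stays independent of PyTorch.
    """
    if requested == "auto":
        # Prefer a GPU backend when one is present, otherwise fall back to CPU.
        if cuda_available:
            return "cuda"
        if mps_available:
            return "mps"
        return "cpu"
    if requested == "cuda" and not cuda_available:
        raise RuntimeError("CUDA was requested but is not available")
    if requested == "mps" and not mps_available:
        raise RuntimeError("MPS was requested but is not available")
    return requested


def parse_args(argv=None):
    """Parse a hypothetical --device flag; CPU remains reachable explicitly."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--device", choices=["auto", "cpu", "cuda", "mps"], default="auto")
    return parser.parse_args(argv)


def detect_device(requested: str = "auto") -> str:
    """Query PyTorch for available backends and resolve the device string."""
    import torch  # imported lazily so the helpers above work without PyTorch

    return resolve_device(
        requested,
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

The string returned by `detect_device` can then be passed to `torch.device(...)` or `model.to(...)` in the inference script, so users without a GPU keep the current CPU behavior.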

spartanhaden commented 7 months ago

Added CUDA support in #22