staghado / vit.cpp

Inference Vision Transformer (ViT) in plain C/C++ with ggml
MIT License
225 stars, 18 forks

Implementation of vision models #6

Closed: goutamyg closed this issue 10 months ago

goutamyg commented 10 months ago

Hi @staghado, thank you for publishing this amazing implementation of ViT using ggml.

I was thinking of implementing another transformer-based model in the same way, using your codebase as a guide. However, I could not find good ggml documentation that describes the existing functionality and how to use it. Also, certain concepts (e.g., https://github.com/staghado/vit.cpp/blob/main/main.cpp#L82-L91) do not appear in a Python-based inference script.

Could you please share how you approached implementing this ggml-based ViT code? Which resources helped you build this project? I appreciate any help you can provide.

staghado commented 10 months ago

Hi @goutamyg, thank you for your interest in the project.

As you mentioned, there is no documentation for ggml at the moment. It is still in active development and changes continuously, and you can't document something that might change completely from one day to the next.

I think a good starting point is the examples section in ggml, llama.cpp, and whisper.cpp. In these examples you will find the general structure of a ggml-based inference model.
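
As a rough illustration of that general structure: ggml-based programs typically create a context that owns the tensor memory, describe the computation as a graph of tensor operations, and only then execute the graph in a separate step, whereas a typical Python inference script runs each operation as soon as it is called. The sketch below imitates that pattern in plain C++ for a single linear layer. It does not use the ggml API; every name in it (`Ctx`, `Node`, `mat_vec`, `add`, `compute`, `run_linear_demo`) is a hypothetical stand-in, and the weights and input are made-up values.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-ins only; none of these names are part of the ggml API.
enum class Op { None, MatVec, Add };

struct Node {
    Op op = Op::None;
    int rows = 0, cols = 1;      // row-major shape
    std::vector<float> data;     // leaf values, or the result after compute()
    Node* a = nullptr;           // operands (null for leaf tensors)
    Node* b = nullptr;
};

// The context owns every tensor, so destroying it releases the whole model.
struct Ctx {
    std::vector<std::unique_ptr<Node>> nodes;

    Node* new_tensor(int rows, int cols) {
        nodes.push_back(std::make_unique<Node>());
        Node* n = nodes.back().get();
        n->rows = rows;
        n->cols = cols;
        n->data.assign(static_cast<std::size_t>(rows) * cols, 0.0f);
        return n;
    }
};

// Graph building: these functions only record the operation and its inputs.
Node* mat_vec(Ctx& ctx, Node* w, Node* x) {
    assert(w->cols == x->rows && x->cols == 1);
    Node* n = ctx.new_tensor(w->rows, 1);
    n->op = Op::MatVec; n->a = w; n->b = x;
    return n;
}

Node* add(Ctx& ctx, Node* a, Node* b) {
    assert(a->rows == b->rows && a->cols == 1 && b->cols == 1);
    Node* n = ctx.new_tensor(a->rows, 1);
    n->op = Op::Add; n->a = a; n->b = b;
    return n;
}

// Graph execution: evaluate the operands first, then the node itself.
void compute(Node* n) {
    if (n->op == Op::None) return;   // leaf tensors already hold their values
    compute(n->a);
    compute(n->b);
    if (n->op == Op::MatVec) {
        for (int i = 0; i < n->a->rows; ++i) {
            float sum = 0.0f;
            for (int j = 0; j < n->a->cols; ++j)
                sum += n->a->data[i * n->a->cols + j] * n->b->data[j];
            n->data[i] = sum;
        }
    } else {                         // Op::Add
        for (int i = 0; i < n->rows; ++i)
            n->data[i] = n->a->data[i] + n->b->data[i];
    }
}

// One linear layer y = W x + b: built first, computed afterwards.
std::vector<float> run_linear_demo() {
    Ctx ctx;
    Node* w = ctx.new_tensor(2, 3);
    Node* x = ctx.new_tensor(3, 1);
    Node* b = ctx.new_tensor(2, 1);
    w->data = {1, 2, 3, 4, 5, 6};    // made-up "weights"
    x->data = {1, 1, 1};             // made-up "input"
    b->data = {0.5f, -0.5f};

    Node* y = add(ctx, mat_vec(ctx, w, x), b);  // graph only; y->data is still zeros
    compute(y);                                  // values are filled in here
    return y->data;
}
```

Calling `run_linear_demo()` builds the two-node graph and then evaluates it. The split between the "build" functions and `compute` is the part that has no direct counterpart in an eager Python script.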

goutamyg commented 10 months ago

Thank you for replying! I will follow your suggestion.