WIP: refactoring for GGUF support

monatis / clip.cpp

CLIP inference in plain C/C++ with no extra dependencies

MIT License

439 stars, 30 forks

WIP: refactoring for GGUF support #75

Closed monatis closed 1 year ago

monatis commented 1 year ago

closes #60, #49, #32

This is still WIP and not ready for anything yet. I hope to finish it by Sunday.

[x] Rewrite the conversion script for GGUF support.
[x] Output text-only, vision-only and two-tower models according to the CLI argument.
[x] Allow overriding mean / std values for image processing.
[x] Load GGUF file and infer with it (WIP).
[x] Re-implement the quantization function.
[x] Support f32, q5_0, q5_1 and q8_0 models in addition to f16, q4_0 and q4_1.
[x] Handle memory buffer allocation logic for new text-only and vision-only models.
[x] Test and upload different CLIP models in new GGUF format.
[x] Update Python bindings.
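
To illustrate the "overriding mean / std values" item above, here is a minimal sketch of per-channel normalization with overridable statistics. It is not the code from this PR: `normalize_pixel`, `DEFAULT_MEAN` and `DEFAULT_STD` are hypothetical names, and the defaults are assumed to be the commonly used OpenAI CLIP values, which a given checkpoint may not share.

```python
# Assumption: these defaults are the widely used OpenAI CLIP statistics.
# Other checkpoints may need different values, hence the override.
DEFAULT_MEAN = (0.48145466, 0.4578275, 0.40821073)
DEFAULT_STD = (0.26862954, 0.26130258, 0.27577711)


def normalize_pixel(rgb, mean=DEFAULT_MEAN, std=DEFAULT_STD):
    """Scale one 8-bit RGB pixel to [0, 1], then standardize each channel."""
    if len(rgb) != 3 or len(mean) != 3 or len(std) != 3:
        raise ValueError("rgb, mean and std must each have 3 channels")
    if any(s <= 0 for s in std):
        raise ValueError("std values must be positive")
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))


# Overriding the statistics, e.g. for a model trained with mean = std = 0.5.
print(normalize_pixel((255, 128, 0), mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)))
```

Keeping the statistics as parameters rather than constants means a model with non-default preprocessing needs no code change, only different arguments.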

monatis commented 1 year ago

I've marked this as ready for review, although it will still require some tests with different models and fixes in memory allocation for the text-only and vision-only model variants.

I'll run tests and do some code polishing over the coming days. Any testing and feedback will be much appreciated btw.

monatis commented 1 year ago

I'm preparing the PR for merging later today. Added a shell script for bulk model conversion. Another script will do quantization in bulk. Then uploading models + updating readme and merging.

The Python bindings will require an update to their automatic model selection, since the smallest model in a repo is now a text-only model, but I'll leave that to another PR.
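
A rough sketch of the selection problem, assuming the bindings pick the smallest file in a repo: filtering by model kind before taking the minimum avoids landing on a text-only file. `ModelFile`, `pick_default_model`, the file names and the sizes below are all hypothetical and made up for illustration; they are not the bindings' actual API or real model sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelFile:
    name: str        # file name in the repo (hypothetical examples below)
    size_bytes: int  # size on disk (made-up numbers below)
    kind: str        # "full", "text" or "vision"


def pick_default_model(files, kind="full"):
    """Return the smallest file of the requested kind, or None if absent."""
    candidates = [f for f in files if f.kind == kind]
    return min(candidates, key=lambda f: f.size_bytes, default=None)


files = [
    ModelFile("text-only-q4_0.gguf", 40_000_000, "text"),
    ModelFile("vision-only-q4_0.gguf", 55_000_000, "vision"),
    ModelFile("two-tower-q4_0.gguf", 95_000_000, "full"),
    ModelFile("two-tower-f16.gguf", 300_000_000, "full"),
]

naive = min(files, key=lambda f: f.size_bytes)  # smallest overall: text-only
chosen = pick_default_model(files)              # smallest two-tower model
print(naive.name, "->", chosen.name)
```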

monatis commented 1 year ago

Also updated the Python bindings to work with the new GGUF format and find pre-converted models uploaded to HF.

I think this is good to merge now.

Green-Sky commented 1 year ago

File names are somewhat lengthy now, with "ggml-model", "ggml-text-model" and "ggml-vision-model". Feels like the "ggml" prefix is redundant given the .gguf extension.

monatis commented 1 year ago

yes, it might be {full|text|vision}-model-{ftype}.gguf
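
A minimal sketch of that naming scheme, assuming it were adopted as written. `model_filename` and `parse_model_filename` are hypothetical helpers, not functions from the repo, and the ftype list is taken from the checklist in the PR description.

```python
import re

KINDS = ("full", "text", "vision")
# ftypes named in the PR checklist
FTYPES = ("f32", "f16", "q4_0", "q4_1", "q5_0", "q5_1", "q8_0")

_PATTERN = re.compile(r"^(full|text|vision)-model-([a-z0-9_]+)\.gguf$")


def model_filename(kind, ftype):
    """Build a name following {full|text|vision}-model-{ftype}.gguf."""
    if kind not in KINDS:
        raise ValueError(f"unknown model kind: {kind!r}")
    if ftype not in FTYPES:
        raise ValueError(f"unknown ftype: {ftype!r}")
    return f"{kind}-model-{ftype}.gguf"


def parse_model_filename(name):
    """Return (kind, ftype) for a matching name, or None otherwise."""
    match = _PATTERN.match(name)
    if match is None or match.group(2) not in FTYPES:
        return None
    return match.group(1), match.group(2)


print(model_filename("vision", "q5_1"))
print(parse_model_filename("text-model-q4_0.gguf"))
```

Encoding the kind in the file name would also let tooling tell two-tower files apart from text-only and vision-only ones without opening them.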