Bug/feat - SigLIP full model handling - Githubissues

rhysdg / vision-at-a-clip

Low-latency ONNX and TensorRT based zero-shot classification and detection with contrastive language-image pre-training based prompts


Bug/feat - SigLIP full model handling #4

Closed rhysdg closed 2 months ago

rhysdg commented 2 months ago

Reworking usage to automatically handle both models that apply softmax over cosine similarities and models trained with a sigmoid loss (SigLIP)
Exposing text and image encoders for all models for manual usage
Adding an .inference method that handles logits and probabilities automatically per model
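
For illustration, here is a rough sketch of the distinction the changes above address: softmax-style models normalise scores across all prompts, while a sigmoid-loss model such as SigLIP scores each image-text pair independently. This is not the repository's code. The function names, the `mode` argument, and the `scale` and `bias` parameters are hypothetical stand-ins, and the embeddings are plain Python lists rather than real encoder outputs.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_logits(image_emb: Sequence[float],
                  text_embs: Sequence[Sequence[float]],
                  scale: float = 1.0,
                  bias: float = 0.0) -> List[float]:
    """One logit per text prompt: scale * cos(image, text) + bias."""
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        logits.append(scale * sum(a * b for a, b in zip(img, txt)) + bias)
    return logits


def softmax(logits: Sequence[float]) -> List[float]:
    """Probabilities that compete across prompts and sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits: Sequence[float]) -> List[float]:
    """Independent per-prompt probabilities; they need not sum to 1."""
    probs = []
    for x in logits:
        if x >= 0:
            probs.append(1.0 / (1.0 + math.exp(-x)))
        else:
            e = math.exp(x)  # avoids overflow for large negative logits
            probs.append(e / (1.0 + e))
    return probs


def inference(image_emb: Sequence[float],
              text_embs: Sequence[Sequence[float]],
              mode: str,
              scale: float = 1.0,
              bias: float = 0.0) -> Tuple[List[float], List[float]]:
    """Hypothetical dispatcher: choose the probability function per model type."""
    logits = cosine_logits(image_emb, text_embs, scale=scale, bias=bias)
    if mode == "softmax":
        return logits, softmax(logits)
    if mode == "sigmoid":
        return logits, sigmoid(logits)
    raise ValueError(f"unknown mode: {mode!r}")
```

Calling `inference(image_emb, text_embs, mode="sigmoid", scale=..., bias=...)` returns both the raw logits and the per-prompt probabilities. With `mode="softmax"` the probabilities sum to 1 across prompts, which is why the two model families cannot share a single post-processing path.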

Environment

Ubuntu 22.04 - RTX 3080, 8-core

Incoming changes:

Gradio example
model warmup and benchmarks
deprecating Transformers library