Adds an extension to our `OnnxClip` class that automatically handles all necessary preprocessing when switching from CLIP to SigLIP. Currently the 328 FP16 variant is supported; quantized models at multiple resolutions are on their way.
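The sketch below illustrates the kind of preprocessing switch involved. It covers only the normalization step, and it makes several assumptions. The helper `normalize_image` and its `out_dtype` parameter are hypothetical and are not the PR's code. The CLIP constants are the ones used by OpenAI's reference CLIP preprocessing. The SigLIP constants (mean = std = 0.5, mapping pixels to [-1, 1]) are the Hugging Face `SiglipImageProcessor` defaults. Resizing is assumed to have happened already: CLIP resizes the short side and center-crops, whereas SigLIP resizes straight to a square.

```python
import numpy as np

# Per-channel (R, G, B) normalization statistics, applied after rescaling to [0, 1].
# CLIP: OpenAI reference preprocessing. SigLIP: Hugging Face SiglipImageProcessor defaults.
NORM_STATS = {
    "clip": (
        np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32),
        np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32),
    ),
    "siglip": (
        np.array([0.5, 0.5, 0.5], dtype=np.float32),
        np.array([0.5, 0.5, 0.5], dtype=np.float32),
    ),
}


def normalize_image(image: np.ndarray, model_type: str = "clip",
                    out_dtype=np.float32) -> np.ndarray:
    """Hypothetical helper: HxWx3 uint8 (already resized) -> 1x3xHxW batch.

    out_dtype is an assumption for FP16 exports: some take float16 inputs,
    others keep float32 I/O, so check the model's declared input type.
    """
    if model_type not in NORM_STATS:
        raise ValueError(f"unknown model_type: {model_type!r}")
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError("expected an HxWx3 image")
    mean, std = NORM_STATS[model_type]
    pixels = image.astype(np.float32) / 255.0   # rescale to [0, 1]
    pixels = (pixels - mean) / std              # broadcast over the channel axis
    batch = pixels.transpose(2, 0, 1)[np.newaxis]  # HWC -> 1xCxHxW
    # Contiguous memory in the requested dtype, ready to feed to a session.
    return np.ascontiguousarray(batch, dtype=out_dtype)
```

With ONNX Runtime, the expected input type of an export can be read from `session.get_inputs()[0].type` (for example `tensor(float16)`). On the text side, the Hugging Face documentation similarly recommends tokenizing SigLIP prompts with `padding="max_length"`, since that is how the model was trained. CLIP, by contrast, uses a 77-token context.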
Pooled outputs are used from both the text and the image ONNX models, but `type='siglip'` additionally stores the hidden outputs in `self.hidden_image` and `self.hidden_text`. These support further self-attention analysis and, in the future, SAM point preparation.
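The sketch below shows the output handling described above. It is a hypothetical wrapper, not the PR's `OnnxClip` implementation. `PooledEmbedder`, `embed_image`, and `embed_text` are illustrative names. The two `run_*` callables stand in for ONNX Runtime sessions, and the output names `last_hidden_state` and `pooler_output` are assumed. They follow Hugging Face naming, and a real export may name or order its outputs differently.

```python
from typing import Callable, Dict, Optional

import numpy as np

# Assumed shape of a model call: output name -> array.
Outputs = Dict[str, np.ndarray]


class PooledEmbedder:
    """Hypothetical sketch: return pooled embeddings; keep hidden states for SigLIP."""

    def __init__(self, run_image: Callable[[np.ndarray], Outputs],
                 run_text: Callable[[np.ndarray], Outputs],
                 model_type: str = "clip") -> None:
        if model_type not in ("clip", "siglip"):
            raise ValueError(f"unknown model_type: {model_type!r}")
        self.run_image = run_image
        self.run_text = run_text
        self.model_type = model_type
        # Only populated when model_type == "siglip".
        self.hidden_image: Optional[np.ndarray] = None
        self.hidden_text: Optional[np.ndarray] = None

    def embed_image(self, pixels: np.ndarray) -> np.ndarray:
        outputs = self.run_image(pixels)
        if self.model_type == "siglip":
            # Per-token hidden states (batch x tokens x dim), kept for later analysis.
            self.hidden_image = outputs["last_hidden_state"]
        return outputs["pooler_output"]  # pooled embedding (batch x dim)

    def embed_text(self, token_ids: np.ndarray) -> np.ndarray:
        outputs = self.run_text(token_ids)
        if self.model_type == "siglip":
            self.hidden_text = outputs["last_hidden_state"]
        return outputs["pooler_output"]
```

With a real model, each callable could wrap `session.run(None, {input_name: x})` and pair the returned list with the names from `session.get_outputs()`.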
Environment
Ubuntu 22.04, RTX 3080, 8-core CPU
Incoming changes:
- Example notebook
- Benchmarks
- SAM guidance with automatic multi-point inference