Feat - Siglip onnx, clip surgery onnx and multiple context

Adds an extension to our OnnxClip class that automatically handles all necessary preprocessing when switching from CLIP to SigLIP. Currently 328 FP16 is supported, but quantized models with multiple resolutions are on their way.
Pooled outputs are used with each text and image ONNX model, but type='siglip' also stores the hidden outputs in self.hidden_image and self.hidden_text for further self-attention analysis and, in the future, for SAM point preparation.
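
For reviewers, here is a rough sketch of the behaviour described above, not the actual implementation in this PR. `EncoderSketch`, `fake_runner` and the runner signature are hypothetical stand-ins for the real OnnxClip class and its ONNX sessions; only the `type='siglip'` switch and the `hidden_image` / `hidden_text` attribute names come from the description.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for an ONNX session: takes a batch and returns
# (pooled_output, hidden_output). The real sessions are assumed, not shown.
Runner = Callable[[Sequence[float]], Tuple[List[float], List[List[float]]]]


class EncoderSketch:
    """Illustrative only; not the OnnxClip class from this PR."""

    def __init__(self, image_runner: Runner, text_runner: Runner, type: str = "clip"):
        # `type` mirrors the keyword named in the description.
        self.type = type
        self._image_runner = image_runner
        self._text_runner = text_runner
        # Populated only when type == "siglip".
        self.hidden_image: Optional[List[List[float]]] = None
        self.hidden_text: Optional[List[List[float]]] = None

    def encode_image(self, batch: Sequence[float]) -> List[float]:
        pooled, hidden = self._image_runner(batch)
        if self.type == "siglip":
            # Keep the hidden output around for later attention analysis.
            self.hidden_image = hidden
        return pooled  # pooled output is returned for both model types

    def encode_text(self, batch: Sequence[float]) -> List[float]:
        pooled, hidden = self._text_runner(batch)
        if self.type == "siglip":
            self.hidden_text = hidden
        return pooled


def fake_runner(batch: Sequence[float]) -> Tuple[List[float], List[List[float]]]:
    """Toy runner: pooled output is the mean, hidden output is one row per input."""
    pooled = [sum(batch) / len(batch)]
    hidden = [[x] for x in batch]
    return pooled, hidden
```

With this shape, callers get the pooled embedding from `encode_image` / `encode_text` regardless of model type, and read `hidden_image` / `hidden_text` afterwards only when the instance was created with `type="siglip"`.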

Environment

Incoming Changes :

rhysdg / vision-at-a-clip