michaelfeil / infinity

Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali
https://michaelfeil.github.io/infinity/
MIT License

Support for custom SentenceTransformer models #474

Open wwymak opened 3 days ago

wwymak commented 3 days ago

Model description

I have a custom SentenceTransformer model built around a custom class (and quite deeply nested), so the top-level modules.json file looks like this:

[
  {
    "idx": 0,
    "name": "0",
    "path": "0_MyModelClass",
    "type": "custom_models.MyModelClass"
  }
]

This loads correctly if I use SentenceTransformers directly, but when loading it in infinity, it complains that the `.auto_model` attribute is missing (raised by these lines: https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/infinity_emb/transformer/embedder/sentence_transformer.py#L81-L93 ). If this sort of custom model could be supported, or if you could give me some guidance on the correct way to save the model, that would be great.
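For context, a minimal pure-Python sketch of the failure mode (class name taken from the modules.json above; the loader logic is paraphrased, not infinity's actual code):

```python
class MyModelClass:
    """Stand-in for the bespoke top-level module declared in modules.json.
    It nests the real transformer one level down instead of exposing it."""
    def __init__(self):
        self.backbone = object()  # placeholder for the underlying HF model

# SentenceTransformer itself resolves the nested structure fine, but
# infinity (roughly, at the linked lines) takes the first module and
# reads .auto_model off it directly:
fm = MyModelClass()
print(hasattr(fm, "auto_model"))  # False -> infinity raises at this point
```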


michaelfeil commented 3 days ago

This is not supposed to work: the attribute needs to be named `auto_model`. I would recommend exporting the model to ONNX (using your favorite toolchain).
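For reference, a stock SentenceTransformer save produces a modules.json along these lines, where the first entry is the standard `Transformer` module that carries the `auto_model` attribute infinity looks for (paths and names illustrative):

```json
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
```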

slobstone commented 2 days ago

I've seen a few cases with sentence transformers where teams have used slightly bespoke structures (wrappers around a single `Transformer` at the heart of it all), and this may get even more prevalent with release 3.1.0. Tweaking this line to something along the lines of `fm = [x for x in self.modules() if hasattr(x, "auto_model")][0]` would work in cases where there is a single `Transformer`. (Also, you may have to turn off the warmup with `--no-model-warmup`, depending on whether or not your model expects a particular form of input.)
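A runnable sketch of that suggestion using plain Python stand-ins (class names hypothetical; `modules()` here mimics `torch.nn.Module.modules()`, which yields the module itself plus all submodules):

```python
class Transformer:
    """Stand-in for sentence_transformers.models.Transformer."""
    def __init__(self):
        self.auto_model = "hf-backbone"  # the underlying HF model


class MyModelClass:
    """Stand-in for a bespoke wrapper nesting a single Transformer."""
    def __init__(self):
        self.inner = Transformer()

    def modules(self):
        # torch.nn.Module.modules() yields self first, then all submodules
        yield self
        yield self.inner


# the proposed tweak: scan for the first module carrying .auto_model
# instead of assuming the top-level module exposes it
model = MyModelClass()
fm = [x for x in model.modules() if hasattr(x, "auto_model")][0]
print(fm.auto_model)  # the nested Transformer is found
```

This only picks out one module, so it is unambiguous exactly when the wrapper nests a single `Transformer`, as noted above.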

michaelfeil commented 2 days ago

Thanks. Perhaps `hasattr(fm, "auto_model")` would be helpful. Be prepared that there are little to no optimizations doable for a generic `CustomModel`.