Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
Adding torch.compile + fp16 + bettertransformer as CLI arguments #122
Closed
michaelfeil closed 4 months ago
Proposal:
Add torch.compile: bool, dtype: Enum, and bettertransformer: bool to EngineArgs.
Allowed values for the dtype Enum:
fp16
auto
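
To make the proposal concrete, here is a minimal sketch of how the three options might be modeled and exposed on a command line. Everything in it is an assumption for illustration: the `Dtype` enum, the `EngineArgs` stand-in, the field name `torch_compile` (since `torch.compile` is not a valid Python identifier), the flag names, and the helper `parse_engine_args` are hypothetical and are not taken from Infinity's actual code.

```python
import argparse
from dataclasses import dataclass
from enum import Enum


class Dtype(str, Enum):
    """Hypothetical enum holding the two dtype values named in the proposal."""
    fp16 = "fp16"
    auto = "auto"


@dataclass(frozen=True)
class EngineArgs:
    """Illustrative stand-in only; not Infinity's real EngineArgs."""
    torch_compile: bool = False       # assumed name for the torch.compile switch
    dtype: Dtype = Dtype.auto         # assumed default
    bettertransformer: bool = False   # assumed default


def parse_engine_args(argv=None) -> EngineArgs:
    """Parse the three proposed options from a list of CLI tokens."""
    parser = argparse.ArgumentParser(description="Sketch of the proposed flags")
    parser.add_argument("--torch-compile", action="store_true")
    parser.add_argument(
        "--dtype",
        choices=[d.value for d in Dtype],
        default=Dtype.auto.value,
    )
    parser.add_argument("--bettertransformer", action="store_true")
    ns = parser.parse_args(argv)
    return EngineArgs(
        torch_compile=ns.torch_compile,
        dtype=Dtype(ns.dtype),
        bettertransformer=ns.bettertransformer,
    )
```

In this sketch, `parse_engine_args(["--dtype", "fp16", "--torch-compile"])` yields an `EngineArgs` with `torch_compile=True` and `dtype=Dtype.fp16`, while an empty argument list falls back to the assumed defaults. Restricting `--dtype` through `choices` means an unsupported value is rejected at parse time rather than when the model loads.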