xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

Allow overriding model file name to support custom implementations of various compressions (brotli, gzip, etc) #780

Open KamilCSPS opened 4 months ago

KamilCSPS commented 4 months ago

Feature request

Currently, only the model file name prefix can be changed. For example:

```js
// Fetches model_br_quantized.onnx (using latest main / v3 branch)
await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', { model_file_name: 'model_br' });
```

Allow changing the file extension, or better yet, the entire model file name:

```js
await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', { model_file_name: 'model_quantized.onnx.br' });
```

When models are hosted locally or offline, this would allow serving compressed models without forcing a specific compression format on all users.

Motivation

The lack of this feature hampers offline, custom deployments of Transformers.js that do not rely on Hugging Face hosting. It would help democratize the use and deployment of Transformers.js in professional environments that do not support or allow external hosting of AI models for security or policy reasons.

It would also allow larger models, possibly LLMs, to be deployed with an acceptable user experience, since download wait times are an important factor in seamless web experiences.

Workarounds are possible, but they are not simple to implement without more in-depth knowledge. This feature would allow plug-and-play integration with server configurations that already serve compressed files.

Your contribution

Discussion: https://github.com/xenova/transformers.js/issues/776