michaelfeil / hf-hub-ctranslate2

Connecting Transformers on HuggingFace Hub with CTranslate2
https://michaelfeil.github.io/hf-hub-ctranslate2/
MIT License

Compatibility with HF Inference Endpoints? #10

Closed · anttttti closed 1 year ago

anttttti commented 1 year ago

The library doesn't seem to offer easy compatibility with HF Inference Endpoints (HFIE). Am I missing a shortcut, or could this be fixed in some way?

By default, GeneratorCT2fromHfHub downloads the model from the HF Hub instead of using the model files already in the repo, which is what HFIE does by default. But when called from HFIE, the inference endpoint doesn't have access to download the model, so huggingface_hub/_snapshot_download.py throws requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/...

Modifying the code to use the repo version doesn't seem to work either. self.ctranslate_class throws RuntimeError: [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - invalid literal; last read: '<' when trying to load the model from the repo. The model.bin under ~/.cache/huggingface/hub/ blobs seems to be different from the one in the repo.

anttttti commented 1 year ago

If anyone else is trying this, there's a workaround:

  1. create symbolic links under "~/.cache/huggingface/hub/" in the HFIE handler.py file, so HFIE can read the HF repo model files from there (a sketch follows at the end of this comment)
  2. modify a local copy of the hf-hub-ctranslate2 translate.py file so that local_files_only=True is passed to the _utils._download_model() calls and hf-hub-ctranslate2 doesn't try to download the files

Having an officially supported solution would be nice.
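
For reference, a minimal sketch of step 1, assuming the model files are shipped alongside handler.py in the endpoint repo. The helper name, the placeholder repo id, and the cache layout are illustrative assumptions (the huggingface_hub cache format differs between versions and may additionally expect a refs/<revision> entry), so treat this as a starting point rather than exact code:

```python
# Sketch of step 1: symlink the model files that ship with the endpoint repo
# into the local huggingface_hub cache, so a later download call with
# local_files_only=True can find them without network access.
# NOTE: the cache layout below ("models--<org>--<name>/snapshots/<revision>")
# is an assumption; depending on the huggingface_hub version it may also
# expect a refs/<revision> entry, so adjust as needed.
import os
from pathlib import Path


def link_repo_into_hf_cache(repo_dir: str, repo_id: str, revision: str = "main") -> None:
    """Expose the files in `repo_dir` under ~/.cache/huggingface/hub."""
    cache_root = Path.home() / ".cache" / "huggingface" / "hub"
    snapshot_dir = (
        cache_root / f"models--{repo_id.replace('/', '--')}" / "snapshots" / revision
    )
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    for entry in Path(repo_dir).iterdir():
        target = snapshot_dir / entry.name
        if not target.exists():
            os.symlink(entry.resolve(), target)


# In handler.py, call this before constructing the model, e.g.:
# link_repo_into_hf_cache(os.path.dirname(__file__), "your-org/your-ct2-model")
```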

michaelfeil commented 1 year ago

HFIE, in general, supports only inference of models in PyTorch. This framework is completely different: in general it does not depend on PyTorch at all. Even if you manage to download the model, I am not sure the API is compatible at all.

If you want to work on „official“ support, the best bet is implementing it here: https://github.com/huggingface/text-generation-inference/issues/496

anttttti commented 1 year ago

I've made an example :hugs: model repo for this available: https://huggingface.co/anttip/ct2fast-e5-small-v2-hfie. You can click "Deploy -> Inference Endpoints" and select GPU [small] as the replica type.

I replicated your repo https://huggingface.co/michaelfeil/ct2fast-e5-small-v2, and added the requirements.txt, handler.py and translate.py files.

For other models this works the same way: you'll need to customize handler.py and requirements.txt based on the model and its package dependencies. HFIE also has the option of using a custom container type, which you can modify a lot more. The default container just installs dependencies from requirements.txt and handles calls to the model using handler.py.
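
To illustrate the handler.py side: the default container expects a class named EndpointHandler with __init__ and __call__ methods. The sketch below is not the code from the example repo; it shows a generative model via GeneratorCT2fromHfHub (already mentioned above), and the model id, generation arguments, and response shape are assumptions you would adapt per model:

```python
# handler.py — rough sketch of a custom handler for the HFIE default container.
# The EndpointHandler __init__/__call__ interface is the documented HFIE
# contract; the model id, generation arguments, and response shape below are
# illustrative assumptions, not the code from the example repo.
from typing import Any, Dict, List

from hf_hub_ctranslate2 import GeneratorCT2fromHfHub


class EndpointHandler:
    def __init__(self, path: str = ""):
        # Load a CTranslate2-converted generator (or, with the workaround
        # above, reuse files already present on the endpoint).
        self.model = GeneratorCT2fromHfHub(
            model_name_or_path="your-org/your-ct2fast-model",  # placeholder id
            device="cuda",
            compute_type="int8_float16",
        )

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
        # HFIE posts JSON such as {"inputs": "..."} or {"inputs": ["...", ...]}.
        inputs = data.get("inputs", data)
        if isinstance(inputs, str):
            inputs = [inputs]
        outputs = self.model.generate(text=inputs, max_length=64)
        return [{"generated_text": text} for text in outputs]
```

requirements.txt then just lists the packages the handler imports (hf-hub-ctranslate2 and whatever it pulls in), and the bundled translate.py is presumably the patched local copy from the workaround above.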

michaelfeil commented 1 year ago

Amazing work! I was not aware of that :)