If anyone else is trying this, there's a workaround, described below. Having an officially supported solution would still be nice.
HFIE, in general, only supports inference for PyTorch models. This framework is completely different; it does not depend on PyTorch at all. Even if you manage to download the model, I am not sure the API is compatible.
If you want to work on "official" support, the best bet is implementing it here: https://github.com/huggingface/text-generation-inference/issues/496
I've made an example :hugs: model repo for this: https://huggingface.co/anttip/ct2fast-e5-small-v2-hfie. You can click "Deploy -> Inference Endpoints" and select GPU [small] as the replica type.
I replicated your repo https://huggingface.co/michaelfeil/ct2fast-e5-small-v2 and added requirements.txt, handler.py and translate.py files.
This works the same way for other models; you'll need to customize handler.py and the requirements based on the model and its package dependencies. HFIE also has the option of using a custom container type, which you can modify much more. The default container just installs dependencies from requirements.txt and handles calls to the model through handler.py, as in the sketch below.
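To make that concrete, here is a minimal hypothetical sketch of such a handler.py for a CTranslate2 encoder model like e5-small-v2. The `EndpointHandler` class with `__init__(path)` and `__call__(data)` is the interface the default HFIE container looks for; the CTranslate2 calls and the mean pooling below are my assumptions, not the exact contents of the repo linked above.

```python
# handler.py — hypothetical sketch, not the exact file from the example repo.
from typing import Any, Dict, List

import ctranslate2
import numpy as np
from transformers import AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points at the local copy of the model repo inside the
        # endpoint, so nothing is downloaded from the Hub at runtime.
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.encoder = ctranslate2.Encoder(path, device="cpu")

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        inputs = data["inputs"]
        if isinstance(inputs, str):
            inputs = [inputs]
        # CTranslate2 works on token strings, not token ids.
        tokens = [
            self.tokenizer.convert_ids_to_tokens(self.tokenizer.encode(text))
            for text in inputs
        ]
        output = self.encoder.forward_batch(tokens)
        # On CPU the returned StorageView supports the buffer protocol.
        hidden = np.asarray(output.last_hidden_state)
        # Naive mean pooling over the sequence dimension (this ignores
        # padding; good enough for a sketch, not for production).
        embeddings = hidden.mean(axis=1)
        return [{"embedding": emb.tolist()} for emb in embeddings]
```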
Amazing work! I was not aware of that :)
The library doesn't seem to be easily compatible with HFIE. Am I missing a shortcut, or could this be fixed somehow?
By default, GeneratorCT2fromHfHub downloads the model from the HF Hub instead of using the model files already in the repo, which is what HFIE does by default. When called from HFIE, the inference endpoint doesn't have access to download the model, so huggingface_hub/_snapshot_download.py throws
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/...
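For reference, a hedged sketch of the two code paths: the wrapper call follows the hf-hub-ctranslate2 README, while "/repository" as the endpoint's local checkout path is my assumption about the container layout.

```python
# Inside an endpoint this first variant fails: the wrapper resolves the model
# through the HF Hub API, which the endpoint cannot reach -> 401.
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub

model = GeneratorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-e5-small-v2",
    device="cuda",
    compute_type="int8_float16",
)

# The intended fix is to point plain ctranslate2 at the files already cloned
# into the endpoint ("/repository" is an assumption) — though as noted below,
# this still fails if the local model.bin is not the real binary.
import ctranslate2

encoder = ctranslate2.Encoder("/repository", device="cuda")
```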
Modifying the code to use the repo version doesn't seem to work either. self.ctranslate_class throws
RuntimeError: [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - invalid literal; last read: '<'
when trying to load the model in the repo. The model.bin blob in ~/.cache/huggingface/hub/ seems to be different from the one in the repo.
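A quick hypothetical check of what that file actually contains can narrow this down (path illustrative):

```python
# A real CTranslate2 model.bin is binary; an HTML error page would start with
# b"<" (matching the '<' in the parse error above), and a git-lfs pointer
# file would start with b"version https://git-lfs".
with open("model.bin", "rb") as f:  # illustrative path
    print(f.read(64))
```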