huggingface / huggingface.js

Utilities to use the Hugging Face Hub API
https://hf.co/docs/huggingface.js
MIT License

Import `feature-extraction` inference type from TEI #781

Closed: Wauplin closed this PR 2 months ago

Wauplin commented 3 months ago

This PR adds a script to import the feature-extraction inference types from text-embeddings-inference (TEI). The OpenAPI spec is pulled from https://huggingface.github.io/text-embeddings-inference/openapi.json and converted into the JSON Schema format from which we generate types for the JS and Python clients. The script is heavily inspired by the TGI importer script.
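For readers unfamiliar with the conversion step, here is a minimal TypeScript sketch of the idea. It assumes the OpenAPI document has already been fetched and parsed. It is not the script from this PR: `openApiToJsonSchema`, `rewriteRefs` and the choice of `$defs` as the target location are assumptions made for illustration. The sketch picks one schema out of `components.schemas` and rewrites `#/components/schemas/...` references so the result is a standalone JSON Schema.

```typescript
// Hypothetical sketch, not the actual import script from this PR.
type JsonValue = string | number | boolean | null | JsonValue[] | JsonObject;
interface JsonObject {
  [key: string]: JsonValue;
}

// Minimal subset of an OpenAPI 3 document that this sketch relies on.
interface OpenApiDoc {
  components: { schemas: JsonObject };
}

const OPENAPI_PREFIX = "#/components/schemas/";

function isJsonObject(value: JsonValue): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively rewrite "$ref": "#/components/schemas/X" into "#/$defs/X".
function rewriteRefs(node: JsonValue): JsonValue {
  if (Array.isArray(node)) {
    return node.map(rewriteRefs);
  }
  if (isJsonObject(node)) {
    const out: JsonObject = {};
    for (const [key, value] of Object.entries(node)) {
      out[key] =
        key === "$ref" && typeof value === "string" && value.startsWith(OPENAPI_PREFIX)
          ? "#/$defs/" + value.slice(OPENAPI_PREFIX.length)
          : rewriteRefs(value);
    }
    return out;
  }
  return node;
}

// Build a standalone JSON Schema whose root is components.schemas[rootName].
// Every component schema is copied under $defs so that each rewritten $ref resolves.
function openApiToJsonSchema(doc: OpenApiDoc, rootName: string): JsonObject {
  const root = doc.components.schemas[rootName];
  if (root === undefined || !isJsonObject(root)) {
    throw new Error(`Object schema "${rootName}" not found in components.schemas`);
  }
  const defs: JsonObject = {};
  for (const [name, schema] of Object.entries(doc.components.schemas)) {
    defs[name] = rewriteRefs(schema);
  }
  const rewrittenRoot = rewriteRefs(root) as JsonObject;
  return { title: rootName, ...rewrittenRoot, $defs: defs };
}
```

For example, `openApiToJsonSchema(doc, "EmbedRequest")` would return a schema titled `EmbedRequest` whose references point into `$defs`. The schema name used here is an assumption about the TEI spec, not something this PR states.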

This PR also adds the `prompt_name` input parameter, which was recently added to TEI (see https://github.com/huggingface/text-embeddings-inference/pull/312).
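A rough sketch of what `prompt_name` does, based on how TEI documents it: the name must be a key of the model's Sentence Transformers `prompts` dictionary, and the matching prompt text is prepended to the input before encoding. `FeatureExtractionInputSketch` and `applyPrompt` are hypothetical names. The real generated type may have different or additional fields, and TEI applies the prompt server-side rather than in the client.

```typescript
// Illustrative shape only; the type generated in this PR may differ.
interface FeatureExtractionInputSketch {
  inputs: string;
  prompt_name?: string;
  normalize?: boolean;
  truncate?: boolean;
}

// Hypothetical helper mimicking the documented behavior of `prompt_name`:
// look the name up in the `prompts` dictionary and prepend that text to the input.
function applyPrompt(
  request: FeatureExtractionInputSketch,
  prompts: Record<string, string>,
): string {
  const name = request.prompt_name;
  if (name === undefined) {
    return request.inputs; // no prompt requested: encode the raw text
  }
  if (!Object.prototype.hasOwnProperty.call(prompts, name)) {
    throw new Error(`Unknown prompt_name "${name}"`);
  }
  return prompts[name] + request.inputs;
}
```

With `prompts = { query: "query: " }` and `prompt_name: "query"`, the text "What is the capital of France?" would be encoded as "query: What is the capital of France?".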

Decisions taken:

  1. Keep `string` as the input type. In theory, TEI can handle much more complex inputs (`Union[List[Union[List[int], int, str]], str]`), but let's keep it simple for now. Other inference tasks are also currently defined without arrays, even when the Inference API / Inference Endpoints can handle them.
  2. Only import the input/output types for the `/embed` route, which is the closest match to the feature-extraction task.
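To make decision 1 concrete, here is a small TypeScript sketch contrasting the union TEI accepts with the string-only input kept in the spec. The type names and the `describeInput` helper are illustrative assumptions, not part of `./inference.ts`.

```typescript
// The richer input accepted by TEI, transcribed from the annotation
// Union[List[Union[List[int], int, str]], str]: a string, or a list whose items
// are strings, single token ids, or lists of token ids.
type TeiEmbedInput = string | Array<string | number | number[]>;

// The simplified input kept for the feature-extraction spec (decision 1).
type FeatureExtractionInputs = string;

// Type guard: true only for the case the simplified spec models.
function isSpecInput(input: TeiEmbedInput): input is FeatureExtractionInputs {
  return typeof input === "string";
}

// Hypothetical helper: report which variant an input is, to show what the spec leaves out.
function describeInput(input: TeiEmbedInput): string {
  if (isSpecInput(input)) return "string (covered by the spec)";
  if (input.every((item) => typeof item === "string")) return "batch of strings (not covered)";
  return "token ids or mixed batch (not covered)";
}
```

A caller holding a broader `TeiEmbedInput` can use the type guard to check whether a value fits the simplified spec before sending it.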

Note: in a follow-up PR, it would be really nice to run this in a manually triggered CI workflow that opens a PR whenever new arguments are added to TGI / TEI.
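One possible building block for that follow-up, sketched under the assumption that the workflow re-runs the importer and compares the freshly generated spec with the committed one. `addedProperties` and `shouldOpenPr` are hypothetical helpers. This version only inspects top-level `properties`; a real workflow would also need to handle nested, removed or modified parameters.

```typescript
// Minimal view of a JSON Schema object: only the property names matter here.
interface ObjectSchema {
  properties?: Record<string, unknown>;
}

// Names present in the freshly imported schema but missing from the committed one.
function addedProperties(current: ObjectSchema, fresh: ObjectSchema): string[] {
  const known = new Set(Object.keys(current.properties ?? {}));
  return Object.keys(fresh.properties ?? {})
    .filter((name) => !known.has(name))
    .sort();
}

// A workflow step could open a PR only when new arguments appear.
function shouldOpenPr(current: ObjectSchema, fresh: ObjectSchema): boolean {
  return addedProperties(current, fresh).length > 0;
}
```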

Wauplin commented 2 months ago

@tomaarsen (or @osanseviero, since I know you've been working on feature-extraction lately), could I get a review on this PR please? :pray: The import script itself doesn't need a close review. It's better to focus on ./inference.ts, ./specs/input.json and ./specs/output.json to check the feature-extraction parameters.

Wauplin commented 2 months ago

Thanks for the reviews! Most comments are about TEI (which was expected ^^). I addressed or replied to them where I could. Are there any blockers before merging this?