miurla / babyagi-ui

BabyAGI UI is designed to make it easier to run and develop with babyagi in a web app, like ChatGPT.
https://babyagi-ui.vercel.app
MIT License

Support Huggingface Inference API #140

Open sjdthree opened 12 months ago

sjdthree commented 12 months ago

In addition to OpenAI, I would like to add the ability to call a model via the Hugging Face Inference API.

This would allow the deployer to select from any model on HF, including the new, well-performing open-source Llama release, Llama 2.

It needs a Hugging Face API key, similar to the OpenAI one.

Here is sample (untested) code using axios to fetch results from the "gpt2" model via the Hugging Face API:

import { useEffect, useState } from 'react';
import axios from 'axios';

export default function Home() {
  const [output, setOutput] = useState(null);

  useEffect(() => {
    const callHuggingFaceAPI = async () => {
      try {
        // The Inference API expects a JSON body with an `inputs` field.
        const response = await axios.post(
          'https://api-inference.huggingface.co/models/gpt2',
          { inputs: 'Hello, world!' },
          {
            headers: {
              // Replace with your own Hugging Face API token.
              'Authorization': 'Bearer YOUR_HUGGINGFACE_API_TOKEN',
              'Content-Type': 'application/json'
            }
          }
        );

        // Text-generation models respond with an array of { generated_text } objects.
        setOutput(response.data[0].generated_text);
      } catch (error) {
        console.error('Failed to call Hugging Face API:', error);
      }
    };

    callHuggingFaceAPI();
  }, []);

  return (
    <div>
      <h1>Hugging Face API Output:</h1>
      <p>{output}</p>
    </div>
  );
}
sjdthree commented 12 months ago

The huggingface_hub integration via LangChain might provide an alternate route: https://python.langchain.com/docs/modules/model_io/models/llms/integrations/huggingface_hub
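
On the TypeScript side, LangChain JS exposes a similar wrapper. A minimal sketch, assuming the HuggingFaceInference class from langchain/llms/hf and an env var named HUGGINGFACE_API_TOKEN (both assumptions, not confirmed against this repo):

import { HuggingFaceInference } from 'langchain/llms/hf';

// Minimal sketch: call an HF-hosted model through LangChain JS rather than raw axios.
const model = new HuggingFaceInference({
  model: 'gpt2', // any text-generation model id on the HF Hub
  apiKey: process.env.HUGGINGFACE_API_TOKEN, // assumed env var name
});

const output = await model.call('Hello, world!');
console.log(output);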

sjdthree commented 12 months ago

Here are some TypeScript / Next.js code snippets.

Create a new file, huggingface.ts, in the pages/api directory:

import { NextApiRequest, NextApiResponse } from 'next';
import axios from 'axios';

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
    if (req.method === 'POST') {
        try {
            const response = await axios.post(
                "https://api-inference.huggingface.co/models/gpt2",
                req.body,
                {
                    // Read the token from an environment variable instead of hard-coding it.
                    headers: { Authorization: `Bearer ${process.env.HUGGINGFACE_API_TOKEN}` },
                }
            );
            res.status(200).json(response.data);
        } catch (error) {
            res.status(500).json({ error: 'Error calling Hugging Face API' });
        }
    } else {
        res.status(405).json({ error: 'Only POST requests are accepted' });
    }
}

Call this API route from your Next.js pages or components like this:

import axios from 'axios';

async function query(data: string) {
    // The route above forwards the body to Hugging Face, which expects { inputs: ... }.
    const response = await axios.post('/api/huggingface', { inputs: data });
    return response.data;
}

query("Can you please let us know more details about your ").then((response) => {
    console.log(JSON.stringify(response));
});
miurla commented 11 months ago

@sjdthree I researched Hugging Face and was able to get it up and running easily. However, to run Llama 2 as an API on Hugging Face, we need to host the model on our own account. It seems feasible for developers to implement, but there may be a cost involved in using it on the site. With Replicate (for a fixed period) or the Llama API, it can be provided for free. What do you think?

https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI

sjdthree commented 11 months ago

> @sjdthree I researched Hugging Face and was able to get it up and running easily. However, to run Llama 2 as an API on Hugging Face, we need to host the model on our own account. It seems feasible for developers to implement, but there may be a cost involved in using it on the site. With Replicate (for a fixed period) or the Llama API, it can be provided for free. What do you think?
>
> https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI

Yes, I like it!

I see Replicate.com as similar to Hugging Face: a limited free tier, then pay for speed/performance, etc. Is there a tier difference I'm missing?

I would strongly support options! So all three: HF, Replicate, and the Llama API.

How best should we architect this to handle all of them?

miurla commented 11 months ago

> I see Replicate.com as similar to Hugging Face: a limited free tier, then pay for speed/performance, etc. Is there a tier difference I'm missing?

My understanding is the same.

> I would strongly support options! So all three: HF, Replicate, and the Llama API.

Let's support multiple options. On our demo page, we should enable trying Llama 2 with Replicate.

> How best should we architect this to handle all of them?

The LLM calls are made using LangChain, and both HF and Replicate are supported. The model selected in the UI is passed as modelName in all calls. Creating a function that returns an LLM instance based on the modelName seems like a good idea. To start, replacing just BabyElfAGI should be sufficient.
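
A minimal sketch of such a factory, assuming the LangChain JS wrappers for each backend; the id prefixes, env var names, and the Replicate model string are illustrative placeholders, not the repo's actual values:

import { OpenAI } from 'langchain/llms/openai';
import { HuggingFaceInference } from 'langchain/llms/hf';
import { Replicate } from 'langchain/llms/replicate';
import { BaseLLM } from 'langchain/llms/base';

// Hypothetical factory: map the modelName coming from the UI to a LangChain LLM.
export function getLLM(modelName: string, temperature = 0.7): BaseLLM {
  if (modelName.startsWith('gpt-')) {
    // OpenAI models (gpt-3.5-turbo, gpt-4, ...)
    return new OpenAI({ modelName, temperature });
  }
  if (modelName.startsWith('replicate/')) {
    // Replicate expects an 'owner/name:version' string; the version hash is a placeholder.
    return new Replicate({
      model: 'a16z-infra/llama13b-v2-chat:<version-hash>',
      apiKey: process.env.REPLICATE_API_TOKEN,
    });
  }
  // Fall back to the Hugging Face Inference API for everything else.
  return new HuggingFaceInference({
    model: modelName,
    apiKey: process.env.HUGGINGFACE_API_TOKEN,
    temperature,
  });
}

BabyElfAGI's executors could then call getLLM(modelName).call(prompt) without caring which backend is behind it.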

sjdthree commented 11 months ago

This sounds good.

Where would we change the drop-down on the front page?

[screenshot: the model drop-down on BabyAGI UI's front page]

miurla commented 11 months ago

Thank you for checking it right away. It's defined in /src/utils/constants.ts: https://github.com/miurla/babyagi-ui/blob/main/src/utils/constants.ts#L8-L20

The name here is used as the display name, and the id is what gets passed as modelName when invoking the LLM.
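
For illustration, the entries might look roughly like this (the field shape and ids are assumptions; check constants.ts for the real structure):

// Hypothetical sketch of extending the model list in /src/utils/constants.ts.
// Field names and ids are assumed, not the repo's actual values.
export const MODELS = [
  { id: 'gpt-3.5-turbo', name: 'OpenAI GPT-3.5 Turbo' }, // name = display, id = modelName
  { id: 'gpt-4', name: 'OpenAI GPT-4' },
  // New backends would be added the same way; the id is what the LLM factory switches on:
  { id: 'replicate/llama-2-70b-chat', name: 'Llama 2 70B (Replicate)' },
];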

sjdthree commented 11 months ago

Ok sounds good. Did you want me to make the changes and post a PR for your review?

miurla commented 11 months ago

I'm glad to hear that! It would be extremely helpful! Could you give it a try? I'll support you anytime.