radareorg / r2ai

local language model for radare2
https://www.radare.org
MIT License

Add support for HuggingFace 🤗 inference API #65

Closed brainstorm closed 4 days ago

brainstorm commented 4 days ago

Description

I have been playing with pseudo-C (pdc) decompilation for a STM8 codebase on the HuggingFace web-based Chat, for free:

[Screenshot (2024-10-02): HuggingFace Chat decompiling the STM8 pseudo-C]

Unfortunately, when I hit the API endpoint with the same model (or different ones), I get this:

[0x0000807f]> s 0x0000833c
[0x0000833c]> decai -d
{"error":"Model requires a Pro subscription; check out hf.co/pricing to learn more. Make sure to include your HF token in your query."}

For the API request I'm using the same Bearer token I use for the chat, so I'm not entirely sure what's going on. Free-tier limits are a bit vague anyway, according to this forum thread: https://discuss.huggingface.co/t/api-limits-on-free-inference-api/57711/5
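For reference, a minimal sketch of what such a request looks like against the Serverless Inference API. This is not decai's actual code; the model name and token below are placeholders, and only the documented `Authorization: Bearer <token>` header and `{"inputs": ...}` payload are assumed:

```python
import json
import urllib.request

# Hypothetical sketch of a Serverless Inference API request; the model
# and token are placeholders, not values used by r2ai/decai itself.
HF_API_URL = "https://api-inference.huggingface.co/models/{model}"

def build_request(model: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an inference request with a Bearer token."""
    data = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        HF_API_URL.format(model=model),
        data=data,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("meta-llama/Llama-3.2-1B-Instruct", "decompile ...", "hf_xxx")
```

If the token is valid but the model is gated behind a Pro plan, the endpoint returns the 4xx error body shown above regardless of the header being present.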

I'm also getting strange (capacity?) errors for smaller models:

[0x0000833c]> decai -d
{"error":"Model meta-llama/Llama-3.2-1B-Instruct is currently loading","estimated_time":98.86515045166016}

In any case, I hope this addition helps folks who do pay for the service.

brainstorm commented 4 days ago

Or perhaps there's a way to hit the HuggingFace Chat API (https://huggingface.co/docs/text-generation-inference ?) instead of the arguably more official Serverless Inference API? 🤔

trufae commented 4 days ago

Looks good to me! The "huggingface" API name is a bit long, so I would suggest using "hf" as an alias. And yeah, I guess it's possible to hit this endpoint without any API key... but not sure if we want to play dirty with them :D