huggingface / huggingface_hub

The official Python client for the Hugging Face Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0

Unable to pass `no_repeat_ngram_size` in `text_generation` #2022

Open satkg42 opened 5 months ago

satkg42 commented 5 months ago

Describe the bug

I am using InferenceClient to generate text with a TGI endpoint. To control repetitive generations, we need to use the no_repeat_ngram_size parameter, but I am getting the error below: TypeError: InferenceClient.text_generation() got an unexpected keyword argument 'no_repeat_ngram_size'

However, this parameter is supported in transformers' GenerationConfig.
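
For reference, transformers' GenerationConfig does accept this parameter:

from transformers import GenerationConfig

# no_repeat_ngram_size is a documented generation parameter in transformers.
config = GenerationConfig(no_repeat_ngram_size=3)
print(config.no_repeat_ngram_size)  # -> 3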

Reproduction

No response
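
Though no reproduction was attached, a minimal sketch along these lines should trigger the same error (the endpoint URL here is hypothetical):

from huggingface_hub import InferenceClient

# Hypothetical TGI endpoint URL, for illustration only.
client = InferenceClient(model="https://my-tgi-endpoint.example")

# Raises TypeError: text_generation() does not accept no_repeat_ngram_size.
client.text_generation(
    "this is my test",
    max_new_tokens=64,
    no_repeat_ngram_size=3,
)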

Logs

TypeError: InferenceClient.text_generation() got an unexpected keyword argument 'no_repeat_ngram_size'
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/gradio/queueing.py", line 489, in call_prediction
    output = await route_utils.call_process_api(
  File "/opt/conda/lib/python3.10/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/opt/conda/lib/python3.10/site-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/opt/conda/lib/python3.10/site-packages/gradio/blocks.py", line 1191, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/opt/conda/lib/python3.10/site-packages/gradio/utils.py", line 519, in async_iteration
    return await iterator.__anext__()
  File "/opt/conda/lib/python3.10/site-packages/gradio/utils.py", line 640, in asyncgen_wrapper
    async for response in f(*args, **kwargs):
  File "/opt/conda/lib/python3.10/site-packages/gradio/chat_interface.py", line 481, in _stream_fn
    first_response = await async_iteration(generator)
  File "/opt/conda/lib/python3.10/site-packages/gradio/utils.py", line 519, in async_iteration
    return await iterator.__anext__()
  File "/opt/conda/lib/python3.10/site-packages/gradio/utils.py", line 512, in __anext__
    return await anyio.to_thread.run_sync(
  File "/opt/conda/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/opt/conda/lib/python3.10/site-packages/gradio/utils.py", line 495, in run_sync_iterator_async
    return next(iterator)
  File "/app/app.py", line 56, in generate
    for token in client.text_generation(
TypeError: InferenceClient.text_generation() got an unexpected keyword argument 'no_repeat_ngram_size'

System info

- huggingface_hub version: 0.20.3
- Platform: Linux-5.15.0-92-generic-x86_64-with-glibc2.31
- Python version: 3.10.13
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /user/.cache/huggingface/token
- Has saved token ?: True
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.1.1
- Jinja2: 3.1.3
- Graphviz: N/A
- Pydot: N/A
- Pillow: 10.2.0
- hf_transfer: 0.1.5
- gradio: 4.12.0
- tensorboard: N/A
- numpy: 1.26.3
- pydantic: 2.6.1
- aiohttp: 3.9.3
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /data
- HF_ASSETS_CACHE: /user/.cache/huggingface/assets
- HF_TOKEN_PATH: /user/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None 
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: True
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
Wauplin commented 5 months ago

Hi @satkg42, no_repeat_ngram_size is not a valid parameter on a TGI server. See the docs for the full list of available parameters. The link you've shared above is for the transformers pipeline; for the record, the TGI server and the transformers library do not share the same codebase. We are in the ongoing process of unifying the API (cc @SBrandeis), but parameters with low usage (like no_repeat_ngram_size) will most likely not be supported. As a consequence, it is not supported in Python's InferenceClient.text_generation either.
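
For what it's worth, repetition can still be discouraged with parameters that text_generation does accept, such as repetition_penalty (a sketch; the endpoint URL is hypothetical):

from huggingface_hub import InferenceClient

# Hypothetical endpoint URL; repetition_penalty is a supported TGI parameter.
client = InferenceClient(model="https://my-tgi-endpoint.example")

output = client.text_generation(
    "this is my test",
    max_new_tokens=64,
    repetition_penalty=1.2,  # penalizes repeated tokens, though not n-gram-specific
)
print(output)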

satkg42 commented 5 months ago

Thanks @Wauplin for the quick response. I agree that it might not be one of the most used parameters, but in my experience it definitely helps with the model "rambling" problem, i.e. the model going on and on about something. It would be really helpful if this parameter were supported in the API. Let me know if I can contribute to making that happen.

Wauplin commented 5 months ago

Thanks for the details @satkg42. For now, I would prefer to delay the decision. Adding support for new parameters is not the hardest part; maintaining them and providing backward compatibility when we want to update or remove them is much harder. That's why I'd rather wait until the "API unification" step is done on our side before adding this. In the meantime, I encourage any user landing on this page to post a comment to show their interest in such an addition.

@satkg42, a possible workaround for you in the meantime is to use the .post method. It involves more manual work but achieves the same result:

import json

from huggingface_hub import InferenceClient

client = InferenceClient(...)

# Send the raw payload; parameters are forwarded to the server as-is.
response = client.post(json={"inputs": "this is my test", "parameters": {"no_repeat_ngram_size": 42}})
# The response is raw bytes; decode and parse it manually.
data = json.loads(response.decode())
...

Note, however, that this workaround is not compatible with TGI-served models, since the TGI server itself rejects no_repeat_ngram_size; it only works for models served with the transformers backend.