Update specification for docs

This PR should prove useful for the ongoing work of generating documentation pages based on the input/output specs (see https://github.com/huggingface/hub-docs/pull/1379).

This PR is now ready for review.

Changes:

use enums instead of oneOf + list of const
do not rely on "$ref": "/inference/schemas/text2text-generation/input.json" for Summarization / Translation. This makes things clearer, and it is now possible to extend the parameters, which was not possible before.
fix typo in text-to-image
add src_lang and tgt_lang in translation params
use enum for early_stopping parameter (in common defs)
for audio-classification, automatic-speech-recognition, image-classification, image-to-image, object-detection:
- mention base64-encoded string as input
- mention that raw data can be sent instead of a JSON payload when no parameters are provided
more descriptions in object detection
more descriptions in image segmentation
mention bytes output in text-to-image and image-to-image
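
To illustrate the "enums instead of oneOf + list of const" change, here is a minimal sketch in Python of the rewrite applied to a single schema fragment. The helper name `collapse_const_one_of` and the sample values are hypothetical and not taken from the actual spec files; the sketch only assumes that the old style was a `oneOf` whose alternatives are bare `const` entries.

```python
from typing import Any


def collapse_const_one_of(schema: dict[str, Any]) -> dict[str, Any]:
    """Rewrite {"oneOf": [{"const": a}, {"const": b}]} as {"enum": [a, b]}.

    Schemas that do not match this exact shape are returned unchanged.
    """
    options = schema.get("oneOf")
    if not isinstance(options, list) or not options or "enum" in schema:
        return schema
    # Only collapse when every alternative is a bare {"const": value}.
    if not all(isinstance(o, dict) and set(o) == {"const"} for o in options):
        return schema
    collapsed = {k: v for k, v in schema.items() if k != "oneOf"}
    collapsed["enum"] = [o["const"] for o in options]
    return collapsed


# Hypothetical parameter definition written in the old style.
old_style = {
    "description": "Illustrative parameter, not taken from the real spec.",
    "oneOf": [{"const": "first"}, {"const": "second"}],
}
new_style = collapse_const_one_of(old_style)
```

Both forms accept the same set of values; the `enum` form is shorter and tends to be easier for documentation generators to render as a list of allowed values.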
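
The two input modes mentioned for the audio and image tasks (base64-encoded string inside a JSON payload, or raw data when there are no parameters) could be sketched on the client side as follows. This is an assumption-laden illustration, not the library's implementation: the function name `build_request_body`, the `inputs` / `parameters` key names, the `Content-Type` values, and the `top_k` parameter are all assumptions made for the example.

```python
import base64
import json
from typing import Any, Optional


def build_request_body(
    data: bytes, parameters: Optional[dict[str, Any]] = None
) -> tuple[dict[str, str], bytes]:
    """Return (headers, body) for a binary input such as an image or audio clip."""
    if not parameters:
        # No parameters: send the raw bytes directly, without a JSON wrapper.
        return {"Content-Type": "application/octet-stream"}, data
    # With parameters: JSON cannot carry raw bytes, so base64-encode the input.
    payload = {
        "inputs": base64.b64encode(data).decode("ascii"),  # assumed key name
        "parameters": parameters,  # assumed key name
    }
    return {"Content-Type": "application/json"}, json.dumps(payload).encode("utf-8")


# Usage: the same bytes, sent with and without a (hypothetical) parameter.
raw_headers, raw_body = build_request_body(b"\x00\x01\x02")
json_headers, json_body = build_request_body(b"\x00\x01\x02", {"top_k": 3})
```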
