Private chat with local GPT with documents, images, video, and more. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
Add ability to ingest any number of images, videos, image URLs, video URLs, or YouTube videos for vision models
Control over resolution changes, image format, and how many frames of a video are used
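One way to control how many frames of a video are used is evenly spaced sampling across the clip. A minimal sketch (the function name and parameters are illustrative, not h2oGPT's actual API):

```python
def select_frame_indices(total_frames: int, frames_num_max: int) -> list[int]:
    """Pick up to frames_num_max evenly spaced frame indices from a video."""
    if total_frames <= frames_num_max:
        return list(range(total_frames))
    # Spread the sampled frames evenly over the clip, always keeping frame 0.
    step = total_frames / frames_num_max
    return [int(i * step) for i in range(frames_num_max)]

# e.g. a 100-frame clip capped at 10 frames yields indices 0, 10, 20, ..., 90
print(select_frame_indices(100, 10))
```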
Control over prompts used for batching of images
Allow visible_vision_models as a separate model, or as an entry in model_lock that pairs a vision model with a non-vision model, to allow maximal intelligence on visual Q/A. Allow images_num_max in the CLI, UI, API, or model_lock to control how many images are used before batching.
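A model_lock pairing might be passed on the command line as JSON. A minimal sketch of building such an argument; the specific model names and the exact dict keys are assumptions to be checked against the h2oGPT docs:

```python
import json

# Hypothetical pairing: a strong text model plus a vision model in one lock.
# "base_model" follows h2oGPT's naming convention; verify keys in the docs.
model_lock = [
    {"base_model": "mistralai/Mixtral-8x7B-Instruct-v0.1"},
    {"base_model": "liuhaotian/llava-v1.6-34b"},
]
cli_arg = "--model_lock=" + json.dumps(model_lock)
print(cli_arg)
```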
Clean up linux_install.sh a bit
Add support for gpt-4o and gpt-4-turbo for vision Q/A, up to 20 images per LLM call
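With a per-call cap of 20 images, a larger set has to be split into batches. A minimal sketch of such chunking (the helper name is illustrative):

```python
def batch_images(images: list, images_num_max: int = 20) -> list[list]:
    """Split images into batches no larger than images_num_max, so each
    LLM call stays within the per-call limit (20 for gpt-4o / gpt-4-turbo
    per the note above)."""
    return [images[i:i + images_num_max]
            for i in range(0, len(images), images_num_max)]

# e.g. 45 images become batches of 20, 20, and 5
print([len(b) for b in batch_images(list(range(45)))])
```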
Fix forking of Parallel to avoid too many forks on high-core systems
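The usual fix for over-forking on high-core machines is to cap the worker count by both the number of tasks and a hard ceiling, rather than defaulting to one worker per core. A minimal sketch, with an illustrative cap value:

```python
import os

def safe_n_jobs(n_tasks: int, cap: int = 16) -> int:
    """Bound worker count so a 128-core box doesn't fork 128 processes
    for a handful of tasks; the cap of 16 is an assumed example value."""
    return max(1, min(os.cpu_count() or 1, n_tasks, cap))
```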
Next PR:
Choice to do parallel LLM calls when batching (might overload OSS models or hit rate limits for closed models)
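Bounding concurrency is the usual guard against overloading OSS servers or tripping closed-model rate limits. A minimal sketch using a thread pool; `llm_call` is a stand-in for whatever client function is actually used:

```python
from concurrent.futures import ThreadPoolExecutor

def call_llms_parallel(prompts, llm_call, max_workers=4):
    """Run llm_call over prompts concurrently, with max_workers bounding
    how many requests are in flight at once (value is illustrative)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(llm_call, prompts))

# usage with a trivial stand-in for an LLM client
print(call_llms_parallel(["hello", "world"], str.upper))
```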