**Closed** — ImKeTT closed this 4 months ago
@ImKeTT Could we make Prometheus the default for now, i.e. use `_get_prometheus_vision_critique_metric_specs()`?
I changed the metrics for all four open-ended VLM generation tasks (Bingo, Flickr30k, Crossmodal 3600, Vibe-Eval) to `_get_prometheus_vision_critique_metric_specs()`. The `credentials.conf` looks like:

```
critiqueModelName: huggingface/prometheus-vision-13b-v1.0-hf
critiqueType: model
```

I set `max_tokens` for Prometheus-Vision to 200, otherwise the rating might get truncated.
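To illustrate why a low `max_tokens` can lose the rating entirely: Prometheus-style critique models typically emit free-form feedback followed by a final `[RESULT] <score>` token, so if the generation is cut off before that suffix, no score can be parsed. A minimal sketch (the `[RESULT]` convention and `extract_rating` helper are illustrative assumptions, not HELM's actual parsing code):

```python
import re

def extract_rating(critique: str):
    """Pull the final integer rating out of a Prometheus-style critique.

    Assumes the '[RESULT] <score>' convention; illustrative only,
    not HELM's actual parsing code.
    """
    match = re.search(r"\[RESULT\]\s*([1-5])", critique)
    return int(match.group(1)) if match else None

full = "The caption covers every salient object in the image. [RESULT] 4"
cut = "The caption covers every salient object in the ima"  # truncated by a low max_tokens

print(extract_rating(full))  # → 4
print(extract_rating(cut))   # → None (the rating was truncated away)
```

With `max_tokens=200` the full feedback-plus-rating fits, so the score survives.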
I've tested 100 instances for Bingo and Vibe-Eval, and the results look good. Take a look when you have time, thanks @teetone!
I set `max_tokens` in the Unicorn scenario to 1, and the results look fine now. For Crossmodal_3600, Bingo, and Flickr30k, to make the VLMs generate answers more aligned with the reference, I have tested 50 instances on these three scenarios (model=openai/gpt-4-vision-preview with `_get_vibe_eval_critique_metric_specs()`), and the results look good. They currently use `_get_vibe_eval_critique_metric_specs()`, which can also easily be changed to `_get_prometheus_vision_critique_metric_specs()`, as they do almost the same job. @teetone would you take a look? And please let me know how I can improve it. Thanks!
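Since the two metric-spec factories do nearly the same job, switching between them is just a matter of which function the run spec calls. A minimal sketch with hypothetical stand-ins for the real HELM helpers (the real functions build `MetricSpec` objects; the dict bodies here are placeholders):

```python
# Hypothetical stand-ins for the two HELM metric-spec factories discussed
# above; the real ones return lists of MetricSpec objects, but both wrap a
# model-based critique, so call sites can switch by swapping one reference.
def _get_vibe_eval_critique_metric_specs():
    return [{"class_name": "vibe_eval_critique", "critique_type": "model"}]

def _get_prometheus_vision_critique_metric_specs():
    return [{"class_name": "prometheus_vision_critique", "critique_type": "model"}]

# Choosing the critique backend becomes a one-name change at the call site:
get_critique_metric_specs = _get_prometheus_vision_critique_metric_specs
specs = get_critique_metric_specs()
print(specs[0]["class_name"])  # prometheus_vision_critique
```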