haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
20.1k stars 2.21k forks

[Usage] Deterministic mode is not set in the eval_model() function #1013

Open y-vectorfield opened 9 months ago

y-vectorfield commented 9 months ago

Describe the issue

Issue: The internal flags for deterministic mode are not set in the eval_model() function.

In the example code for eval_model(), the temperature parameter is set to 0.

from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "liuhaotian/llava-v1.5-7b"
prompt = "What are the things I should be cautious about when I visit here?"
image_file = "https://llava-vl.github.io/static/images/view.jpg"

args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

eval_model(args)
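For context, a temperature of 0 is conventionally treated as greedy decoding: sampling is disabled and temperature/top_p are ignored. Here is a minimal sketch of how such arguments are often mapped to Hugging Face generate()-style kwargs (the helper name `decoding_kwargs` is illustrative, not LLaVA's actual code):

```python
def decoding_kwargs(temperature, top_p=None, num_beams=1, max_new_tokens=512):
    # temperature == 0 conventionally means greedy decoding: sampling is
    # disabled, and temperature/top_p are not passed through.
    do_sample = temperature > 0
    return {
        "do_sample": do_sample,
        "temperature": temperature if do_sample else None,
        "top_p": top_p if do_sample else None,
        "num_beams": num_beams,
        "max_new_tokens": max_new_tokens,
    }
```

With temperature=0 this yields do_sample=False, so repeated runs should in principle produce identical token sequences, which is what makes the remaining nondeterminism surprising.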

I think that if we set this parameter to 0, we should also explicitly enable deterministic execution:

torch.use_deterministic_algorithms(True)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

Hence, we should add the following conditional to the eval_model() function (note that torch.use_deterministic_algorithms is a function and must be called, not assigned to):


if args.temperature == 0:
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
else:
    torch.use_deterministic_algorithms(False)
    torch.backends.cudnn.deterministic = False
    torch.backends.cudnn.benchmark = True

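The three flags above can be wrapped in a small helper and sanity-checked on CPU. This is a minimal sketch, not LLaVA code; the helper names are hypothetical, and warn_only=True is assumed so that ops without deterministic implementations warn instead of raising:

```python
import torch

def set_deterministic(enabled: bool) -> None:
    # Hypothetical helper wrapping the three flags from the proposal above.
    # warn_only=True avoids hard errors for ops that have no
    # deterministic implementation.
    torch.use_deterministic_algorithms(enabled, warn_only=True)
    torch.backends.cudnn.deterministic = enabled
    torch.backends.cudnn.benchmark = not enabled

def run_once(seed: int) -> torch.Tensor:
    # Rebuild the same tiny model from the same seed and run one forward
    # pass; on CPU this should be bit-for-bit reproducible across calls.
    torch.manual_seed(seed)
    layer = torch.nn.Linear(4, 4)
    return layer(torch.randn(2, 4))

set_deterministic(True)
assert torch.equal(run_once(0), run_once(0))
```

Note that the cuDNN flags only matter on GPU, and full reproducibility on CUDA may additionally require setting the CUBLAS_WORKSPACE_CONFIG environment variable.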
OliverXUZY commented 4 months ago

Hi, thank you for bringing up this issue. I encountered a similar problem even after explicitly setting torch.backends.cudnn.deterministic and the related flags. I've noticed that the discrepancies occur specifically in the CLIP ViT encoder, where the vision embeddings produce different values across separate runs.

When comparing two identical inference processes, I observed that the image_forward_out varies despite using the same image input. This occurs in the following file: https://github.com/haotian-liu/LLaVA/blob/c121f0432da27facab705978f83c4ada465e46fd/llava/model/multimodal_encoder/clip_encoder.py#L50

The discrepancy appears from the second example onward.
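One way to narrow this down is to persist the tensor from the first run and diff against it on later runs. This is a hypothetical debugging sketch, not LLaVA code; `check_against_reference` and the reference file name are made up, and `image_forward_out` stands in for the tensor computed in clip_encoder.py:

```python
import torch

# On the first run, save the tensor as a reference; on later runs, load it
# back and report the largest absolute deviation, which helps localize
# exactly where two "identical" inference runs start to diverge.
REF_PATH = "image_forward_out_ref.pt"

def check_against_reference(image_forward_out: torch.Tensor) -> float:
    t = image_forward_out.detach().cpu()
    try:
        ref = torch.load(REF_PATH)
    except FileNotFoundError:
        torch.save(t, REF_PATH)  # first run: record the reference
        return 0.0
    return (t - ref).abs().max().item()
```

A nonzero maximum deviation at this point would confirm that the divergence originates inside the vision tower rather than in the language model's decoding.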

I'm curious to know if you're still experiencing this issue and if you've found a solution. Any insights would be greatly appreciated. Thank you for your time!