Can the image_start and image_end parameters be determined from AutoProcessor or the input messages?

shikiw / OPERA

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

MIT License

290 stars 26 forks

Can the image_start and image_end parameters be determined from AutoProcessor or the input messages? #47

Open Tian-ye1214 opened 1 week ago

Tian-ye1214 commented 1 week ago

Hello, and thank you for your work!

Is it possible to determine parameters such as image_start from the transformers library's AutoProcessor? For example, my input is:

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": ""},
                {"type": "text", "text": "Please describe this image in detail."},
            ],
        }
    ]

The AutoProcessor output is:

    ['<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>Please describe this image in detail.<|im_end|>\n<|im_start|>assistant\n']

Does this mean image_start=11 and image_end=13? If not, how should I determine image_start and the other parameters using AutoModel?

Thank you for your reply!

Tian-ye1214 commented 1 week ago

Also, with image_start=11 and image_end=13, I get the error "ValueError: max_length needs to be a stopping_criteria for now." :(

shikiw commented 1 week ago

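Hello, and thank you for your appreciation of our work!

image_start and image_end do not refer to the positions of the special tokens. In your example, image_start and image_end are the positions of the first <|image_pad|> and the last <|image_pad|>, respectively. See https://github.com/shikiw/OPERA/issues/2 ("Questions about the IM_START and IM_END tokens").
Could you tell me where the error is raised? Thanks.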
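The rule described above (first and last <|image_pad|> position) can be sketched in a few lines. This is only an illustration, not code from the OPERA repository: the helper name `find_image_span` and the toy token ids are made up, and it assumes a single image whose placeholder has already been expanded to one <|image_pad|> token per image patch. With a real tokenizer, the placeholder id could presumably be looked up with `tokenizer.convert_tokens_to_ids("<|image_pad|>")`.

```python
from typing import List, Tuple


def find_image_span(input_ids: List[int], image_pad_id: int) -> Tuple[int, int]:
    """Return the 0-based positions of the first and last image placeholder.

    Hypothetical helper: assumes exactly one image in the prompt, so all
    placeholder tokens belong to the same image.
    """
    positions = [i for i, tok in enumerate(input_ids) if tok == image_pad_id]
    if not positions:
        raise ValueError("no image placeholder token found in input_ids")
    return positions[0], positions[-1]


# Toy ids, all hypothetical: 7 = <|vision_start|>, 9 = <|image_pad|>,
# 8 = <|vision_end|>; the other values stand in for ordinary text tokens.
IMAGE_PAD = 9
toy_ids = [1, 2, 3, 7, 9, 9, 9, 9, 8, 4, 5]

image_start, image_end = find_image_span(toy_ids, IMAGE_PAD)
print(image_start, image_end)
```

Note that the span excludes <|vision_start|> and <|vision_end|>, which matches the point that the special tokens themselves are not the boundaries.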

Tian-ye1214 commented 1 week ago

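Thanks for your reply! I understand the first point now. One more small question: is the response_start parameter the total length of the AutoProcessor output? As for the second question, the error is raised at line 1628 of utils.py in the transformers library: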

        if stopping_criteria.max_length is None:
            raise ValueError("`max_length` needs to be a stopping_criteria for now.")

shikiw commented 2 days ago

Hello,

response_start is the position of the first token of the model's response.
Please check whether max_length is set in the arguments to generate.
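Both points can be illustrated with a small sketch. It rests on assumptions that are not confirmed in this thread: positions are 0-based, and the prompt ids already contain the expanded image placeholders, so the first generated token sits at index `len(prompt_ids)`. The helper names and toy ids are hypothetical; `max_length` itself is a standard `generate` argument that counts prompt tokens plus generated tokens.

```python
from typing import List


def response_start_index(prompt_ids: List[int]) -> int:
    """0-based position of the first generated token.

    Assumption: the model's answer begins right after the last prompt token.
    """
    return len(prompt_ids)


def total_max_length(prompt_ids: List[int], max_new_tokens: int) -> int:
    """A max_length value covering the prompt plus the tokens to generate."""
    return len(prompt_ids) + max_new_tokens


# Toy prompt of 11 token ids (values are placeholders, not real vocabulary ids).
toy_prompt = [1, 2, 3, 7, 9, 9, 9, 9, 8, 4, 5]

response_start = response_start_index(toy_prompt)
generate_kwargs = {"max_length": total_max_length(toy_prompt, max_new_tokens=512)}
print(response_start, generate_kwargs)
```

Passing an explicit `max_length` like this is one way to make sure the value checked in the quoted `stopping_criteria.max_length is None` branch is set; whether it resolves the error in a given transformers version is untested here.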