Results are not as good as in the example image

onefish51 commented 1 year ago

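Great work. The tagging quality is a big improvement over BLIP. Nice!

In the image on the ram_grounded_sam homepage, the RAM result includes the lamp and door tags, as you show and point out. When I run it myself, neither tag appears. What causes this?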

Coler1994 commented 1 year ago

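The demo uses a higher threshold to ensure precision, at the cost of some recall. The Grounded-SAM pipeline uses a somewhat lower threshold, because Grounding DINO acts as a fallback. We are currently fine-tuning the threshold for each class.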

onefish51 commented 1 year ago

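Do you mean model.threshold was lowered from 0.68 to 0.64? I just changed it, but it did not seem to have any effect. Or is it some other parameter? Thanks.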

cpperrpr commented 1 year ago

Hi, I get an error when I run the test command. Have you run into it?

python inference_tag2text.py --image 042.jpg --pretrained tag2text_swin_14m.pth

Error:

magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.

onefish51 commented 1 year ago

I have not run into that error, but I did hit a different one:

Traceback (most recent call last):
  File "inference_tag2text.py", line 94, in <module>
    res = inference(image, model, args.specified_tags)
  File "inference_tag2text.py", line 43, in inference
    caption, tag_predict = model.generate(image,
  File "/data2/home/tyu/stable_diffusion/promt_gen/Recognize_Anything-Tag2Text/models/tag2text.py", line 364, in generate
    torch.sigmoid(logits) > self.class_threshold,
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I hit this error because I was running on a GPU. It can be fixed by changing the corresponding line

https://github.com/xinyu1205/Recognize_Anything-Tag2Text/blob/ffd1a283caea70ab8436645c0fd0f366ae7de3f8/models/tag2text.py#L364

to

torch.sigmoid(logits) > self.class_threshold.to(image.device),

That is all it takes. It is a minor issue. @Coler1994 @xinyu1205
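
An alternative to calling .to(image.device) at the comparison is to register the threshold tensor as a buffer, so that model.to(device) moves it together with the weights. The sketch below only illustrates that idea. TinyTagger, its layer sizes, and the 0.68 default are hypothetical stand-ins, not the repository's model or its actual fix.

```python
import torch
import torch.nn as nn


class TinyTagger(nn.Module):
    """Hypothetical stand-in for a tagging head; not the repository's model."""

    def __init__(self, num_tags: int, threshold: float = 0.68):
        super().__init__()
        self.head = nn.Linear(8, num_tags)
        # A registered buffer follows the module in model.to(device).
        # A plain tensor attribute would stay on the device where it was created.
        self.register_buffer("class_threshold", torch.full((num_tags,), threshold))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        logits = self.head(features)
        # Both operands now live on the same device as the module.
        return torch.sigmoid(logits) > self.class_threshold


# Use the GPU when one is present, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyTagger(num_tags=3).to(device)
mask = model(torch.randn(2, 8, device=device))
print(mask.shape, model.class_threshold.device)
```

With a buffer, the comparison needs no explicit device handling. The one-line .to(image.device) change above is the smaller patch when the module cannot easily be restructured.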

majinyu666 commented 1 year ago

This should just be a threshold issue. On my side, lowering the threshold to 0.63 brings out lamp; door needs an even lower value.
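
To make the precision/recall trade-off concrete, here is a minimal sketch of per-tag thresholding on sigmoid scores. The tag names come from this thread, but the logits are invented so that lamp and door fall just below 0.68. They are not actual RAM outputs, and select_tags is a hypothetical helper, not a function from the repository.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_tags(logits, tags, thresholds):
    """Keep each tag whose sigmoid score exceeds its own threshold."""
    return [t for t, z, th in zip(tags, logits, thresholds) if sigmoid(z) > th]


# Made-up logits for illustration only; scores are roughly 0.88, 0.64, 0.61.
tags = ["sofa", "lamp", "door"]
logits = [2.0, 0.56, 0.45]

print(select_tags(logits, tags, [0.68] * 3))  # ['sofa']
print(select_tags(logits, tags, [0.63] * 3))  # ['sofa', 'lamp']
print(select_tags(logits, tags, [0.60] * 3))  # ['sofa', 'lamp', 'door']
```

Because the thresholds are passed as a list, each tag can be given its own value, which is the kind of per-class tuning mentioned earlier in the thread.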

cpperrpr commented 1 year ago

Thanks, I found the problem: the model file had not been cloned properly. Thanks for your reply.
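
One common way a checkpoint ends up "not cloned properly" is that the file on disk is a Git LFS pointer, a small text file beginning with "version https://git-lfs...", rather than the real weights. Unpickling such a file fails on its first character, which would match the invalid load key, 'v' message. Whether that was the cause here is an assumption, since the thread does not say. The helper name and the demo files below are hypothetical.

```python
import tempfile
from pathlib import Path

# Git LFS pointer files are plain text and start with this prefix.
LFS_PREFIX = b"version https://git-lfs"


def looks_like_lfs_pointer(path) -> bool:
    """Return True if the file starts like a Git LFS pointer text file."""
    with open(path, "rb") as f:
        return f.read(len(LFS_PREFIX)) == LFS_PREFIX


# Demo on two throwaway files; the contents are placeholders, not real data.
with tempfile.TemporaryDirectory() as tmp:
    pointer = Path(tmp) / "pointer.pth"
    pointer.write_bytes(
        b"version https://git-lfs.github.com/spec/v1\noid sha256:0000\nsize 123\n"
    )
    binary = Path(tmp) / "binary.pth"
    binary.write_bytes(b"PK\x03\x04 placeholder bytes")

    pointer_result = looks_like_lfs_pointer(pointer)
    binary_result = looks_like_lfs_pointer(binary)

print(pointer_result, binary_result)  # True False
```

If the check returns True for a downloaded checkpoint, fetching the file again (for example with Git LFS installed, or by direct download) is the likely remedy.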

xinyu1205 commented 1 year ago

Thank you for the very valuable bug report. I have updated the corresponding code.

xinyu1205 / recognize-anything

Results are not as good as in the example image #14