[Questions] In-Context-Learning for Batch Inference: how to run batch inference with in-context learning?

fisher75 commented 8 months ago

The questions in this example look zero-shot to me. Is there a suitable example for in-context learning? That is, I first give the model several images earlier in the conversation and tell it what each one means, and then ask it to reason about a different image. How can this be implemented?

hnyls2002 commented 8 months ago

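@fisher75 Which example are you referring to specifically? If you mean giving a vision model (such as LLaVA) images as few-shot examples, using them as a shared context prefix, and then appending another image to run inference on, these vision models most likely do not support that at the moment, because only a single image is currently supported as input.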

fisher75 commented 8 months ago

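Thanks for the answer. In that case, could I use a multi-turn conversation instead of image-based ICL? There would still be only one image, but the multi-turn conversation would guide the model through a CoT-style chain of thought. Is that feasible? If so, is there an example, and what format should it take?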

fisher75 commented 7 months ago

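I want to make use of the model's CoT ability. With ICL, the rough structure would be: image + question -> question -> question. In that case, how should I write my questions.jsonl?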

hnyls2002 commented 7 months ago

@fisher75 You can refer to this tree_of_thought benchmark:

https://github.com/sgl-project/sglang/blob/cb389c91bcff6ffac4a95a0551a05d67e21ba306/benchmark/tree_of_thought_deep/bench_sglang.py#L41-L70

For the image API, just use:

https://github.com/sgl-project/sglang/blob/cb389c91bcff6ffac4a95a0551a05d67e21ba306/examples/quick_start/srt_example_llava.py#L7-L10

fisher75 commented 7 months ago

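Hello, thanks for the reply. I took a look, though, and I don't think this is what I want. SGLang does support multi-turn conversation history. The first piece of code is mainly about parallel deep reasoning, and its test set consists entirely of math problems. The second supports multi-turn conversation about an image. You may have thought I wanted deep reasoning over multiple images, but what I want is a model that can either (1) handle a single image across a multi-turn conversation, keeping that image in view the whole time while I guide it step by step to the correct answer, as a form of CoT; or (2) take multiple images as examples, which amounts to a form of ICL, so that CoT is achieved through prompting. The key point is being able to run this CoT as batch inference. Perhaps LLaVA has not developed this capability yet. Do LLaVA or SGLang currently have these two capabilities? Roughly how would I implement them? Thank you very much!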

hnyls2002 commented 7 months ago

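@fisher75

To keep a single image in view throughout a multi-turn conversation, put the image at the very beginning and then use fork and +=. We automatically share the prefix for that image.
Multiple images are not currently supported.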
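
A minimal sketch of the single-image, multi-turn pattern described above, run as a batch. It follows the style of the linked srt_example_llava.py (`sgl.image`, `sgl.user`, `sgl.assistant`, `sgl.gen`, `run_batch`). The questions.jsonl layout (`image` and `turns` keys), the helper names `load_batch_args` and `run_batch_cot`, the `answer_i` variable names, and the endpoint URL are assumptions made for illustration, not a format that SGLang defines.

```python
import json


def load_batch_args(path):
    """Read a JSONL file with one record per line.

    Assumed (hypothetical) layout per line:
        {"image": "path/to/img.png", "turns": ["first question", "follow-up", ...]}
    """
    batch = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            batch.append({"image_path": record["image"], "turns": record["turns"]})
    return batch


def run_batch_cot(path, endpoint="http://localhost:30000"):
    """Run every record as one multi-turn conversation over a single image."""
    # Imported lazily so the loader above can be used without sglang installed.
    import sglang as sgl

    @sgl.function
    def image_cot(s, image_path, turns):
        # The image appears once, in the first user turn; later turns only
        # append text with +=, so the image stays in the conversation prefix.
        s += sgl.user(sgl.image(image_path) + turns[0])
        s += sgl.assistant(sgl.gen("answer_0", max_tokens=256))
        for i, question in enumerate(turns[1:], start=1):
            s += sgl.user(question)
            s += sgl.assistant(sgl.gen(f"answer_{i}", max_tokens=256))

    # Assumes a vision-model server is already running at `endpoint`.
    sgl.set_default_backend(sgl.RuntimeEndpoint(endpoint))
    batch = load_batch_args(path)
    states = image_cot.run_batch(batch, progress_bar=True)
    return [
        [state[f"answer_{i}"] for i in range(len(item["turns"]))]
        for state, item in zip(states, batch)
    ]
```

If the follow-up questions are independent rather than sequential, `s.fork(n)` after the first turn gives n branches that each continue with their own `+=`.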

fisher75 commented 7 months ago

Oh, never mind. This feature was added today. If you need it, I can send you a pull request.

github-actions[bot] commented 3 months ago

This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.

zxy110 commented 3 months ago

Hello! How did you implement multi-image input? Did you build an ICL prompt from multiple images and text and then feed it to the MLLM?

sgl-project / sglang

[Questions] In-Context-Learning for Batch Inference: how to run batch inference with in-context learning? #320