zhangjiekui opened this issue 3 months ago
We will feed the images from our evaluation dataset through our OCR model to extract the corresponding textual output. This extracted text will then be incorporated as supplementary data and fed into the qwen_vl model alongside the image, giving it directional cues.
A question: we have found that with this usage, the large model's OCR-style answers become constrained by the OCR text in the prompt. How can we get the large model to actually improve OCR accuracy — for example, so that when the OCR result is clearly wrong, the model can still correct it? To achieve this, should we take your dataset and do full-parameter fine-tuning based on open-vl? Have you tried anything along these lines?
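For reference, the pipeline described above can be sketched as a prompt-construction step. This is a minimal illustration only: `build_prompt` is a hypothetical helper, the `<img>` tag convention is an assumption about the Qwen-VL prompt format, and the instruction wording (telling the model to treat the OCR output as a fallible hint rather than ground truth) is one possible way to reduce the "answer locked to the OCR text" effect raised in the question.

```python
# Hypothetical sketch: inject OCR output as a directional cue for qwen_vl,
# while instructing the model to treat the hint as possibly erroneous.
# The <img> tag format and helper name are assumptions, not the official API.

def build_prompt(image_path: str, ocr_text: str) -> str:
    """Combine an image reference with an OCR hint into one query string."""
    return (
        f"Picture: <img>{image_path}</img>\n"
        f"OCR hint (may contain errors): {ocr_text}\n"
        "Read the text in the image. Use the OCR hint only as a reference, "
        "and correct it wherever the image clearly disagrees."
    )

prompt = build_prompt("eval/0001.png", "Recieved 42 itmes")
print(prompt)
```

Phrasing the hint as "may contain errors" (instead of presenting it as the answer) is one lightweight mitigation; whether full-parameter fine-tuning is needed beyond that is exactly the open question above.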