zhangjiekui opened this issue 3 months ago
We will feed the images from our evaluation dataset through our OCR model to extract the corresponding textual output. This extracted text will then be incorporated as supplementary data and fed into the qwen_vl model alongside the image, giving it directional cues.
A question: we have found that with this usage, the large model's OCR-style answers become constrained by the OCR text in the prompt. How can we get the large model to actually improve OCR accuracy — for example, so that when the OCR result is clearly wrong, the model can still correct it? To achieve this, should we take your dataset and do full-parameter fine-tuning based on open-vl? Have you tried anything along these lines?
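For reference, the pipeline described above can be sketched as a prompt-construction step. This is a minimal illustration only: `build_prompt` is a hypothetical helper, the `<img>` tag convention is an assumption about the Qwen-VL prompt format, and the instruction wording (telling the model to treat the OCR output as a fallible hint rather than ground truth) is one possible way to reduce the "answer locked to the OCR text" effect raised in the question.

```python
# Hypothetical sketch: inject OCR output as a directional cue for qwen_vl,
# while instructing the model to treat the hint as possibly erroneous.
# The <img> tag format and helper name are assumptions, not the official API.

def build_prompt(image_path: str, ocr_text: str) -> str:
    """Combine an image reference with an OCR hint into one query string."""
    return (
        f"Picture: <img>{image_path}</img>\n"
        f"OCR hint (may contain errors): {ocr_text}\n"
        "Read the text in the image. Use the OCR hint only as a reference, "
        "and correct it wherever the image clearly disagrees."
    )

prompt = build_prompt("eval/0001.png", "Recieved 42 itmes")
print(prompt)
```

Phrasing the hint as "may contain errors" (instead of presenting it as the answer) is one lightweight mitigation; whether full-parameter fine-tuning is needed beyond that is exactly the open question above.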