opendatalab / PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction
GNU Affero General Public License v3.0

Running the table recognition demo raises the following errors #69

Open kevinwei1975 opened 1 month ago

kevinwei1975 commented 1 month ago

https://github.com/opendatalab/PDF-Extract-Kit/blob/main/demo/TableRec/StructEqTable/Table_Recognition.ipynb

Running the code in the notebook above fails. I hit three problems in a row.

First problem: these two lines raise an error.

from modelscope import snapshot_download
model_dir = snapshot_download('wanderkid/PDF-Extract-Kit')

The error says the file config.json is missing, even though the model has already been downloaded locally. I have not solved this, but since the model was already on disk, I put the local path directly into this line:

model = build_model("C:/Users/kevin/.cache/modelscope/hub/wanderkid/PDF-Extract-Kit/models/TabRec/StructEqTable", max_new_tokens=4096, max_time=60)

With that, the model instance is created directly.

However, when I run this code my antivirus reports a virus, with the following message:

Trojan: HEUR/QVM202.0.0DE9.Malware.Gen
C:\Users\kevin\AppData\Local\Pandoc\pandoc.exe

If I ignore the virus warning, I get the following error:

RuntimeError: Pandoc died with exitcode "65" during conversion: Error at "source" (line 1, column 90): unexpected end of input expecting \end{tabular} \begin{tabular}{|l|l|c|c|c|} \hline Label & Co-occurrence & Bonferroni & FDR & Color \

The line that raises it is:

markdown_code = convert_text(latex_code, 'md', format='latex')

wangbinDL commented 1 month ago

@sky-fly97 Please take a look at this issue.

sky-fly97 commented 1 month ago

It's possible that inference ran past the 60-second max_time you set. Generation would then stop early and leave the LaTeX code incomplete, which is why Pandoc cannot find \end{tabular}. Try increasing max_time.
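
One way to confirm this before calling Pandoc is to check whether the generated LaTeX actually closes its table. The snippet below is only a minimal sketch: has_balanced_tabular is a hypothetical helper, not part of PDF-Extract-Kit or pypandoc, and the sample strings are made up for illustration.

```python
import re


def has_balanced_tabular(latex_code: str) -> bool:
    """Return True if the text contains at least one tabular environment
    and every \\begin{tabular} has a matching \\end{tabular}."""
    begins = len(re.findall(r"\\begin\{tabular\}", latex_code))
    ends = len(re.findall(r"\\end\{tabular\}", latex_code))
    return begins > 0 and begins == ends


# Illustrative inputs (assumed, not taken from the model's real output).
truncated = r"\begin{tabular}{|l|c|} \hline A & B"
complete = truncated + r" \\ \hline \end{tabular}"

for name, code in [("truncated", truncated), ("complete", complete)]:
    if has_balanced_tabular(code):
        print(f"{name}: looks complete, safe to hand to the converter")
    else:
        print(f"{name}: looks cut off, consider a larger max_time")
```

If the check fails on your real latex_code, that would point to truncated generation rather than a Pandoc problem, so raising max_time (and possibly max_new_tokens) in build_model is the first thing to try.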