kubeagi / core-library

Core library for kubeagi, providing APIs and an SDK in Python
Apache License 2.0

Use tabula-py to extract tables from a PDF. #44

Closed ggservice007 closed 7 months ago

ggservice007 commented 7 months ago

what

Use tabula-py to extract tables from a PDF.

github

https://github.com/chezou/tabula-py

code

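def get_table_tabula_py():
    import tabula  # third-party package: tabula-py (needs a Java runtime)

    print("Extracting tables with tabula")
    # Sample PDF used in this test; the original Chinese file name is kept as-is.
    filename = "财务报销管理细则-V1.00-202201.pdf"
    # read_pdf returns a list of pandas DataFrames, one per detected table.
    dfs = tabula.read_pdf(filename, pages='all')
    for i, df in enumerate(dfs, start=1):
        print(f"Table {i}")
        print(df)
        print("\n")

    print('Processing finished')

if __name__ == '__main__':
    get_table_tabula_py()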

result

image

conclusion

The extraction results are not very satisfactory.
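
One possible way to look into this further is to compare tabula-py's two extraction modes, `lattice` (tables with ruling lines) and `stream` (tables separated by whitespace), on the same file. The sketch below is only an illustration and was not part of the original test: the helper name `summarize_modes` and its `read_pdf` parameter are hypothetical, added so the reader function can be swapped out. By default it falls back to `tabula.read_pdf`.

```python
from typing import Callable, Dict, List, Optional, Tuple


def summarize_modes(
    pdf_path: str, read_pdf: Optional[Callable] = None
) -> Dict[str, List[Tuple[int, int]]]:
    """Return the (rows, columns) shape of every table found in each mode.

    `summarize_modes` and the `read_pdf` parameter are hypothetical names
    used for this sketch; they are not part of tabula-py.
    """
    if read_pdf is None:
        # Imported lazily: tabula-py needs a Java runtime to be installed.
        import tabula

        read_pdf = tabula.read_pdf

    summary: Dict[str, List[Tuple[int, int]]] = {}
    for mode in ("lattice", "stream"):
        # Pass lattice=True or stream=True, one mode per call.
        tables = read_pdf(pdf_path, pages="all", **{mode: True})
        summary[mode] = [tuple(table.shape) for table in tables]
    return summary
```

Calling `summarize_modes` with the PDF file name above would return a dictionary with one list of table shapes per mode, which gives a rough first indication of which mode splits the document into more plausible tables.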

related issues

#42

bjwswang commented 7 months ago

@ggservice007 Please track this in the same issue by adding further comments there.