kubeagi / core-library

Core library for kubeagi to provide apis&sdk in python
Apache License 2.0

Use pdfplumber to extract tables from a PDF. #43

Open ggservice007 opened 7 months ago

ggservice007 commented 7 months ago

what

Use pdfplumber to extract tables from a PDF.

result

[image: img_v3_028t_c2cd8ef4-e33d-44da-ad32-c98a4a22346g]

conclusion

The extraction quality is not very satisfactory.

related issues

#42

bjwswang commented 7 months ago

@ggservice007 please create a PR that implements image_from_pdf

ggservice007 commented 7 months ago

@bjwswang

verify code

import pdfplumber


def get_table():
    # Open the PDF file
    with pdfplumber.open("财务报销管理细则-V1.00-202201.pdf") as pdf:
        # Iterate over every page
        for page in pdf.pages:
            # Extract tables; this returns a list of tables, each a list of rows
            tables = page.extract_tables()

            # Iterate over every table
            for table in tables:
                print('=' * 80)
                # Iterate over every row (a list of cell values)
                for row in table:
                    print(row)


if __name__ == '__main__':
    get_table()
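
The rows printed by the script above are plain Python lists, and pdfplumber reports empty cells as None and keeps line breaks inside cell text. The sketch below is one possible way to tidy such rows before judging the result; it is not part of core-library, and the helper names clean_table and to_markdown are hypothetical. It assumes the first non-empty row is the header and does not escape "|" characters inside cells.

```python
from typing import List, Optional

# One row as returned inside page.extract_tables(): cell text or None.
Row = List[Optional[str]]


def clean_table(table: List[Row]) -> List[List[str]]:
    """Normalize one extracted table (hypothetical helper)."""
    cleaned = []
    for row in table:
        # Replace None with "" and collapse line breaks and repeated spaces.
        cells = [" ".join((cell or "").split()) for cell in row]
        # Drop rows in which every cell is empty.
        if any(cells):
            cleaned.append(cells)
    return cleaned


def to_markdown(table: List[Row]) -> str:
    """Render a cleaned table as a Markdown table (hypothetical helper)."""
    rows = clean_table(table)
    if not rows:
        return ""
    # Pad short rows so every row has the same number of columns.
    width = max(len(r) for r in rows)
    rows = [r + [""] * (width - len(r)) for r in rows]
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)
```

In get_table(), print(row) could then be replaced by print(to_markdown(table)) for each table, which makes merged or misaligned cells easier to spot.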

I will try other libraries now.