lxj0276 / tableDetect

Detect tables in PDFs or other image formats using OpenCV and Python.

Data extraction of table #3

Open architGitHub opened 5 years ago

architGitHub commented 5 years ago

I have detected the table using your code. Now I want to use the mask to extract the data from the table. Any idea how to do this?

lxj0276 commented 5 years ago


You can find the anchors of the table, then use them to locate the ROI of each text region. Once you have the text regions, you can process them however you need.
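The reply above is brief, so here is one possible reading of it as a minimal sketch, not code from tableDetect: treat the "anchors" as the positions of the table's ruling lines. It assumes you have a binary mask (nonzero = line pixel) of the horizontal and vertical lines, cropped to the table, and that the table is a full grid without merged cells. Projecting the mask onto each axis gives the line positions, and every cell is the box between two consecutive horizontal lines and two consecutive vertical lines. The function names and the `min_frac` and `pad` parameters are hypothetical, chosen for illustration.

```python
from typing import List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def line_positions(mask: np.ndarray, axis: int, min_frac: float = 0.5) -> List[int]:
    """Return the centre coordinate of each ruling line in a binary mask.

    axis=1 sums along each row, giving y positions of horizontal lines;
    axis=0 sums down each column, giving x positions of vertical lines.
    A row/column counts as a line if at least `min_frac` of it is set.
    """
    binary = mask > 0
    length = binary.shape[axis]
    hits = np.flatnonzero(binary.sum(axis=axis) >= min_frac * length)
    if hits.size == 0:
        return []
    # Thick lines cover several consecutive indices: group them, keep the centre.
    groups = np.split(hits, np.flatnonzero(np.diff(hits) > 1) + 1)
    return [int(g.mean()) for g in groups]


def cell_boxes(mask: np.ndarray, min_frac: float = 0.5) -> List[Box]:
    """Boxes between consecutive lines, in row-major order (assumes a full grid)."""
    ys = line_positions(mask, axis=1, min_frac=min_frac)
    xs = line_positions(mask, axis=0, min_frac=min_frac)
    return [(x0, y0, x1, y1)
            for y0, y1 in zip(ys, ys[1:])
            for x0, x1 in zip(xs, xs[1:])]


def crop_cells(image: np.ndarray, boxes: List[Box], pad: int = 2) -> List[np.ndarray]:
    """Cut each cell out of the original image, shrinking by `pad` to drop the borders."""
    return [image[y0 + pad:y1 - pad, x0 + pad:x1 - pad] for x0, y0, x1, y1 in boxes]
```

Each crop can then be handed to whatever text processing you need, such as OCR. If the table has merged cells, the full-grid assumption breaks and you would need per-cell contour detection instead.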

architGitHub commented 5 years ago

Thanks for your reply. I am new to CV and do not know much about it. Could you please guide me on how to do this (any link would also be helpful)?

lhlcc726 commented 4 years ago

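Hi, I'd like to join the WeChat group. Could you share your WeChat ID? Mine is 13760862542. I've read your code and it's very good. It looks like you used two approaches to table extraction, right? In the first class, why didn't you also run OCR with image_to_string while you were at it? I have a lot of questions. For example, how do the two classes compare in efficiency? How should areaRange be set in the first class? Should I first use some software to measure the approximate area of each cell? Actually, this project would be perfect with a "head" and a "tail" added: the "head" would convert multi-page PDFs to PNG, and the "tail" would OCR the Chinese text and digits and output the results to Excel. Processing efficiency should also be taken into account.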
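For the commenter's proposed "tail" (OCR the Chinese text and digits, then output to Excel), one possible sketch, which is an assumption and not part of tableDetect: OCR each cell crop in row-major order, group the texts into rows, and write them as UTF-8 CSV, which Excel can open. The OCR engine is passed in as a callable so the sketch does not depend on a particular library. `cells_to_rows`, `rows_to_csv`, and `save_for_excel` are hypothetical helper names.

```python
import csv
import io
from typing import Callable, List, Sequence

import numpy as np


def cells_to_rows(cells: Sequence[np.ndarray], n_cols: int,
                  ocr: Callable[[np.ndarray], str]) -> List[List[str]]:
    """OCR each cell crop (given in row-major order) and group the texts into rows."""
    texts = [ocr(cell).strip() for cell in cells]
    return [texts[i:i + n_cols] for i in range(0, len(texts), n_cols)]


def rows_to_csv(rows: Sequence[Sequence[str]]) -> str:
    """Serialize rows as CSV text (the csv module's default CRLF line endings)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def save_for_excel(rows: Sequence[Sequence[str]], path: str) -> None:
    # The "utf-8-sig" codec writes a BOM, which helps Excel recognise the file
    # as UTF-8 so Chinese text is not garbled when the CSV is opened directly.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(rows_to_csv(rows))
```

In practice the `ocr` argument might be `lambda cell: pytesseract.image_to_string(cell, lang="chi_sim")`, which requires Tesseract with the Simplified Chinese language data installed. For the "head", `pdf2image.convert_from_path(pdf_path, dpi=300)` returns one PIL image per PDF page. Writing a real `.xlsx` instead of CSV would need an extra library such as pandas with openpyxl.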