Closed youpengbo2018 closed 3 years ago
Hi @youpengbo2018, thanks for your interest in the library. You can use explicit_vertical_lines in combination with vertical_strategy=lines to explicitly specify the coordinates of the vertical line separators. For example, you can use the following table extraction settings:
{
"vertical_strategy": "lines",
"horizontal_strategy": "lines",
"snap_tolerance": 7,
"explicit_vertical_lines": [Decimal(p.width) * Decimal('0.07'), Decimal(p.width) * Decimal('0.93')],
}
In explicit_vertical_lines, I have placed the first and last vertical line separators at 7% and 93% of the page's width (p is the page object). If the page were 100 units wide, the X coordinates of the first and last vertical line separators would be 7 units and 93 units respectively.
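To make the arithmetic above concrete, here is a minimal sketch that builds the same settings from a page width. The helper names build_table_settings and extract_tables_with_margins, and the use_decimal switch, are hypothetical and not part of pdfplumber. The sketch assumes the 7% and 93% margins from this reply suit your page; other layouts will likely need different fractions. Depending on the pdfplumber version, coordinates may need to be plain floats rather than Decimal, which is what use_decimal=False is for.

```python
from decimal import Decimal


def build_table_settings(page_width, left_frac="0.07", right_frac="0.93",
                         use_decimal=True):
    """Return table settings with explicit left/right vertical separators.

    Hypothetical helper: the fractions are the 7% / 93% values from the
    reply above and are an assumption about the page layout.
    """
    width = Decimal(page_width)
    lines = [width * Decimal(left_frac), width * Decimal(right_frac)]
    if not use_decimal:
        # Some pdfplumber versions may expect floats instead of Decimal.
        lines = [float(x) for x in lines]
    return {
        "vertical_strategy": "lines",
        "horizontal_strategy": "lines",
        "snap_tolerance": 7,
        "explicit_vertical_lines": lines,
    }


def extract_tables_with_margins(pdf_path, page_index=0, use_decimal=True):
    """Open a PDF and extract tables from one page using the settings above."""
    import pdfplumber  # third-party; imported here so the helper above stays stdlib-only

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        settings = build_table_settings(page.width, use_decimal=use_decimal)
        return page.extract_tables(table_settings=settings)
```

For a page 100 units wide, build_table_settings(100) places the two separators at 7 and 93 units, matching the example in the reply.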
Add from decimal import Decimal at the top of your script to import Decimal. The output will be
thank you! it works
2020.06月度数据(区).pdf
This is the page that I want to extract, but the page doesn't have vertical edges. I use extract_table's vertical_strategy: text to let the system find the edges. It does extract the data I want, but it also ignores the blank cells in the other columns. I want to get the table as a CSV file that looks the same as the picture (the table needs to show the blanks). This is my code:

    import pdfplumber
    import pandas as pd

    if __name__ == '__main__':
        with pdfplumber.open(r'F:\work\南京\2020.06月度数据(区).pdf') as pdf:
            page = pdf.pages[8]
            # Lines-based strategies on both axes; keep_blank_chars must be a bool, not the string "False"
            settings = {"vertical_strategy": "lines", "horizontal_strategy": "lines", "keep_blank_chars": False}
            for table in page.extract_tables(table_settings=settings):
                # First row is used as the header, the rest as data
                tb = pd.DataFrame(table[1:], columns=table[0], index=None)
                print(tb)
                tb.to_csv(r'F:\work\南京\南京\test3.csv', index=False)
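
As a rough illustration of the "show the blanks" requirement, the sketch below writes a table given as a list of rows to CSV text, turning missing cells into empty fields so every column keeps its position. The function name rows_to_csv is hypothetical. It assumes the extracted table is a list of lists in which empty cells appear as None or "", and it uses only the standard library rather than pandas.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text, keeping blank cells as empty fields.

    Assumes each row is a list whose missing cells are None or "".
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Normalize None to "" so the blank column still gets its own field.
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()
```

Each row then has the same number of comma-separated fields, so a spreadsheet shows the blank cells in place instead of shifting later columns left.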