Closed playgithub closed 4 years ago
Hi @playgithub, for page 277 I'd recommend using explicit vertical lines. You may optionally crop the page first, like
page = page.crop((0, 0.26*float(page.height), page.width, 0.75*float(page.height))) # Remove the top 26% and bottom 25%.
to get better results.
The table settings using explicit lines would look like
{
    "vertical_strategy": "explicit",
    "horizontal_strategy": "text",
    # x-coordinates as fractions of the page width; `page` is the same object as in the crop line above
    "explicit_vertical_lines": [Decimal(page.width) * Decimal('0.12'), Decimal(page.width) * Decimal('0.55'), Decimal(page.width) * Decimal('0.75'), Decimal(page.width) * Decimal('0.95')],
    "intersection_x_tolerance": 20,
}
explicit_vertical_lines is a list of x-coordinates for vertical lines at the 12%, 55%, 75% and 95% marks of the page width.
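Putting the crop and the settings together, a minimal sketch could look like the following. The helper names (crop_box, table_settings_for, extract_table_from_page) and the PDF path are hypothetical, chosen only for illustration. It also uses plain floats rather than Decimal, on the assumption that you are on a recent pdfplumber release where page dimensions are floats.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def crop_box(width: float, height: float,
             top_frac: float = 0.26,
             bottom_frac: float = 0.75) -> Tuple[float, float, float, float]:
    """Return (x0, top, x1, bottom) keeping the band between the two height fractions."""
    return (0.0, top_frac * float(height), float(width), bottom_frac * float(height))


def table_settings_for(width: float,
                       fractions: Sequence[float] = (0.12, 0.55, 0.75, 0.95)) -> Dict[str, Any]:
    """Build table settings with explicit vertical lines at fractions of the page width."""
    return {
        "vertical_strategy": "explicit",
        "horizontal_strategy": "text",
        "explicit_vertical_lines": [float(width) * f for f in fractions],
        "intersection_x_tolerance": 20,
    }


def extract_table_from_page(pdf_path: str,
                            page_number: int = 277) -> Optional[List[List[Optional[str]]]]:
    """Crop the page and extract one table (page_number is 1-based)."""
    import pdfplumber  # imported here so the helpers above work without it installed

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number - 1]  # pdf.pages is 0-indexed
        cropped = page.crop(crop_box(page.width, page.height))
        return cropped.extract_table(table_settings_for(page.width))
```

Calling extract_table_from_page("report.pdf") (path is a placeholder) would return the rows as lists of cell strings, or None if no table is found. The fractions are tuned by eye for this one layout, so other pages would likely need their own values.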
Closing this issue as well. Feel free to reopen if you have any further queries.
I'm sure it'll work for this specific page, but there are many tables to extract, so manually supplied helper coordinates can't help.
pdf
http://www.cninfo.com.cn/new/disclosure/detail?plate=sse&orgId=9900005970&stockCode=601668&announcementId=1207611180&announcementTime=2020-04-25 (click the button on the top right to download the pdf, which has a download icon)
page
277
code
result