Closed yts2020 closed 2 years ago
Hi @yts2020, I appreciate your interest in the library. Could you please update the issue with the details mentioned in https://github.com/jsvine/pdfplumber/blob/develop/.github/ISSUE_TEMPLATE/bug-report.md?
Hi @samkit-jain
My code works on macOS, but on CentOS 7 the result is None.
This is my PDF file: ali001.pdf
pdfplumber==0.5.28
My code:
import pdfplumber
import re

path = '/tmp/ali001.pdf'
pdf = pdfplumber.open(path)
for page in pdf.pages[:2]:
    print(page.extract_text())
    for pdf_table in page.extract_tables():
        print(pdf_table)
        table = []
        cells = []  # partially filled row being merged across several raw rows
        for row in pdf_table:
            if not any(row):
                # Empty row: flush any pending merged row.
                if any(cells):
                    table.append(cells)
                    cells = []
            elif all(row):
                # Complete row: flush any pending merged row, then keep this one.
                if any(cells):
                    table.append(cells)
                    cells = []
                table.append(row)
            else:
                # Partial row: merge its non-empty cells into the pending row.
                if len(cells) == 0:
                    cells = row
                else:
                    for i in range(len(row)):
                        if row[i] is not None:
                            cells[i] = row[i] if cells[i] is None else cells[i] + row[i]
        for row in table:
            # Strip all whitespace from each cell before printing.
            print([re.sub(r'\s+', '', cell) if cell is not None else None for cell in row])
        print('---------- ----------')
pdf.close()
Run result:
Is the cause the PDF version or the font?
How can I solve this?
@feikongl I am also unable to extract the text on my Ubuntu 18.04 machine. The PDF uses the font STSONG, and I think this is a duplicate of https://github.com/jsvine/pdfplumber/issues/332. Since it works for you on macOS, it could be that macOS has the font installed and can map the characters correctly, while CentOS 7 does not have the font and cannot map them. I haven't tried it myself, but I found the following and it might be of help to you
If you find any other solution that helped you install the font and resolve the issue, please share here.
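To compare the two machines, it may help to list which fonts the PDF actually references. The following is a minimal sketch, not a confirmed diagnosis: it assumes the entries in pdfplumber's `page.chars` carry a `fontname` key, and the helper names `count_fonts` and `fonts_in_pdf` are hypothetical, introduced only for illustration.

```python
from collections import Counter


def count_fonts(chars):
    """Count characters per font name.

    `chars` is any iterable of dicts with a "fontname" key, which is the
    shape of the entries in pdfplumber's `page.chars`.
    """
    return Counter(c.get("fontname", "<unknown>") for c in chars)


def fonts_in_pdf(path, max_pages=2):
    """Return a Counter of font names found on the first `max_pages` pages."""
    import pdfplumber  # imported here so count_fonts can be used without it

    totals = Counter()
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages[:max_pages]:
            totals.update(count_fonts(page.chars))
    return totals
```

Running `print(fonts_in_pdf('/tmp/ali001.pdf'))` on both macOS and CentOS 7 and comparing the output would show whether the same fonts and character counts are seen on each system.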
@samkit-jain Thank you. I will try it.
When I run page.extract_table(), the result is [[''], [''], [''], ['']]. Why is that?
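One possible reading of this result, offered as an assumption rather than something confirmed in this thread, is that the table layout was detected (four rows of one cell each) but no text was mapped into the cells. A small check like the one below can tell that case apart from a table that holds real text. The helper name `table_has_text` is hypothetical.

```python
def table_has_text(table):
    """Return True if any cell in the table holds non-whitespace text.

    `table` is a list of rows, each a list of str or None, which is the
    shape returned by page.extract_table(). None or an empty list means
    no table was found.
    """
    if not table:
        return False
    return any(bool(cell and cell.strip()) for row in table for cell in row)
```

If `table_has_text(page.extract_table())` is False while `page.extract_text()` also returns nothing, the problem is more likely in text extraction (for example, the font mapping discussed above) than in table detection.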