Closed. Godlikemandyy closed this issue 2 years ago.
Hi @Godlikemandyy, and thanks for your interest in this library. Without having access to the original PDF, or the code you used, it is difficult to answer your question. But I would suggest the following:
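1. Check the `stroking_color` and `non_stroking_color` of the objects in `page.chars` to see whether the duplicated characters have a different color from the originals.
2. If they do, filter out the characters with that color before extracting the text:

```python
# `page` is a pdfplumber Page, e.g. pdf.pages[0]
print(page.filter(lambda obj: not (
    obj["object_type"] == "char"
    and obj["non_stroking_color"] == "..."  # Replace "..." with the value determined in the previous step
)).extract_text())
```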
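To find the value that replaces `"..."` in the filter, one option is to tally the colors used by the characters on the page. The sketch below is a hypothetical helper (`summarize_char_colors` is not part of pdfplumber); it assumes each char is a dict with pdfplumber's `text`, `non_stroking_color`, and `stroking_color` keys, so it can be tried on plain dicts as well as on `page.chars`.

```python
from collections import defaultdict


def summarize_char_colors(chars, sample_len=20):
    """Hypothetical helper: group char dicts by (non_stroking_color, stroking_color)
    and return, for each color pair, the number of chars and a short text sample."""
    groups = defaultdict(list)
    for ch in chars:
        # Colors may be tuples, lists, or None depending on the PDF and library
        # version; lists are converted to tuples so they can serve as dict keys.
        key = tuple(
            tuple(c) if isinstance(c, list) else c
            for c in (ch.get("non_stroking_color"), ch.get("stroking_color"))
        )
        groups[key].append(ch["text"])
    # Map each color pair to (count, first `sample_len` characters of its text).
    return {
        key: (len(texts), "".join(texts[:sample_len]))
        for key, texts in groups.items()
    }
```

For example, `for colors, (count, sample) in summarize_char_colors(page.chars).items(): print(colors, count, sample)` lists every color pair on the page; a pair whose sample repeats the blue passage is a candidate for the filter. Because the helper converts list-valued colors to tuples for grouping, write the filter value in the same type that `page.chars` actually reports, since a list never compares equal to a tuple.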
Because this is a specific-PDF troubleshooting question, rather than a bug or feature request, I'm closing this issue. Feel free to continue the discussion here, or through a new troubleshooting Discussion: https://github.com/jsvine/pdfplumber/discussions/categories/get-help-with-specific-pdfs
When I extract text from the PDF, the blue text comes out with every character doubled. For example, "任何单位和个人" ("any organization or individual") becomes "任任何何单单位为和和个个人人". What causes this, and how can I fix it?
Thank you