Closed CaptainPalapa closed 3 months ago
I read that the file is automatically closed when the processor gets out of scope, but... not the case?
The file is automatically closed when the object is garbage collected/finalized. Python, unlike Rust, does not have deterministic memory management. There can be an arbitrary delay from reaching refcount 0 to being collected/finalized.
So, explicitly closing the PdfDocument might fix the issue (try: ... finally: pdf.close()). Also make sure you don't have any other dangling handles to the file besides the PdfDocument.
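For illustration, the try/finally suggestion could look roughly like the sketch below. The original code from the issue is not shown here, so the function name, the folder arguments and the open_document parameter are all hypothetical; open_document exists only so the sketch can be exercised without a real PDF. It assumes a pypdfium2 version where PdfDocument supports len() and indexing and where text pages offer get_text_range().

```python
import shutil
from pathlib import Path


def extract_text_and_move(pdf_path, processed_dir, open_document=None):
    """Extract all text from a PDF, close it explicitly, then move the file."""
    if open_document is None:
        # Imported lazily so pypdfium2 is only needed when a real PDF is opened.
        import pypdfium2 as pdfium
        open_document = pdfium.PdfDocument

    pdf_path = Path(pdf_path)
    pdf = open_document(pdf_path)
    try:
        parts = []
        for index in range(len(pdf)):
            page = pdf[index]
            textpage = page.get_textpage()
            parts.append(textpage.get_text_range())
            # Close child objects before their parent document.
            textpage.close()
            page.close()
        text = "\n".join(parts)
    finally:
        # Release the file handle now instead of waiting for finalization.
        pdf.close()

    # The document is closed at this point, so the move should no longer
    # fail because of a handle held by PdfDocument.
    target = Path(processed_dir) / pdf_path.name
    shutil.move(str(pdf_path), str(target))
    return text, target
```

Calling extract_text_and_move("incoming/report.pdf", "processed") (example paths, not from the issue) would return the extracted text and the new location. The move only happens after the finally block has run, which is the point of the suggestion above.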
For the "confirm not a build issue", I can't really confirm that. I'm really new to python, maybe I don't understand. Package is from: pip install pypdfium2
This simply means I would have intended you to use the PyPA issue template, not the generic one. Virtually everyone seems to do this wrong, so I suppose it is just a bit too confusing. The PyPA template merely has a few diagnostic commands to identify the pypdfium2, python and OS versions used. Anyway, I think that was not relevant here.
Thank you @mara004! This solved my problem exactly. On my five-year-old dev machine, I can grab my first available PDF, extract the text to a new .txt file, and move the PDF to a /processed folder in as little as 12 ms. Woot. Thanks!
I should also let you know that I tried four other PDF libs before I came across yours, but none of those would extract the text in the correct order, for some reason. Thanks for the great work on getting it correct! 😄
Reason for Generic issue (keyword/topic)
File does not get closed
Description
For the "confirm not a build issue", I can't really confirm that. I'm really new to python, maybe I don't understand. Package is from:
pip install pypdfium2
Here is the code:
After this function completes, I attempt to move the file from a /incoming to a /processed folder, but I get:
I read that the file is automatically closed when the processor gets out of scope, but... not the case?