metaist closed 1 year ago
Using ChatGPT 4.0 / Code Interpreter
ChatGPT: The error message indicates that there is an issue with the file object being used. The OSError: [Errno 29] Illegal seek typically occurs when you try to perform a seek operation on a file object that doesn't support seeking, such as a pipe, a socket, or a network stream. (A regular local file opened in write mode ("w") is normally seekable.)
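As a rough illustration of where this error comes from, here is a minimal sketch that triggers it on a pipe. It assumes a POSIX system (Linux or macOS), where errno 29 is ESPIPE; the helper name seek_error_on_pipe is made up for this example and is not part of pdfmerge or pypdf.

```python
import errno
import os


def seek_error_on_pipe():
    """Return the errno raised when seeking on a pipe (POSIX assumed)."""
    read_fd, write_fd = os.pipe()
    try:
        # Pipes have no file position, so the OS rejects the seek.
        os.lseek(write_fd, 0, os.SEEK_SET)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return None


# Compare against the symbolic constant rather than the literal 29.
print(seek_error_on_pipe() == errno.ESPIPE)
```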
The pdfmerge function opens the output file in binary write mode (with Path(output).open("wb") as stream:) and then passes that stream to pypdf, which later tries to seek in it.
One possible solution to this problem is to make sure that the file object supports seeking. This could mean opening the file in a mode that supports seeking, like "rb+" for reading and writing in binary mode, or using a different kind of file object that supports seeking.
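One way to use "a different kind of file object" is an in-memory buffer. The following is only a sketch: write_via_buffer and its write_func parameter are hypothetical names, standing in for whatever call needs a seekable stream. It assumes the whole output fits comfortably in memory.

```python
import io
import shutil
from pathlib import Path


def write_via_buffer(write_func, output):
    """Let write_func write (and seek) in memory, then copy the result out."""
    buffer = io.BytesIO()  # in-memory streams are always seekable
    write_func(buffer)     # hypothetical callback that needs seek()
    buffer.seek(0)         # rewind before copying
    with Path(output).open("wb") as stream:
        # Only plain sequential writes reach the destination.
        shutil.copyfileobj(buffer, stream)
```

The destination is then written front to back in a single pass, so it never has to support seeking.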
However, this is more likely a bug in the pdfmerge or pypdf library. If you can't find a workaround, it may be worth reporting this issue to the maintainers of the library.
Also, as Python 3.11 is relatively new at the time of your issue, it could be that the library has not yet been fully updated to support this version of Python. You might want to try using an older version of Python to see if the issue persists.
Lastly, it's worth noting that the Python standard library has a shutil module which includes a copyfileobj function that can be used to copy the content of one file object to another. It cannot merge PDF files by itself, since concatenating raw bytes does not produce a valid PDF, but depending on your specific needs it can help copy a finished result from a seekable file to its final destination.
metaist: I'm the author of pdfmerge and I'm trying to fix this error. The comment about a network stream seems interesting, because this seems to happen when merging PDFs that are mounted using rclone. First, how can I detect if a stream is a network stream? Second, what are workarounds for dealing with a network stream? Should I just make a temp file and then move it when I'm done?
ChatGPT: Detecting whether a file object is a network stream can be challenging because Python's file objects don't have a built-in way to check this. In general, the best way to check whether a file is a network stream might be to attempt a seek() operation and catch the OSError if it is raised. Here's an example of how you could do this:
def is_seekable(fileobj):
    try:
        fileobj.seek(fileobj.tell())
        return True
    except OSError:
        return False
In this function, we're checking whether we can seek to the current position in the file (which should always be a no-op if seeking is supported). If this raises an OSError, we know that seeking is not supported, and we return False.
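A slightly more defensive variant of this check might also consult the seekable() method that Python's io-style file objects provide. This is a sketch, not a guarantee: it assumes an io-style object, and it is an untested assumption that a file on an rclone mount reports its seekability accurately, so a later seek could still fail there.

```python
import io
import os


def is_seekable(fileobj):
    """Best-effort check that fileobj supports seek()."""
    try:
        if not fileobj.seekable():  # cheap check offered by io objects
            return False
        fileobj.seek(fileobj.tell())  # no-op seek to confirm it works
        return True
    except (AttributeError, OSError, ValueError):
        # No seekable() method, seek rejected, or file already closed.
        return False


print(is_seekable(io.BytesIO()))  # in-memory buffer

read_fd, write_fd = os.pipe()
with os.fdopen(read_fd, "rb"), os.fdopen(write_fd, "wb") as pipe_out:
    print(is_seekable(pipe_out))  # write end of a pipe
```

On a POSIX system the first call should report True and the second False.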
For dealing with network streams, writing to a temporary file is a common workaround. In fact, Python has a built-in library for creating temporary files and directories that you could use: tempfile. Here's an example of how you could do this:
import shutil
import tempfile

def write_to_tempfile(fileobj):
    # delete=False keeps the file on disk after the with block closes it
    with tempfile.NamedTemporaryFile(delete=False) as temp:
        shutil.copyfileobj(fileobj, temp)
        return temp.name
In this function, we're creating a temporary file, copying the contents of fileobj into the temporary file, and then returning the name of the temporary file.
Once you have the temporary file, you can perform whatever operations you need on it (like seeking), and then move it to its final location when you're done. Just be sure to delete the temporary file when you're done with it to avoid wasting disk space. Python's os.remove() function can be used to delete files.
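For the output side, the "write to a temp file, then move it" idea could look roughly like the sketch below. The name seekable_output is hypothetical and not part of pdfmerge. The sketch assumes the destination accepts plain sequential writes even when it rejects seeks.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def seekable_output(output):
    """Yield a seekable local temp file; copy it to output on success."""
    temp = tempfile.NamedTemporaryFile(delete=False)
    try:
        with temp:
            yield temp  # caller may write and seek freely
        # Reached only if the caller's block raised nothing.
        with open(temp.name, "rb") as src, open(output, "wb") as dst:
            shutil.copyfileobj(src, dst)  # sequential writes only
    finally:
        os.remove(temp.name)  # clean up on success and on failure
```

A caller would write inside a block such as with seekable_output(path) as stream: and treat stream like any local binary file. Copying, rather than renaming, avoids relying on the destination filesystem for anything beyond sequential writes.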
Of course, this is just one possible solution. Depending on your specific needs and the characteristics of your network stream, there might be other, more efficient ways to handle this situation.
OS: macOS 13.4.1 (22F82) Python: 3.11
When running pdfmerge on Python 3.11, the following error occurs:
Looks like something is wrong with the way the stream is constructed and passed to pypdf.