unoconv / unoserver

MIT License

Considerations regarding output buffers #38

Open mara004 opened 2 years ago

mara004 commented 2 years ago

While studying the converter code, I had some thoughts on output buffers. Currently, if you do not want to write to a file, you can set outpath to None. In that case, the data is written into a new, internally created io.BytesIO object, and the value of the buffer is returned as bytes.

However, that means the entire output is held in memory at once, which increases resource usage. Looking at the uno XOutputStream interface, it seems the data is actually provided incrementally (presuming that writeBytes() is called multiple times with smaller parts rather than once with one large sequence):

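import io

import unohelper
from com.sun.star.io import XOutputStream

class OutputStream(unohelper.Base, XOutputStream):
    def __init__(self):
        # All converted output accumulates in this in-memory buffer.
        self.buffer = io.BytesIO()

    def closeOutput(self):
        pass

    def writeBytes(self, seq):
        # seq.value holds the raw bytes of this chunk.
        self.buffer.write(seq.value)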

If unoserver accepted a caller-provided output buffer to write into (e.g. a file handle acquired by open(..., "wb"), or sys.stdout.buffer, since sys.stdout itself only accepts text), the data would not necessarily have to be in memory all at once.

For a possible backwards-compatible implementation, outpath could be extended to also accept a byte buffer (i.e. anything that implements write(), read(), and seek()), and an init parameter could be added to OutputStream so that it writes into that buffer instead of creating its own.
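To make the proposal concrete, here is a minimal sketch of what such an init parameter might look like. It is not unoserver's actual code: the class name BufferOutputStream, the buffer parameter, and the ByteSequence stand-in are all hypothetical, and the uno base classes (unohelper.Base, XOutputStream) are left out so the example runs without LibreOffice installed.

```python
import io


class ByteSequence:
    """Hypothetical stand-in for the object uno passes to writeBytes()."""

    def __init__(self, value):
        self.value = value


class BufferOutputStream:
    """Sketch only: the real class would also derive from
    unohelper.Base and XOutputStream."""

    def __init__(self, buffer=None):
        # Assumed new parameter: fall back to an in-memory buffer when the
        # caller passes nothing, which keeps the current behaviour.
        self.buffer = io.BytesIO() if buffer is None else buffer

    def closeOutput(self):
        pass

    def writeBytes(self, seq):
        # Each chunk goes straight to the target, so a file-backed target
        # never needs the whole document in memory.
        self.buffer.write(seq.value)


# Usage: the caller owns the target; any object with write() works here.
target = io.BytesIO()
stream = BufferOutputStream(target)
for chunk in (b"%PDF-", b"1.7"):
    stream.writeBytes(ByteSequence(chunk))
```

In this sketch the stream never closes the caller's buffer, so whoever opened the file handle stays responsible for closing it.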

I'm not certain how useful this would be, given that uno can already write to files on its own, and if you intend to post-process the output, it probably needs to be in memory as a whole anyway. Nevertheless, it seems a bit more elegant (e.g. in case callers want to handle file writing themselves for some reason). Would you be interested in a Pull Request?

regebro commented 2 years ago

Yes, PDFs will be sent one page at a time, for example, so that could be a useful Pull Request.