jlsutherland / doc2text

Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.
MIT License
1.27k stars 97 forks

Does it support stream data? #32

Open multinucliated opened 4 years ago

multinucliated commented 4 years ago

I have a Flask app that receives a file through the API, and I want to extract the text from it without saving it to disk. Is there any way to do this? I tried passing the stream data directly, but it gives me the error below.

code:

    file = request.files['file']
    file_data = file.stream.read()

error:

    \venv\lib\site-packages\docx2txt\docx2txt.py", line 76, in process
      zipf = zipfile.ZipFile(docx)
    File "C:\ProgramData\Anaconda3\lib\zipfile.py", line 1225, in __init__
      self._RealGetContents()
    File "C:\ProgramData\Anaconda3\lib\zipfile.py", line 1288, in _RealGetContents
      endrec = _EndRecData(fp)
    File "C:\ProgramData\Anaconda3\lib\zipfile.py", line 259, in _EndRecData
      fpin.seek(0, 2)
    AttributeError: 'bytes' object has no attribute 'seek'
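The traceback shows the raw `bytes` reaching `zipfile.ZipFile`, which needs either a path or a seekable file-like object. A minimal sketch of one possible workaround is to wrap the bytes in `io.BytesIO`. The helper names `open_zip_from_bytes` and `make_sample_archive` are hypothetical and exist only for illustration; the sample archive stands in for a real upload.

```python
import io
import zipfile


def open_zip_from_bytes(data: bytes) -> zipfile.ZipFile:
    """Wrap raw bytes in a seekable in-memory buffer so zipfile can read them."""
    return zipfile.ZipFile(io.BytesIO(data))


def make_sample_archive() -> bytes:
    """Build a tiny in-memory ZIP standing in for an uploaded .docx (illustrative only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("word/document.xml", "<w:t>hello</w:t>")
    return buffer.getvalue()


# Stands in for: file_data = request.files['file'].stream.read()
file_data = make_sample_archive()

# Nothing is written to disk; the archive is read straight from memory.
with open_zip_from_bytes(file_data) as zf:
    names = zf.namelist()
    body = zf.read("word/document.xml").decode("utf-8")
```

In the Flask handler, the same idea would presumably be `docx2txt.process(io.BytesIO(file_data))`, assuming `process` forwards its argument to `zipfile.ZipFile` unchanged, as line 76 of the traceback suggests. Note that the traceback comes from `docx2txt`, not from doc2text itself.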