Closed andrea-matsec closed 8 months ago
Describe the bug Strelka fails to serialize events. I believe this is happening only when there's a pdf_load_error but I'm not 100% certain.
Environment details
Steps to reproduce Unfortunately I can't share the file to reproduce this error, but this is the event that can't be serialized:
{ 'file': { 'depth': 0, 'flavors': { 'mime': ['application/pdf'] }, 'scanners': ['ScanEntropy', 'ScanExiftool', 'ScanOcr', 'ScanPdf', 'ScanYara'], 'size': 220710, 'tree': { 'node': '6d343886-98a2-4258-8d99-9b0be8d4f63a', 'root': '6d343886-98a2-4258-8d99-9b0be8d4f63a' } }, 'scan': { 'entropy': { 'elapsed': 0.000211, 'entropy': 7.9965489035155235 }, 'exiftool': { 'elapsed': 7.299652, 'sourcefile': '/dev/shm/tmpp5fr8i53', 'exiftoolversion': 12.6, 'filename': 'tmpp5fr8i53', 'directory': '/dev/shm', 'filesize': '221 kB', 'filemodifydate': '2024:01:03 22:49:24+00:00', 'fileaccessdate': '2024:01:03 22:49:24+00:00', 'fileinodechangedate': '2024:01:03 22:49:24+00:00', 'filepermissions': '-rw-------', 'filetype': 'PDF', 'filetypeextension': 'pdf', 'mimetype': 'application/pdf', 'pdfversion': 1.7, 'linearized': 'Yes', 'encryption': 'Standard V5.6 (256-bit)', 'warning': '[minor] Decryption is very slow for encryption V5.6 or higher', 'useraccess': 'Print, Modify, Copy, Annotate, Fill forms, Extract, Print high-res' }, 'ocr': { 'elapsed': 0.026397, 'flags': ['uncaught_exception'], 'exception': 'Traceback (most recent call last):\\n\\n File \\"/usr/local/lib/python3.10/dist-packages/strelka-0.0.0-py3.10.egg/strelka/strelka.py\\", line 779, in scan_wrapper\\n self.scan(data, file, options, expire_at)\\n\\n File \\"/usr/local/lib/python3.10/dist-packages/strelka-0.0.0-py3.10.egg/strelka/scanners/scan_ocr.py\\", line 29, in scan\\n data = doc.get_page_pixmap(0).tobytes(\\"png\\")\\n\\n File \\"/usr/local/lib/python3.10/dist-packages/fitz/utils.py\\", line 922, in get_page_pixmap\\n return doc[pno].get_pixmap(\\n\\n File \\"/usr/local/lib/python3.10/dist-packages/fitz/fitz.py\\", line 5447, in __getitem__\\n raise IndexError(\\"page not in document\\")\\n\\nIndexError: page not in document\\n' }, 'pdf': { 'elapsed': 0.025871, 'flags': ['pdf_load_error'], 'images': 0, 'lines': 0, 'words': 0, 'xref_object': set() }, 'yara': { 'elapsed': 0.000709, 'rules_loaded': 1, 'matches': ['test'] 
} } }
'xref_object': set() looks suspicious to me.
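For anyone hitting the same thing, here is a minimal sketch of why that empty set is a likely culprit. It assumes the event is serialized with Python's standard `json` module, which the thread does not confirm; `encode_extra` is a hypothetical helper written for illustration, not part of Strelka.

```python
import json

# Trimmed-down copy of the 'pdf' portion of the event above.
event = {
    "scan": {
        "pdf": {
            "flags": ["pdf_load_error"],
            "images": 0,
            "xref_object": set(),  # the suspicious value
        }
    }
}

# The stock encoder has no mapping for set, so this raises TypeError.
try:
    json.dumps(event)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)


def encode_extra(obj):
    """Hypothetical fallback encoder: emit sets as sorted lists."""
    if isinstance(obj, (set, frozenset)):
        # Sort by repr so mixed-type sets still get a stable order.
        return sorted(obj, key=repr)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# With the fallback in place the same event serializes cleanly.
serialized = json.dumps(event, default=encode_extra)
print(serialized)
```

The other option is to convert the set to a list at the point where the scanner builds the event, which keeps the serializer untouched.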
Expected behavior The event can be serialized
Release
Thanks for reporting this @andrea-matsec. I have a fix for this (to be honest, I thought I had already implemented it; must have been a dream).
I'll push it out, along with a new release, tomorrow.