kevinboone / epub2txt2

A simple command-line utility for Linux, for extracting text from EPUB documents.
GNU General Public License v3.0

Crash Report #28

Open Cvjark opened 3 weeks ago

Cvjark commented 3 weeks ago

Hi, I found some crashes in this repo. Version: v2.08, GitHub commit: 67b1308fbd3a93f688ab51324253d4b3b8def52a

Command to reproduce the crash: $ epub2txt -a [crash sample file] (P.S. unzip the attachment to get the sample file.)

sample file 1: id_000000,sig_11,src_000013,time_2230,execs_103,op_inf,pos_0.zip

  error:  invalid compressed data to inflate /tmp/epub2txt3532/META-INF/container.xml
AddressSanitizer:DEADLYSIGNAL
=================================================================
==3532==ERROR: AddressSanitizer: SEGV on unknown address 0x00017fff7fff (pc 0x55ee34062b7f bp 0x7ffdbf1316d0 sp 0x7ffdbf1315e0 T0)
==3532==The signal is caused by a READ memory access.
    #0 0x55ee34062b7f  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x113b7f) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #1 0x55ee34063b57  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x114b57) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #2 0x55ee34069f40  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x11af40) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #3 0x7f282a9e7c89 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #4 0x7f282a9e7d44 in __libc_start_main csu/../csu/libc-start.c:360:3
    #5 0x55ee33f81490  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x32490) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x113b7f) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff) 
==3532==ABORTING

sample file 2: id_000002,sig_11,src_000013,time_44690,execs_1497,op_quick,pos_784.zip

$ ../../../epub2txt -a ./id:000002,sig:11,src:000013,time:44690,execs:1497,op:quick,pos:784 
/tmp/epub2txt3809/META-INF/container.xml  bad CRC 93599cfe  (should be 6b3031c5)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==3809==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55f6210ee630 bp 0x7ffe220f91a0 sp 0x7ffe220f8940 T0)
==3809==The signal is caused by a READ memory access.
==3809==Hint: address points to the zero page.
    #0 0x55f6210ee630 in strcmp (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x4a630) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #1 0x55f6211d1bdc  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x12dbdc) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #2 0x55f6211d55dd  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x1315dd) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #3 0x55f6211d817f  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x13417f) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #4 0x55f6211b7b1e  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x113b1e) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #5 0x55f6211b8b57  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x114b57) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #6 0x55f6211bef40  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x11af40) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #7 0x7fbb97a78c89 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #8 0x7fbb97a78d44 in __libc_start_main csu/../csu/libc-start.c:360:3
    #9 0x55f6210d6490  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x32490) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x4a630) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff) in strcmp
==3809==ABORTING

sample file 3: id_000009,sig_11,sync_s1,src_000010.zip

$ ../../../epub2txt -a ./id:000009,sig:11,sync:s1,src:000010                               
/tmp/epub2txt3904/OPS/epb.opf  bad CRC ec287068  (should be 0192f2f4)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==3904==ERROR: AddressSanitizer: SEGV on unknown address 0x502800000388 (pc 0x563c8808c5c1 bp 0x7ffd9d57a730 sp 0x7ffd9d57a5e0 T0)
==3904==The signal is caused by a READ memory access.
    #0 0x563c8808c5c1  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x1125c1) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #1 0x563c88090707  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x116707) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #2 0x563c88094f40  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x11af40) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #3 0x7f5bc7541c89 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #4 0x7f5bc7541d44 in __libc_start_main csu/../csu/libc-start.c:360:3
    #5 0x563c87fac490  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x32490) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x1125c1) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff) 
==3904==ABORTING

sample file 4: id_000013,sig_11,src_000200,time_3909190,execs_108727,op_quick,pos1796,val+1.zip

$ ../../../epub2txt -a ./id:000013,sig:11,src:000200,time:3909190,execs:108727,op:quick,pos:1796,val:+1 
/tmp/epub2txt4068/OPS/epb.opf  bad CRC 96ba5524  (should be 0192f2f4)
epub2txt:2: An error was found (TEXT_OUTSIDE_NODE(-5)), loading aborted...
AddressSanitizer:DEADLYSIGNAL
=================================================================
==4068==ERROR: AddressSanitizer: SEGV on unknown address 0x00017fff7fff (pc 0x5569fdb475b4 bp 0x7ffeddb21510 sp 0x7ffeddb213c0 T0)
==4068==The signal is caused by a READ memory access.
    #0 0x5569fdb475b4  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x1125b4) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #1 0x5569fdb4b707  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x116707) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #2 0x5569fdb4ff40  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x11af40) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #3 0x7f7634e5ec89 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #4 0x7f7634e5ed44 in __libc_start_main csu/../csu/libc-start.c:360:3
    #5 0x5569fda67490  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x32490) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x1125b4) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff) 
==4068==ABORTING

sample file 5: id_000009,sig_06,sync_main,src_000009.zip

$ ../../../epub2txt -a ./id:000009,sig:06,sync:main,src:000009 
/tmp/epub2txt4980/META-INF/container.xml  bad CRC 210a7fc7  (should be 6b3031c5)
/tmp/epub2txt4980/OPS/epb.opf  bad CRC cb4e4f7b  (should be 0192f2f4)
=================================================================
==4980==ERROR: AddressSanitizer: negative-size-param: (size=-1)
    #0 0x55b19ffcbfd3 in strncpy (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0xb8fd3) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #1 0x55b1a003ea08  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x12ba08) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #2 0x55b1a00441bb  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x1311bb) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #3 0x55b1a004717f  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x13417f) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #4 0x55b1a0026b1e  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x113b1e) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #5 0x55b1a0027b57  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x114b57) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #6 0x55b1a002df40  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x11af40) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #7 0x7efe655a3c89 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #8 0x7efe655a3d44 in __libc_start_main csu/../csu/libc-start.c:360:3
    #9 0x55b19ff45490  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x32490) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)

0x5020000000b0 is located 0 bytes inside of 1-byte region [0x5020000000b0,0x5020000000b1)
allocated by thread T0 here:
    #0 0x55b19ffe3992 in malloc (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0xd0992) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #1 0x55b1a003e9c5  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x12b9c5) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #2 0x55b1a004717f  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x13417f) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #3 0x55b1a0026b1e  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x113b1e) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #4 0x55b1a0027b57  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x114b57) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #5 0x55b1a002df40  (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0x11af40) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff)
    #6 0x7efe655a3c89 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

SUMMARY: AddressSanitizer: negative-size-param (/home/pitta/workspace/fuzz_task/epub2txt2/epub2txt+0xb8fd3) (BuildId: 1e67861830c23e4caec542341fbd9527832f82ff) in strncpy
==4980==ABORTING
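
The negative-size-param report for sample file 5 shows that strncpy received a length of -1, which becomes a very large value once converted to size_t. The frames are not symbolized, so the exact source location cannot be read from this report. As a minimal sketch, assuming the length comes from subtracting two positions found in malformed input, a hypothetical helper (`copy_span`, which is not a function from epub2txt2) could validate the span before copying:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from epub2txt2: copy the bytes in
   [start, end) into a newly allocated, NUL-terminated string.
   Both pointers are assumed to point into the same buffer.
   Returns NULL if the span is invalid or the allocation fails. */
static char *copy_span(const char *start, const char *end)
{
    /* Reject a reversed span before its negative length can be
       converted into a huge size_t. */
    if (start == NULL || end == NULL || end < start)
        return NULL;

    size_t len = (size_t)(end - start);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, start, len);
    out[len] = '\0';          /* always terminate, even for len == 0 */
    return out;
}
```

With a guard like this, a reversed span yields NULL, which the caller can report as a malformed file instead of passing a negative size to the copy routine.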

kevinboone commented 3 weeks ago

Thank you for reporting this.

I could fix three of the failure cases you supplied with only small changes to the code, and I have done so.

Two of the cases could not be fixed easily without substantial changes to the XML parsing logic, which would add more to the size of the program than I want. I'm sorry, but I will not be fixing these.

The failures in this bug report all relate to the handling of corrupt EPUB files. That is, they are missing XML files, or contain XML files that are badly formed. I do not claim that epub2txt2 is robust against abuse of that kind. Its design priorities are small size, speed, and minimal dependencies. Even when a bug of this type can be fixed, the fix just changes a crash into a fatal error message. Fixing such bugs does not really improve the user experience much, if at all.
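
As a minimal illustration of what "changing a crash into a fatal error message" can look like, the sketch below guards a string comparison against a NULL value, such as an attribute that is absent because an XML file was missing or unparseable. The function names and error messages are hypothetical and are not taken from the epub2txt2 source. The media type string is the one EPUB containers use for the package document.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from epub2txt2: compare a value that
   may be NULL against an expected string without calling strcmp on a
   NULL pointer. */
static bool attr_equals(const char *value, const char *expected)
{
    return value != NULL && expected != NULL && strcmp(value, expected) == 0;
}

/* Hypothetical check: returns 0 on success, or -1 after printing an
   error message, so the caller can exit cleanly instead of crashing. */
static int check_media_type(const char *media_type)
{
    if (media_type == NULL) {
        fprintf(stderr, "error: missing media-type attribute\n");
        return -1;
    }
    if (!attr_equals(media_type, "application/oebps-package+xml")) {
        fprintf(stderr, "error: unexpected media-type '%s'\n", media_type);
        return -1;
    }
    return 0;
}
```

The program still cannot process the corrupt file. The only difference is that it stops with a message rather than a segmentation fault.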

I appreciate bug reports, but I do want to remind people that epub2txt2 is not designed to operate in a hostile environment, where it will be supplied with deliberately broken input data. It should not be used in this way, and I can't promise to fix bugs if it is.