kovidgoyal / html5-parser

Fast C-based HTML 5 parsing for Python
Apache License 2.0

double free or corruption when parsing "<html><html />" in xhtml mode #17

Closed: ivan closed this issue 5 years ago

ivan commented 5 years ago

This happens on an amd64 Debian 9.5 machine with a pyenv-compiled Python 3.7.0:

Python 3.7.0 (default, Sep 28 2018, 07:40:17) 
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import html5_parser
>>> html5_parser.parse("<html><html />", maybe_xhtml=True)
*** Error in `/home/grab/gs-venv/bin/python': double free or corruption (fasttop): 0x000055e73b907380 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7efc5650fbfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7efc56515fc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7efc5651680e]
/home/grab/gs-venv/lib/python3.7/site-packages/html5_parser/html_parser.cpython-37m-x86_64-linux-gnu.so(+0x1117d)[0x7efc5428a17d]
/home/grab/gs-venv/lib/python3.7/site-packages/html5_parser/html_parser.cpython-37m-x86_64-linux-gnu.so(+0xeb89)[0x7efc54287b89]
/home/grab/gs-venv/lib/python3.7/site-packages/html5_parser/html_parser.cpython-37m-x86_64-linux-gnu.so(+0x16a21)[0x7efc5428fa21]
/home/grab/gs-venv/lib/python3.7/site-packages/html5_parser/html_parser.cpython-37m-x86_64-linux-gnu.so(+0x80a1)[0x7efc542810a1]
/home/grab/gs-venv/bin/python(_PyCFunction_FastCallKeywords+0x362)[0x55e739b05392]
/home/grab/gs-venv/bin/python(_PyEval_EvalFrameDefault+0x8079)[0x55e739af33d9]
/home/grab/gs-venv/bin/python(_PyEval_EvalCodeWithName+0xacd)[0x55e739bbeaad]
/home/grab/gs-venv/bin/python(_PyFunction_FastCallKeywords+0xa6)[0x55e739b04856]
/home/grab/gs-venv/bin/python(_PyEval_EvalFrameDefault+0x7b68)[0x55e739af2ec8]
/home/grab/gs-venv/bin/python(_PyEval_EvalCodeWithName+0xacd)[0x55e739bbeaad]
/home/grab/gs-venv/bin/python(PyEval_EvalCode+0x23)[0x55e739bbebe3]
/home/grab/gs-venv/bin/python(+0x166d4b)[0x55e739bf6d4b]
/home/grab/gs-venv/bin/python(PyRun_InteractiveLoopFlags+0x76)[0x55e739bf7016]
/home/grab/gs-venv/bin/python(PyRun_AnyFileExFlags+0x3e)[0x55e739bf717e]
/home/grab/gs-venv/bin/python(+0x67fa2)[0x55e739af7fa2]
/home/grab/gs-venv/bin/python(_Py_UnixMain+0x6a)[0x55e739af87da]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7efc564bf2e1]
/home/grab/gs-venv/bin/python(_start+0x2a)[0x55e739af3e4a]
======= Memory map: ========
55e739a90000-55e739d60000 r-xp 00000000 fe:00 74359575                   /home/grab/.pyenv/versions/3.7.0/bin/python3.7
55e739f5f000-55e739f62000 r--p 002cf000 fe:00 74359575                   /home/grab/.pyenv/versions/3.7.0/bin/python3.7
55e739f62000-55e739fcb000 rw-p 002d2000 fe:00 74359575                   /home/grab/.pyenv/versions/3.7.0/bin/python3.7
55e739fcb000-55e739fec000 rw-p 00000000 00:00 0 
55e73b7c1000-55e73b929000 rw-p 00000000 00:00 0                          [heap]
7efc4c000000-7efc4c021000 rw-p 00000000 00:00 0 
7efc4c021000-7efc50000000 ---p 00000000 00:00 0 
7efc51420000-7efc51460000 rw-p 00000000 00:00 0 
7efc51460000-7efc51476000 r-xp 00000000 fe:00 2151222669                 /lib/x86_64-linux-gnu/libgcc_s.so.1
7efc51476000-7efc51675000 ---p 00016000 fe:00 2151222669                 /lib/x86_64-linux-gnu/libgcc_s.so.1
7efc51675000-7efc51676000 r--p 00015000 fe:00 2151222669                 /lib/x86_64-linux-gnu/libgcc_s.so.1
7efc51676000-7efc51677000 rw-p 00016000 fe:00 2151222669                 /lib/x86_64-linux-gnu/libgcc_s.so.1
7efc51677000-7efc517e9000 r-xp 00000000 fe:00 2198801676                 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
7efc517e9000-7efc519e9000 ---p 00172000 fe:00 2198801676                 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
7efc519e9000-7efc519f3000 r--p 00172000 fe:00 2198801676                 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
7efc519f3000-7efc519f5000 rw-p 0017c000 fe:00 2198801676                 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22
7efc519f5000-7efc519f9000 rw-p 00000000 00:00 0 
7efc519f9000-7efc51a1e000 r-xp 00000000 fe:00 2149466568                 /lib/x86_64-linux-gnu/liblzma.so.5.2.2
7efc51a1e000-7efc51c1d000 ---p 00025000 fe:00 2149466568                 /lib/x86_64-linux-gnu/liblzma.so.5.2.2
7efc51c1d000-7efc51c1e000 r--p 00024000 fe:00 2149466568                 /lib/x86_64-linux-gnu/liblzma.so.5.2.2
7efc51c1e000-7efc51c1f000 rw-p 00025000 fe:00 2149466568                 /lib/x86_64-linux-gnu/liblzma.so.5.2.2
7efc51c1f000-7efc5349b000 r-xp 00000000 fe:00 2149468690                 /usr/lib/x86_64-linux-gnu/libicudata.so.57.1
7efc5349b000-7efc5369a000 ---p 0187c000 fe:00 2149468690                 /usr/lib/x86_64-linux-gnu/libicudata.so.57.1
7efc5369a000-7efc5369b000 r--p 0187b000 fe:00 2149468690                 /usr/lib/x86_64-linux-gnu/libicudata.so.57.1
7efc5369b000-7efc5369c000 rw-p 0187c000 fe:00 2149468690                 /usr/lib/x86_64-linux-gnu/libicudata.so.57.1
7efc5369c000-7efc53830000 r-xp 00000000 fe:00 2149468714                 /usr/lib/x86_64-linux-gnu/libicuuc.so.57.1
7efc53830000-7efc53a2f000 ---p 00194000 fe:00 2149468714                 /usr/lib/x86_64-linux-gnu/libicuuc.so.57.1
7efc53a2f000-7efc53a41000 r--p 00193000 fe:00 2149468714                 /usr/lib/x86_64-linux-gnu/libicuuc.so.57.1
7efc53a41000-7efc53a42000 rw-p 001a5000 fe:00 2149468714                 /usr/lib/x86_64-linux-gnu/libicuuc.so.57.1
7efc53a42000-7efc53a44000 rw-p 00000000 00:00 0 
7efc53a44000-7efc53caf000 r-xp 00000000 fe:00 2149468708                 /usr/lib/x86_64-linux-gnu/libicui18n.so.57.1
7efc53caf000-7efc53eae000 ---p 0026b000 fe:00 2149468708                 /usr/lib/x86_64-linux-gnu/libicui18n.so.57.1
7efc53eae000-7efc53ebb000 r--p 0026a000 fe:00 2149468708                 /usr/lib/x86_64-linux-gnu/libicui18n.so.57.1
7efc53ebb000-7efc53ebd000 rw-p 00277000 fe:00 2149468708                 /usr/lib/x86_64-linux-gnu/libicui18n.so.57.1
7efc53ebd000-7efc53ebe000 rw-p 00000000 00:00 0 
7efc53ebe000-7efc5406e000 r-xp 00000000 fe:00 2150907904                 /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.4
7efc5406e000-7efc5426e000 ---p 001b0000 fe:00 2150907904                 /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.4
7efc5426e000-7efc54276000 r--p 001b0000 fe:00 2150907904                 /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.4
7efc54276000-7efc54278000 rw-p 001b8000 fe:00 2150907904                 /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.4
7efc54278000-7efc54279000 rw-p 00000000 00:00 0 
7efc54279000-7efc542de000 r-xp 00000000 fe:00 2152153831                 /home/grab/gs-venv/lib/python3.7/site-packages/html5_parser/html_parser.cpython-37m-x86_64-linux-gnu.so
7efc542de000-7efc544dd000 ---p 00065000 fe:00 2152153831                 /home/grab/gs-venv/lib/python3.7/site-packages/html5_parser/html_parser.cpython-37m-x86_64-linux-gnu.so
7efc544dd000-7efc544df000 r--p 00064000 fe:00 2152153831                 /home/grab/gs-venv/lib/python3.7/site-packages/html5_parser/html_parser.cpython-37m-x86_64-linux-gnu.so
7efc544df000-7efc544e1000 rw-p 00066000 fe:00 2152153831                 /home/grab/gs-venv/lib/python3.7/site-packages/html5_parser/html_parser.cpython-37m-x86_64-linux-gnu.so
7efc544e1000-7efc544fa000 r-xp 00000000 fe:00 2149466559                 /lib/x86_64-linux-gnu/libz.so.1.2.8
7efc544fa000-7efc546f9000 ---p 00019000 fe:00 2149466559                 /lib/x86_64-linux-gnu/libz.so.1.2.8
7efc546f9000-7efc546fa000 r--p 00018000 fe:00 2149466559                 /lib/x86_64-linux-gnu/libz.so.1.2.8
7efc546fa000-7efc546fb000 rw-p 00019000 fe:00 2149466559                 /lib/x86_64-linux-gnu/libz.so.1.2.8
7efc546fb000-7efc54701000 r-xp 00000000 fe:00 2200123577                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/zlib.cpython-37m-x86_64-linux-gnu.so
7efc54701000-7efc54900000 ---p 00006000 fe:00 2200123577                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/zlib.cpython-37m-x86_64-linux-gnu.so
7efc54900000-7efc54901000 r--p 00005000 fe:00 2200123577                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/zlib.cpython-37m-x86_64-linux-gnu.so
7efc54901000-7efc54903000 rw-p 00006000 fe:00 2200123577                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/zlib.cpython-37m-x86_64-linux-gnu.so
7efc54903000-7efc5490c000 r-xp 00000000 fe:00 2200435341                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_struct.cpython-37m-x86_64-linux-gnu.so
7efc5490c000-7efc54b0b000 ---p 00009000 fe:00 2200435341                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_struct.cpython-37m-x86_64-linux-gnu.so
7efc54b0b000-7efc54b0c000 r--p 00008000 fe:00 2200435341                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_struct.cpython-37m-x86_64-linux-gnu.so
7efc54b0c000-7efc54b0e000 rw-p 00009000 fe:00 2200435341                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_struct.cpython-37m-x86_64-linux-gnu.so
7efc54b0e000-7efc54b4f000 rw-p 00000000 00:00 0 
7efc54b4f000-7efc54b50000 r-xp 00000000 fe:00 2200024832                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_opcode.cpython-37m-x86_64-linux-gnu.so
7efc54b50000-7efc54d4f000 ---p 00001000 fe:00 2200024832                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_opcode.cpython-37m-x86_64-linux-gnu.so
7efc54d4f000-7efc54d50000 r--p 00000000 fe:00 2200024832                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_opcode.cpython-37m-x86_64-linux-gnu.so
7efc54d50000-7efc54d51000 rw-p 00001000 fe:00 2200024832                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_opcode.cpython-37m-x86_64-linux-gnu.so
7efc54d51000-7efc54d91000 rw-p 00000000 00:00 0 
7efc54d91000-7efc54dc5000 r-xp 00000000 fe:00 2150409642                 /home/grab/gs-venv/lib/python3.7/site-packages/lxml/_elementpath.cpython-37m-x86_64-linux-gnu.so
7efc54dc5000-7efc54fc4000 ---p 00034000 fe:00 2150409642                 /home/grab/gs-venv/lib/python3.7/site-packages/lxml/_elementpath.cpython-37m-x86_64-linux-gnu.so
7efc54fc4000-7efc54fc9000 rw-p 00033000 fe:00 2150409642                 /home/grab/gs-venv/lib/python3.7/site-packages/lxml/_elementpath.cpython-37m-x86_64-linux-gnu.so
7efc54fc9000-7efc55053000 rw-p 00000000 00:00 0 
7efc55053000-7efc5505a000 r-xp 00000000 fe:00 2222546969                 /lib/x86_64-linux-gnu/librt-2.24.so
7efc5505a000-7efc55259000 ---p 00007000 fe:00 2222546969                 /lib/x86_64-linux-gnu/librt-2.24.so
7efc55259000-7efc5525a000 r--p 00006000 fe:00 2222546969                 /lib/x86_64-linux-gnu/librt-2.24.so
7efc5525a000-7efc5525b000 rw-p 00007000 fe:00 2222546969                 /lib/x86_64-linux-gnu/librt-2.24.so
7efc5525b000-7efc557a9000 r-xp 00000000 fe:00 2150409655                 /home/grab/gs-venv/lib/python3.7/site-packages/lxml/etree.cpython-37m-x86_64-linux-gnu.so
7efc557a9000-7efc559a9000 ---p 0054e000 fe:00 2150409655                 /home/grab/gs-venv/lib/python3.7/site-packages/lxml/etree.cpython-37m-x86_64-linux-gnu.so
7efc559a9000-7efc559eb000 rw-p 0054e000 fe:00 2150409655                 /home/grab/gs-venv/lib/python3.7/site-packages/lxml/etree.cpython-37m-x86_64-linux-gnu.so
7efc559eb000-7efc55a77000 rw-p 00000000 00:00 0 
7efc55a77000-7efc55a79000 r-xp 00000000 fe:00 2200123576                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_heapq.cpython-37m-x86_64-linux-gnu.so
7efc55a79000-7efc55c78000 ---p 00002000 fe:00 2200123576                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_heapq.cpython-37m-x86_64-linux-gnu.so
7efc55c78000-7efc55c79000 r--p 00001000 fe:00 2200123576                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_heapq.cpython-37m-x86_64-linux-gnu.so
7efc55c79000-7efc55c7b000 rw-p 00002000 fe:00 2200123576                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/_heapq.cpython-37m-x86_64-linux-gnu.so
7efc55c7b000-7efc55cbb000 rw-p 00000000 00:00 0 
7efc55cbb000-7efc55ce1000 r-xp 00000000 fe:00 2149521574                 /lib/x86_64-linux-gnu/libtinfo.so.5.9
7efc55ce1000-7efc55ee0000 ---p 00026000 fe:00 2149521574                 /lib/x86_64-linux-gnu/libtinfo.so.5.9
7efc55ee0000-7efc55ee4000 r--p 00025000 fe:00 2149521574                 /lib/x86_64-linux-gnu/libtinfo.so.5.9
7efc55ee4000-7efc55ee5000 rw-p 00029000 fe:00 2149521574                 /lib/x86_64-linux-gnu/libtinfo.so.5.9
7efc55ee5000-7efc55f29000 r-xp 00000000 fe:00 2149734202                 /lib/x86_64-linux-gnu/libreadline.so.7.0
7efc55f29000-7efc56128000 ---p 00044000 fe:00 2149734202                 /lib/x86_64-linux-gnu/libreadline.so.7.0
7efc56128000-7efc5612a000 r--p 00043000 fe:00 2149734202                 /lib/x86_64-linux-gnu/libreadline.so.7.0
7efc5612a000-7efc56130000 rw-p 00045000 fe:00 2149734202                 /lib/x86_64-linux-gnu/libreadline.so.7.0
7efc56130000-7efc56132000 rw-p 00000000 00:00 0
7efc56132000-7efc56138000 r-xp 00000000 fe:00 2200123575                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/readline.cpython-37m-x86_64-linux-gnu.so
7efc56138000-7efc56337000 ---p 00006000 fe:00 2200123575                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/readline.cpython-37m-x86_64-linux-gnu.so
7efc56337000-7efc56338000 r--p 00005000 fe:00 2200123575                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/readline.cpython-37m-x86_64-linux-gnu.so
7efc56338000-7efc5633a000 rw-p 00006000 fe:00 2200123575                 /home/grab/.pyenv/versions/3.7.0/lib/python3.7/lib-dynload/readline.cpython-37m-x86_64-linux-gnu.so
7efc5633a000-7efc5637a000 rw-p 00000000 00:00 0
7efc5639f000-7efc5649f000 rw-p 00000000 00:00 0
7efc5649f000-7efc56634000 r-xp 00000000 fe:00 2149544626                 /lib/x86_64-linux-gnu/libc-2.24.so
7efc56634000-7efc56834000 ---p 00195000 fe:00 2149544626                 /lib/x86_64-linux-gnu/libc-2.24.so
7efc56834000-7efc56838000 r--p 00195000 fe:00 2149544626                 /lib/x86_64-linux-gnu/libc-2.24.so
7efc56838000-7efc5683a000 rw-p 00199000 fe:00 2149544626                 /lib/x86_64-linux-gnu/libc-2.24.so
7efc5683a000-7efc5683e000 rw-p 00000000 00:00 0
7efc5683e000-7efc56941000 r-xp 00000000 fe:00 2149544630                 /lib/x86_64-linux-gnu/libm-2.24.so
7efc56941000-7efc56b40000 ---p 00103000 fe:00 2149544630                 /lib/x86_64-linux-gnu/libm-2.24.so
7efc56b40000-7efc56b41000 r--p 00102000 fe:00 2149544630                 /lib/x86_64-linux-gnu/libm-2.24.so
7efc56b41000-7efc56b42000 rw-p 00103000 fe:00 2149544630                 /lib/x86_64-linux-gnu/libm-2.24.so
7efc56b42000-7efc56b44000 r-xp 00000000 fe:00 2222546971                 /lib/x86_64-linux-gnu/libutil-2.24.so
7efc56b44000-7efc56d43000 ---p 00002000 fe:00 2222546971                 /lib/x86_64-linux-gnu/libutil-2.24.so
7efc56d43000-7efc56d44000 r--p 00001000 fe:00 2222546971                 /lib/x86_64-linux-gnu/libutil-2.24.so
7efc56d44000-7efc56d45000 rw-p 00002000 fe:00 2222546971                 /lib/x86_64-linux-gnu/libutil-2.24.so
7efc56d45000-7efc56d48000 r-xp 00000000 fe:00 2149544629                 /lib/x86_64-linux-gnu/libdl-2.24.so
7efc56d48000-7efc56f47000 ---p 00003000 fe:00 2149544629                 /lib/x86_64-linux-gnu/libdl-2.24.so
7efc56f47000-7efc56f48000 r--p 00002000 fe:00 2149544629                 /lib/x86_64-linux-gnu/libdl-2.24.so
7efc56f48000-7efc56f49000 rw-p 00003000 fe:00 2149544629                 /lib/x86_64-linux-gnu/libdl-2.24.so
7efc56f49000-7efc56f61000 r-xp 00000000 fe:00 2222546958                 /lib/x86_64-linux-gnu/libpthread-2.24.so
7efc56f61000-7efc57160000 ---p 00018000 fe:00 2222546958                 /lib/x86_64-linux-gnu/libpthread-2.24.so
7efc57160000-7efc57161000 r--p 00017000 fe:00 2222546958                 /lib/x86_64-linux-gnu/libpthread-2.24.so
7efc57161000-7efc57162000 rw-p 00018000 fe:00 2222546958                 /lib/x86_64-linux-gnu/libpthread-2.24.so
7efc57162000-7efc57166000 rw-p 00000000 00:00 0
7efc57166000-7efc57189000 r-xp 00000000 fe:00 2149544622                 /lib/x86_64-linux-gnu/ld-2.24.so
7efc5719f000-7efc571df000 rw-p 00000000 00:00 0
7efc571df000-7efc5737a000 r--p 00000000 fe:00 334761                     /usr/lib/locale/locale-archive
7efc5737a000-7efc5737c000 rw-p 00000000 00:00 0
7efc5737e000-7efc5737f000 rw-p 00000000 00:00 0
7efc5737f000-7efc57386000 r--s 00000000 fe:00 2151137560                 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
7efc57386000-7efc57389000 rw-p 00000000 00:00 0
7efc57389000-7efc5738a000 r--p 00023000 fe:00 2149544622                 /lib/x86_64-linux-gnu/ld-2.24.so
7efc5738a000-7efc5738b000 rw-p 00024000 fe:00 2149544622                 /lib/x86_64-linux-gnu/ld-2.24.so
7efc5738b000-7efc5738c000 rw-p 00000000 00:00 0
7ffc1ba9c000-7ffc1babd000 rw-p 00000000 00:00 0                          [stack]
7ffc1bbc6000-7ffc1bbc9000 r--p 00000000 00:00 0                          [vvar]
7ffc1bbc9000-7ffc1bbcb000 r-xp 00000000 00:00 0                          [vdso]
zsh: abort (core dumped)  ~/gs-venv/bin/python
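The log ends with the interpreter itself aborting (zsh reports "abort (core dumped)"), so a Python try/except around parse() cannot catch this failure. A minimal sketch for checking the reproducer without taking down the calling process, assuming a POSIX system: run the snippet in a fresh interpreter and test whether the child died from SIGABRT, the signal glibc uses to terminate a process after printing "double free or corruption". The helper name crashes_with_abort is hypothetical and not part of html5-parser.

```python
import signal
import subprocess
import sys


def crashes_with_abort(snippet: str, timeout: float = 60.0) -> bool:
    """Run `snippet` in a fresh Python interpreter (hypothetical helper).

    Returns True only if the child was killed by SIGABRT, which is how
    glibc terminates a process after reporting heap corruption such as
    "double free or corruption (fasttop)". Running in a child process
    keeps the calling interpreter alive; an in-process try/except cannot
    intercept a native abort.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    # POSIX only: a child terminated by signal N reports returncode == -N.
    return proc.returncode == -signal.SIGABRT


if __name__ == "__main__":
    # The reproducer from this issue; expected to abort only on an affected build.
    repro = (
        "import html5_parser\n"
        "html5_parser.parse('<html><html />', maybe_xhtml=True)\n"
    )
    print("aborts:", crashes_with_abort(repro))
```

If html5_parser is not installed, the child exits with status 1 from the ImportError, so the helper reports False rather than a crash.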

I originally observed this crash with the input at https://gist.github.com/ivan/6aea349161c00ee0b213f9217f0b7cf9, but it minimizes to just the <html><html /> input shown above.
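Reducing a full page to a two-tag input like this can be automated. A minimal sketch of one way to do it, using a greedy, simplified variant of delta debugging (not necessarily how the input above was reduced): repeatedly delete chunks of decreasing size and keep any deletion after which the input still fails. The minimize function and its still_fails predicate are hypothetical; for this bug the predicate could run the parse in a child process and check for SIGABRT. The result is smaller, but not guaranteed to be the smallest failing input.

```python
from typing import Callable


def minimize(data: str, still_fails: Callable[[str], bool]) -> str:
    """Greedily shrink `data` while `still_fails(data)` stays True.

    Hypothetical helper: a simplified form of delta debugging that tries
    deleting chunks of decreasing size and keeps every deletion that
    preserves the failure.
    """
    if not still_fails(data):
        raise ValueError("input does not reproduce the failure")
    chunk = max(1, len(data) // 2)
    while chunk >= 1:
        i = 0
        while i < len(data):
            candidate = data[:i] + data[i + chunk:]
            if candidate and still_fails(candidate):
                # Keep the smaller failing input and retry at the same offset.
                data = candidate
            else:
                i += chunk
        chunk //= 2
    return data
```

Trying large deletions first keeps the number of predicate calls down, which matters when each call spawns a new interpreter.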