google / gumbo-parser

An HTML5 parsing library in pure C99
Apache License 2.0
5.16k stars 660 forks source link

Don't crash when memory is exhausted #389

Open DemiMarie opened 7 years ago

DemiMarie commented 7 years ago

Previously, returning NULL from gumbo_parser_allocate() was usually unchecked. Instead, bail out with longjmp() and return NULL from gumbo_parse_with_options().

Fixes #388.

kevinhendricks commented 7 years ago

FWIW ... It would be nice if some maintainer would truly fork this project to create a maintenance version of gumbo where at least all outstanding bug reports with patches were included into a new master and also maybe an updated arena allocator based branch as well (although thread safety may be an issue in the arena version depending on the underlying malloc used for the arena slab allocator).

We at Sigil have a version of gumbo that has most if not all bug/security fixes in place but we add support for xhtml parsing in our version as well as adding code to allow editing the returned gumbo tree which most might find useful.

Any takers?

Sent from my iPad

On Oct 8, 2017, at 9:25 PM, Craig Barnes notifications@github.com wrote:

@craigbarnes commented on this pull request.

In src/parser.c:

parser._options = options; +

  • if (SETJMP(parser._oom_buf)) {
  • if (parser._parser_state)
  • parser_state_destroy(&parser); I was considering using the arena pull request to solve this problem for lua-gumbo. Unfortunately, the state of that pull request as it stands still contains dozens of unchecked calls to malloc and has a few other problems I can't recall off the top of my head.

Gumbo is essentially an abandoned project at this point though... It seems unlikely that any of the newer bug reports will be fixed in this repo.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

kevinhendricks commented 7 years ago

Correction:

"which most might find useful" -> "which most might NOT find useful"

On Oct 8, 2017, at 9:25 PM, Craig Barnes notifications@github.com wrote:

@craigbarnes commented on this pull request.

In src/parser.c:

parser._options = options; +

  • if (SETJMP(parser._oom_buf)) {
  • if (parser._parser_state)
  • parser_state_destroy(&parser); I was considering using the arena pull request to solve this problem for lua-gumbo. Unfortunately, the state of that pull request as it stands still contains dozens of unchecked calls to malloc and has a few other problems I can't recall off the top of my head.

Gumbo is essentially an abandoned project at this point though... It seems unlikely that any of the newer bug reports will be fixed in this repo.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

craigbarnes commented 7 years ago

@kevinhendricks

I'm maintaining a fork of libgumbo within the lua-gumbo repo. If you want to set up a GitHub org, I'd be glad to contribute some of the changes I've made back and help with maintenance.

I've noticed a few other forks around GitHub that may be interested in merging efforts too. There's also a Python html5-parser library that seems to be based on a fork of gumbo.

DemiMarie commented 7 years ago

Right now the only actively maintained HTML parser that is meant for standalone use is Servo's html5ever. It has a LOT going for it (for one, it is immune to most code execution exploits by virtue of being written in safe Rust; it is also faster). However, it isn't exactly easy to bind from anything other than Rust.

On Oct 8, 2017 9:25 PM, "Craig Barnes" notifications@github.com wrote:

@craigbarnes commented on this pull request.

In src/parser.c https://github.com/google/gumbo-parser/pull/389#discussion_r143374477:

parser._options = options; +

  • if (SETJMP(parser._oom_buf)) {
  • if (parser._parser_state)
  • parser_state_destroy(&parser);

I was considering using the arena https://github.com/google/gumbo-parser/pull/309 pull request to solve this problem for lua-gumbo. Unfortunately, the state of that pull request as it stands still contains dozens of unchecked calls to malloc and has a few other problems I can't recall off the top of my head.

Gumbo is essentially an abandoned project at this point though... It seems unlikely that any of the newer bug reports will be fixed in this repo.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/google/gumbo-parser/pull/389#discussion_r143374477, or mute the thread https://github.com/notifications/unsubscribe-auth/AGGWB7qnoT_pJgdqv6RiVUdA2SRX441dks5sqXX7gaJpZM4Ogiay .

craigbarnes commented 7 years ago

Right now the only actively maintained HTML parser that is meant for standalone use is Servo's html5ever. It has a LOT going for it (for one, it is immune to most code execution exploits by virtue of being written in safe Rust; it is also faster). However, it isn't exactly easy to bind from anything other than Rust.

In the case of this particular bug, we could get the same behaviour as in Rust by just calling abort if malloc fails. I already do that in lua-gumbo via a custom allocator. It would be nicer (for certain use cases) to recover and return NULL from gumbo_parse though.