Open DemiMarie opened 7 years ago
FWIW ... It would be nice if some maintainer would truly fork this project to create a maintenance version of gumbo where at least all outstanding bug reports with patches were included into a new master and also maybe an updated arena allocator based branch as well (although thread safety may be an issue in the arena version depending on the underlying malloc used for the arena slab allocator).
We at Sigil have a version of gumbo that has most if not all bug/security fixes in place but we add support for xhtml parsing in our version as well as adding code to allow editing the returned gumbo tree which most might find useful.
Any takers?
Sent from my iPad
On Oct 8, 2017, at 9:25 PM, Craig Barnes notifications@github.com wrote:
@craigbarnes commented on this pull request.
In src/parser.c:
parser._options = options; +
- if (SETJMP(parser._oom_buf)) {
- if (parser._parser_state)
- parser_state_destroy(&parser); I was considering using the arena pull request to solve this problem for lua-gumbo. Unfortunately, the state of that pull request as it stands still contains dozens of unchecked calls to malloc and has a few other problems I can't recall off the top of my head.
Gumbo is essentially an abandoned project at this point though... It seems unlikely that any of the newer bug reports will be fixed in this repo.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.
Correction:
"which most might find useful" -> "which most might NOT find useful"
On Oct 8, 2017, at 9:25 PM, Craig Barnes notifications@github.com wrote:
@craigbarnes commented on this pull request.
In src/parser.c:
parser._options = options; +
- if (SETJMP(parser._oom_buf)) {
- if (parser._parser_state)
- parser_state_destroy(&parser); I was considering using the arena pull request to solve this problem for lua-gumbo. Unfortunately, the state of that pull request as it stands still contains dozens of unchecked calls to malloc and has a few other problems I can't recall off the top of my head.
Gumbo is essentially an abandoned project at this point though... It seems unlikely that any of the newer bug reports will be fixed in this repo.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.
@kevinhendricks
I'm maintaining a fork of libgumbo within the lua-gumbo repo. If you want to set up a GitHub org, I'd be glad to contribute some of the changes I've made back and help with maintenance.
I've noticed a few other forks around GitHub that may be interested in merging efforts too. There's also a Python html5-parser library that seems to be based on a fork of gumbo.
Right now the only actively maintained HTML parser that is meant for standalone use is Servo's html5ever. It has a LOT going for it (for one, it is immune to most code execution exploits by virtue of being written in safe Rust; it is also faster). However, it isn't exactly easy to bind from anything other than Rust.
On Oct 8, 2017 9:25 PM, "Craig Barnes" notifications@github.com wrote:
In src/parser.c https://github.com/google/gumbo-parser/pull/389#discussion_r143374477:
parser._options = options; +
- if (SETJMP(parser._oom_buf)) {
- if (parser._parser_state)
- parser_state_destroy(&parser);
I was considering using the arena https://github.com/google/gumbo-parser/pull/309 pull request to solve this problem for lua-gumbo. Unfortunately, the state of that pull request as it stands still contains dozens of unchecked calls to malloc and has a few other problems I can't recall off the top of my head.
Gumbo is essentially an abandoned project at this point though... It seems unlikely that any of the newer bug reports will be fixed in this repo.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/google/gumbo-parser/pull/389#discussion_r143374477, or mute the thread https://github.com/notifications/unsubscribe-auth/AGGWB7qnoT_pJgdqv6RiVUdA2SRX441dks5sqXX7gaJpZM4Ogiay .
Right now the only actively maintained HTML parser that is meant for standalone use is Servo's html5ever. It has a LOT going for it (for one, it is immune to most code execution exploits by virtue of being written in safe Rust; it is also faster). However, it isn't exactly easy to bind from anything other than Rust.
In the case of this particular bug, we could get the same behaviour as in Rust by just calling abort
if malloc
fails. I already do that in lua-gumbo via a custom allocator. It would be nicer (for certain use cases) to recover and return NULL
from gumbo_parse
though.
Previously, returning NULL from
gumbo_parser_allocate()
was usually unchecked. Instead, bail out withlongjmp()
and return NULL fromgumbo_parse_with_options()
.Fixes #388.