mgdm / htmlq

Like jq, but for HTML.
MIT License

Panic with :has selector #65

Open adtac opened 1 year ago

adtac commented 1 year ago

Minimal repro:

$ htmlq --version
htmlq 0.4.0

$ echo "<div><h1>h1</h1></div><div><p>p</p></div>" | htmlq 'div'
<div><h1>h1</h1></div>
<div><p>p</p></div>

$ echo "<div><h1>h1</h1></div><div><p>p</p></div>" | RUST_BACKTRACE=full htmlq 'div:has(h1)'
thread 'main' panicked at 'Failed to parse CSS selector: ()', src/main.rs:248:10
stack backtrace:
   0:        0x10256fc98 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h26781f9bfbe134a6
   1:        0x1025870c0 - core::fmt::write::h5aa0c05a9caf85df
   2:        0x10255bf30 - std::io::Write::write_fmt::h018da2c8c873b16b
   3:        0x1025646e8 - std::panicking::default_hook::{{closure}}::h0239cbf25cfa6d83
   4:        0x10256439c - std::panicking::default_hook::ha0ef000a358742e2
   5:        0x102564bf8 - std::panicking::rust_panic_with_hook::had5e29c8530a78b2
   6:        0x10256ffa0 - std::panicking::begin_panic_handler::{{closure}}::h7852e5e8d2675e0d
   7:        0x10256fdac - std::sys_common::backtrace::__rust_end_short_backtrace::hf05a8afc922cafbe
   8:        0x1025647fc - _rust_begin_unwind
   9:        0x102590aec - core::panicking::panic_fmt::h94866529fc2e06b8
  10:        0x102590bdc - core::result::unwrap_failed::h41371c24818ab0e0
  11:        0x1024e50e8 - htmlq::main::h04e4887fa86dbd1a
  12:        0x1024e9984 - std::sys_common::backtrace::__rust_begin_short_backtrace::hd36bd2ff6dd7522d
  13:        0x1024bb480 - std::rt::lang_start::{{closure}}::h9583fbf5a663487c
  14:        0x10255b8cc - std::rt::lang_start_internal::he17012042c5b2b42
  15:        0x1024e56e8 - _main

Expected output: <div><h1>h1</h1></div>

This was on macOS (ARM), but I could also reproduce it in Linux (x86). I don't have any other OS/arch to test on.
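To make the expected output concrete without a working `:has`, here is a rough Python sketch (standard library only) of what `div:has(h1)` should select: every `div` element that has an `h1` descendant. It is a toy written for illustration, not htmlq's code or any real selector engine. The names `Node`, `TreeBuilder` and `select_has` are hypothetical. It assumes well-formed input with explicit end tags and supports only plain tag-name selectors.

```python
from html.parser import HTMLParser


class Node:
    """A minimal element node: tag name, attributes, and children (Nodes or text)."""

    def __init__(self, tag, attrs=()):
        self.tag = tag
        self.attrs = list(attrs)
        self.children = []

    def to_html(self):
        # Serialize back to HTML (no escaping; enough for this toy input).
        attrs = "".join(f' {k}="{v}"' if v is not None else f" {k}" for k, v in self.attrs)
        inner = "".join(c.to_html() if isinstance(c, Node) else c for c in self.children)
        return f"<{self.tag}{attrs}>{inner}</{self.tag}>"


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes well-formed input with explicit end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag (never the root).
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def iter_elements(node):
    """Yield every descendant element of node in document order."""
    for child in node.children:
        if isinstance(child, Node):
            yield child
            yield from iter_elements(child)


def select_has(html, outer, inner):
    """Emulate `outer:has(inner)` (descendant form) for plain tag names only."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return [
        el.to_html()
        for el in iter_elements(builder.root)
        if el.tag == outer and any(d.tag == inner for d in iter_elements(el))
    ]


if __name__ == "__main__":
    html = "<div><h1>h1</h1></div><div><p>p</p></div>"
    for match in select_has(html, "div", "h1"):
        print(match)  # prints <div><h1>h1</h1></div>
```

Note that `:has(h1)` matches any descendant, not only direct children, so nested `div`s wrapping an `h1` would each be selected.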

AlJohri commented 1 year ago

This would be really useful to add.

I believe this project uses kuchiki under the hood, which in turn uses Servo's selectors crate.

This is the upstream issue for adding :has support to Servo's selectors crate: https://github.com/servo/servo/issues/25133

AlJohri commented 1 year ago

FYI, for anyone else looking for an alternative tool that supports :has:

This feature is also currently not supported by pup: https://github.com/ericchiang/pup/issues/194

It is, however, supported by Python's soupsieve (a dependency of beautifulsoup4), as can be seen here:

https://github.com/facelessuser/soupsieve/blob/51ec317ada7e34f70fad6bfddaef8a2cfac1aebd/soupsieve/css_parser.py#L67
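A minimal sketch of the repro query run through beautifulsoup4, whose `select()` method hands CSS selectors to soupsieve (bs4 4.7.0 and later). This assumes `beautifulsoup4` is installed and uses the built-in `html.parser` backend. The helper name `select_divs_with_h1` is hypothetical.

```python
from bs4 import BeautifulSoup


def select_divs_with_h1(html: str) -> list[str]:
    # BeautifulSoup.select() delegates the selector to soupsieve, which supports :has().
    soup = BeautifulSoup(html, "html.parser")
    return [str(tag) for tag in soup.select("div:has(h1)")]


if __name__ == "__main__":
    html = "<div><h1>h1</h1></div><div><p>p</p></div>"
    for match in select_divs_with_h1(html):
        print(match)  # prints <div><h1>h1</h1></div>
```

Reading the HTML from stdin instead of a literal would make this a rough stand-in for the failing `htmlq 'div:has(h1)'` pipeline.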