mgdm / htmlq

Like jq, but for HTML.
MIT License

Error when a selector ID includes a space #51

Open bereddy opened 1 year ago

bereddy commented 1 year ago

The stock quote in the HTML for this page,

https://www.google.com/finance/quote/TCAP:BKK

is inside the following div:

<div class="YMlKec fxKbKc">
฿38.50
</div>

If I try to extract the content of this div using htmlq and a command like:

curl -X 'GET' 'https://www.google.com/finance/quote/TCAP:BKK?hl=en'| htmlq div[class="YMlKec fxKbKc"] --text

I get the following error:

thread 'main' panicked at 'Failed to parse CSS selector: ()', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/htmlq-0.4.0/src/main.rs:248:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I tried several other ways of extracting the stock quote with htmlq using the same selector. From that, it's pretty clear to me that htmlq doesn't work right when the class value in the selector contains a space.

Or am I missing something?

ltgustavsen commented 3 months ago

You need to escape the space in the class: "YMlKec\ fxKbKc". Like this:

curl -s -X 'GET' 'https://www.google.com/finance/quote/TCAP:BKK?hl=en' | htmlq div[class="YMlKec\ fxKbKc"] --text
฿50.00
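Why the escape helps: in bash or another POSIX shell, the double quotes in an unquoted argument are removed before htmlq ever runs. htmlq is therefore handed `div[class=YMlKec fxKbKc]`. In CSS, an unquoted attribute value must be a single identifier, so the bare space ends it early and the leftover `fxKbKc]` makes the selector unparseable. With `\ `, the backslash survives the shell (inside double quotes it is kept before a space), and CSS reads `\ ` as an escaped space. The sketch below uses `printf` to show exactly what htmlq receives in each case. It assumes a POSIX shell and disables globbing so the unquoted `[...]` cannot be expanded as a filename pattern.

```shell
# Disable pathname expansion so the unquoted "[...]" is never treated as a glob.
set -f

# printf '<%s>\n' prints each argument it receives on its own line, wrapped in <...>.

# 1. Unquoted, as in the original report: the shell strips the double quotes.
printf '<%s>\n' div[class="YMlKec fxKbKc"]
# prints: <div[class=YMlKec fxKbKc]>

# 2. Escaped space, as in the reply: the backslash is kept inside double quotes,
#    so htmlq receives a CSS-escaped space.
printf '<%s>\n' div[class="YMlKec\ fxKbKc"]
# prints: <div[class=YMlKec\ fxKbKc]>

# 3. Whole selector in single quotes: passed through to htmlq unchanged.
printf '<%s>\n' 'div[class="YMlKec fxKbKc"]'
# prints: <div[class="YMlKec fxKbKc"]>
```

Variant 3 is usually the least surprising fix, for example `curl -s 'https://www.google.com/finance/quote/TCAP:BKK?hl=en' | htmlq 'div[class="YMlKec fxKbKc"]' --text`. Single quotes keep both the CSS quotes and the brackets away from the shell.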

XLTechie commented 3 months ago

Isn't that actually two selectors? Last time I read the standard, class="foo bar" meant both classes, foo and bar, not a single class with a space in its name.

So it would match on both "class=foo" and "class=bar".
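On the HTML side that is right: `class` holds a space-separated list, so `class="foo bar"` gives the element two classes. A CSS attribute selector such as `[class="foo bar"]` is still one selector, though. It compares the whole attribute string, so it only matches that exact text in that exact order. To match "has both classes", the class selector form `.foo.bar` is the usual choice. The sketch below contrasts the two on made-up markup (`foo`, `bar` and the sample HTML are hypothetical). It assumes htmlq is installed and that its matching follows standard CSS selector semantics.

```shell
# Hypothetical sample markup: the same two classes, written in two different orders.
html='<div class="foo bar">first</div><div class="bar foo">second</div>'

# Attribute selector: matches only where the class attribute is exactly "foo bar".
printf '%s' "$html" | htmlq 'div[class="foo bar"]' --text

# Compound class selector: matches any div carrying both classes, in any order.
printf '%s' "$html" | htmlq 'div.foo.bar' --text
```

The first command should match only the "first" div, while the second should match both. Applied to the original page, the class-based form would be `htmlq 'div.YMlKec.fxKbKc' --text`.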