rushter / selectolax

Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).
MIT License
1.11k stars, 68 forks

Tags out of order in returned list when using css to specify multiple tags #104

Closed: pushshift closed this issue 8 months ago

pushshift commented 11 months ago

When using CSS selection, I want to grab two different tags (p and h3) with a single selector, like this:

html.css("p,h3")

It selects the right tags, but the returned list contains all the p tags first and the h3 tag last.

Example:

<p>   1 </p>
<h3>  2 </h3>
<p>   3 </p>

I would expect the returned list to be: [<node p>, <node h3>, <node p>]

Instead it returns: [<node p>, <node p>, <node h3>]
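To make the two orderings concrete, here is a small pure-Python model. It does not call selectolax and is not how Modest or Lexbor evaluate selectors; the `(position, tag)` tuples and the helper names `grouped_order` and `document_order` are hypothetical, chosen only to contrast the reported output with the expected one.

```python
# Elements of the example HTML from the issue, as (document position, tag) pairs.
elements = [(0, "p"), (1, "h3"), (2, "p")]
selectors = ["p", "h3"]  # the selector list "p,h3", split on the comma


def grouped_order(elements, selectors):
    """Reported behaviour: all matches of the first selector, then the next one."""
    result = []
    for sel in selectors:
        result.extend(el for el in elements if el[1] == sel)
    return result


def document_order(elements, selectors):
    """Expected behaviour: a single pass over the document, keeping every
    element that matches at least one selector in the list."""
    wanted = set(selectors)
    return [el for el in elements if el[1] in wanted]


print([tag for _, tag in grouped_order(elements, selectors)])   # ['p', 'p', 'h3']
print([tag for _, tag in document_order(elements, selectors)])  # ['p', 'h3', 'p']
```

The only difference is the loop nesting: iterating over selectors on the outside groups results by selector, while iterating over elements on the outside preserves document order.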

However, html.css("*") does return them in the correct order, but then I have to loop through the result and throw out all the unneeded nodes.

If this is indeed a bug, I'd give it low priority, since css("*") is a workable alternative: I can simply loop through and keep only what I'm interested in. I just wasn't sure whether this was a bug or expected behavior.

If this is expected behavior when selecting multiple elements with CSS, is there a way to get them in the order they appear in the parent (as with "*" as the CSS selector)?
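The css("*") workaround described above amounts to "collect every element in document order, then filter by tag name". The sketch below reproduces that idea with Python's standard-library `html.parser.HTMLParser` (not selectolax's class of the same name), so it runs without selectolax installed; the `TagCollector` class is a hypothetical helper. With selectolax itself, the equivalent would be filtering the nodes returned by `css("*")` on their `tag` attribute.

```python
from html.parser import HTMLParser  # standard library, not selectolax.parser


class TagCollector(HTMLParser):
    """Record every start tag in document order, similar to selecting with "*"."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names lower-cased, in the order they are seen.
        self.tags.append(tag)


snippet = "<p>   1 </p>\n<h3>  2 </h3>\n<p>   3 </p>"

collector = TagCollector()
collector.feed(snippet)
collector.close()

# Workaround from the issue: take everything, then keep only the wanted tags.
wanted = {"p", "h3"}
matched = [tag for tag in collector.tags if tag in wanted]
print(matched)  # ['p', 'h3', 'p']
```

Filtering a full document-order walk is always correct but visits every element; that is the cost the issue author accepts in exchange for the right order.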

Also, please provide a Patreon or Bitcoin wallet if possible so I can contribute for your time. Thank you for creating such an amazing tool. I use it often since it is lightweight, efficient and easy to use.

rushter commented 11 months ago

Yeah, that's indeed unexpected behavior. I'll take a closer look this week.

@lexborisov, is there a way to fix this? It looks like both Modest and Lexbor are affected.


It would be unfair for me to take all the credit for this library, since most of the hard work is done by @lexborisov. @lexborisov, do you accept donations?

lexborisov commented 11 months ago

Hi @rushter @pushshift


Yeah, it's my fault. I'll try to fix it by Monday.


I seriously hadn't considered accepting donations. It doesn't seem worth it, since not that many people would donate. @rushter, you can safely accept donations. It's your binding and people like it. I don't see anything wrong with that.

lexborisov commented 10 months ago

@pushshift @rushter

Sorry, I haven't forgotten about this issue. There's a lot to do at my day job. I hope to solve it soon.

lexborisov commented 8 months ago

@rushter @pushshift

Sorry, it took a while and required a complete rewrite of the algorithm. Fixed in https://github.com/lexbor/lexbor/commit/7ed557d53e2b4391a49fb3ea8966177adc652cf0
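The thread does not say how the rewritten Lexbor algorithm works. As an illustration only, one general way to return a selector list such as `p,h3` in document order is to treat each selector's matches as a list already sorted by document position and merge them, dropping elements matched by more than one selector. The helper `merge_in_document_order` and the integer positions below are assumptions for this sketch, not Lexbor's API.

```python
import heapq


def merge_in_document_order(*match_lists):
    """Merge per-selector match lists into one document-ordered list.

    Each input holds document positions (pre-order indices) and is assumed
    to be sorted already. An element matched by several selectors appears
    only once in the output.
    """
    merged = []
    for pos in heapq.merge(*match_lists):
        # After a sorted merge, duplicates are adjacent, so one check suffices.
        if not merged or merged[-1] != pos:
            merged.append(pos)
    return merged


# Positions from the issue's example: p at 0 and 2, h3 at 1.
p_matches = [0, 2]
h3_matches = [1]
print(merge_in_document_order(p_matches, h3_matches))  # [0, 1, 2]
```

A single traversal that tests every element against all selectors, as in the css("*") workaround, also yields document order; a merge like this one is only useful when each selector has already been matched separately.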

rushter commented 8 months ago

I've deployed a new release with an updated lexbor backend.