rushter / selectolax

Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).
MIT License

Performance optimization css_first #113

Open SkySandy opened 5 months ago

SkySandy commented 5 months ago

I collected profiling statistics while downloading websites with selectolax. The most heavily used functions (by total time and number of calls) were:

1) css_first (3.2 million calls)
2) text (2.8 million calls)

The library's own tests (in this repository) also use css_first all the time.

I looked at the source code of css_first. The function first gets all the nodes on the page and then takes the first one, which takes a very long time. Could you optimize this?
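To illustrate the cost described above, here is a minimal, library-independent sketch. It assumes "all the nodes" means every node matching the selector, and it models the document as a plain list of tag names. The names iter_matches, first_eager, first_lazy and CountingPredicate are hypothetical and are not part of selectolax. It only shows why stopping at the first match does less work than building the full result list.

```python
def iter_matches(nodes, predicate):
    """Yield matching nodes one at a time, in document order."""
    for node in nodes:
        if predicate(node):
            yield node


def first_eager(nodes, predicate):
    """Collect every match, then return the first (the pattern reported above)."""
    matches = list(iter_matches(nodes, predicate))
    return matches[0] if matches else None


def first_lazy(nodes, predicate):
    """Stop as soon as the first match is found."""
    return next(iter_matches(nodes, predicate), None)


class CountingPredicate:
    """Hypothetical stand-in for a selector test; counts how often it runs."""

    def __init__(self, tag):
        self.tag = tag
        self.calls = 0

    def __call__(self, node):
        self.calls += 1
        return node == self.tag


nodes = ["div", "a", "p", "a", "span"]
eager = CountingPredicate("a")
lazy = CountingPredicate("a")

assert first_eager(nodes, eager) == first_lazy(nodes, lazy) == "a"
print(eager.calls, lazy.calls)  # prints "5 2"
```

Both variants return the same node, but the eager one tests every node while the lazy one stops after the second. On large pages with millions of calls, that difference is the kind of saving being requested here.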

Another question:

This function is called many times within my program:

    # `item` is the node currently being parsed; it comes from the enclosing scope.
    def first_text(query, deep, default=None):
        link = item.css_first(query)
        if link is not None:
            return link.text(deep=deep, strip=True)
        else:
            return default

Could you add this function to the Node class (with self instead of query)? Also, in all my parsing programs the "strip" parameter is set to True, and I don't understand why its default value is False.
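Until something like this exists on Node, one possible workaround is a standalone helper that takes the node explicitly and defaults strip to True, so call sites do not have to repeat it. This is only a sketch: it relies solely on the css_first(query) and text(deep=..., strip=...) calls already used above, and the `node` and `strip` parameters and the default deep=True are assumptions added for illustration.

```python
def first_text(node, query, deep=True, default=None, strip=True):
    """Return the text of the first match for `query` under `node`.

    Falls back to `default` when nothing matches. `node` is any object
    exposing css_first(query), whose result exposes text(deep=..., strip=...).
    """
    match = node.css_first(query)
    if match is None:
        return default
    return match.text(deep=deep, strip=strip)
```

With selectolax installed, a call might look like first_text(item, "a.title", default=""), where `item` is a parsed node and the selector is a made-up example.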