extract_words() slower when fewer extra_attrs are passed

Discussed in https://github.com/jsvine/pdfplumber/discussions/483

Originally posted by **hadikoub**, July 28, 2021

I'm trying to find bold and blank sections in a PDF file, so I was experimenting with the `extract_words()` function to group sections by font attributes. I found a way to extract bold text: group sections by font name and size, then pick out the bold font names.

```
sections = page.extract_words(keep_blank_chars=True, extra_attrs=["fontname", "size"])
```

Using a similar approach, I grouped sections by size alone to find the blanks between them:

```
sections = page.extract_words(keep_blank_chars=True, extra_attrs=["size"])
```

But there is a big performance gap between the two calls:

- **Using `extra_attrs=["fontname", "size"]`:** `sections = page.extract_words(keep_blank_chars=True, extra_attrs=["fontname", "size"])`, average execution time: **0.5 s**
- **Using `extra_attrs=["size"]`:** `sections = page.extract_words(keep_blank_chars=True, extra_attrs=["size"])`, average execution time: **5.2 s**

Both calls run on the same page. I also noticed that adding more attributes that make the output larger, such as `"adv"`, makes the call faster still: about 22.7 ms per page.

Why does the `extract_words()` call with more attributes outperform the one with fewer? And is there any way to speed up the second call?
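pdfplumber's documentation describes `extra_attrs` as restricting each word to characters that share the same value for every listed attribute. The sketch below is a simplified model of that rule, not pdfplumber's actual word extractor: it ignores the x/y tolerances and whitespace handling, and the helpers `group_chars` and `bold_runs` are hypothetical. It illustrates why `["size"]` tends to produce fewer, longer groups than `["fontname", "size"]`, and how bold runs could be picked out by font name, assuming bold fonts carry "Bold" in their name (a heuristic, not a guarantee). It does not reproduce or explain the timing gap.

```python
from itertools import groupby


def group_chars(chars, extra_attrs):
    """Group consecutive char dicts into runs sharing every value in extra_attrs.

    Simplified model only: pdfplumber's real word extraction also applies
    x/y tolerances and whitespace rules, which are ignored here.
    """

    def key(c):
        return tuple(c[attr] for attr in extra_attrs)

    runs = []
    for values, group in groupby(chars, key=key):
        text = "".join(c["text"] for c in group)
        # Each run records its shared attribute values, as a word dict would
        runs.append({"text": text, **dict(zip(extra_attrs, values))})
    return runs


def bold_runs(runs):
    """Hypothetical heuristic: a run is bold if its font name contains 'bold'."""
    return [r for r in runs if "bold" in r.get("fontname", "").lower()]


# Made-up chars: bold "Title" followed by regular " body", same font size
chars = [{"text": ch, "fontname": "Helvetica-Bold", "size": 14.0} for ch in "Title"]
chars += [{"text": ch, "fontname": "Helvetica", "size": 14.0} for ch in " body"]

print([r["text"] for r in group_chars(chars, ["fontname", "size"])])  # ['Title', ' body']
print([r["text"] for r in group_chars(chars, ["size"])])  # ['Title body']
print([r["text"] for r in bold_runs(group_chars(chars, ["fontname", "size"]))])  # ['Title']
```

With `["size"]` alone, the font change no longer splits the run, so the bold and regular text merge into one group and the font name is no longer available for filtering.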
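To compare the calls reproducibly, a small timing helper can repeat each call and report both the mean and the minimum. This is a sketch under stated assumptions: `time_call` and `compare_extra_attrs` are hypothetical helpers, and `page` is assumed to be a pdfplumber `Page` (any object with a compatible `extract_words(keep_blank_chars=..., extra_attrs=...)` method works). Reporting the minimum alongside the mean helps separate steady-state cost from one-off work, such as any parsing done on the first access to a page.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Call fn() `repeats` times; return (mean, min) wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), min(timings)


def compare_extra_attrs(page, attr_sets, repeats=5):
    """Time page.extract_words(keep_blank_chars=True, extra_attrs=...) per attr set.

    `page` is assumed to be a pdfplumber Page; any object with a compatible
    extract_words() method works. Returns {tuple(attrs): (mean_s, min_s)}.
    """
    results = {}
    for attrs in attr_sets:
        # The lambda is called only inside time_call during this iteration,
        # so capturing the loop variable `attrs` is safe here.
        results[tuple(attrs)] = time_call(
            lambda: page.extract_words(keep_blank_chars=True, extra_attrs=attrs),
            repeats,
        )
    return results
```

With a real file, this could be run as `with pdfplumber.open(path) as pdf: print(compare_extra_attrs(pdf.pages[0], [["fontname", "size"], ["size"]]))`, where `path` is a placeholder for the PDF being tested.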

extract_words() slower when fewer extra_attrs are passed #484