Originally posted by **hadikoub** July 28, 2021
The idea is that I'm trying to find bold and blank sections in a PDF file, so I was experimenting with the `extract_words()` function to group sections by font family.
I found a way to extract bold text by grouping sections by font name and size and then filtering for the bold font family:
```
sections = page.extract_words(keep_blank_chars=True, extra_attrs=["fontname", "size"])
```
and, as a similar approach, I did the same grouping to find the blanks in between sections:
```
sections = page.extract_words(keep_blank_chars=True, extra_attrs=["size"])
```
But the issue I faced is a big performance gap between the two calls, even though both run against the same page:
- **using `extra_attrs=["fontname", "size"]`**
`sections = page.extract_words(keep_blank_chars=True, extra_attrs=["fontname", "size"])`
**average line execution time: 0.5 s**
- **using `extra_attrs=["size"]`**
`sections = page.extract_words(keep_blank_chars=True, extra_attrs=["size"])`
**average line execution time: 5.2 s**
I also noticed that adding even more attributes that make the response larger, such as `"adv"`, reduces the execution time further, to about 22.7 ms per page.
Why does an `extract_words()` call with more filters outperform the call with fewer filters? And is there any way to improve the speed of the second statement?
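For illustration, here is a standalone sketch (not pdfplumber's actual internals) of how the choice of grouping attributes changes the runs of characters that get merged. The mock character dicts below are made up; the point is only that fewer attributes in the key produce fewer, longer runs, which may be related to the timing difference above:

```python
from itertools import groupby

# Mock per-character records, loosely shaped like the dicts
# pdfplumber yields for each char (values are invented).
chars = [
    {"text": "H", "fontname": "Bold", "size": 12},
    {"text": "i", "fontname": "Bold", "size": 12},
    {"text": " ", "fontname": "Regular", "size": 12},
    {"text": "y", "fontname": "Regular", "size": 12},
    {"text": "o", "fontname": "Regular", "size": 12},
]

def runs(chars, attrs):
    """Split the char stream into runs that agree on all `attrs`."""
    key = lambda c: tuple(c[a] for a in attrs)
    return ["".join(c["text"] for c in grp) for _, grp in groupby(chars, key)]

print(runs(chars, ["fontname", "size"]))  # ['Hi', ' yo']
print(runs(chars, ["size"]))              # ['Hi yo']
```

With `["fontname", "size"]` the stream splits at the font change; with `["size"]` alone, everything collapses into one long run.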
Discussed in https://github.com/jsvine/pdfplumber/discussions/483