RitaMarques closed this issue 8 months ago
Hi @RitaMarques, and thanks for flagging this, which was indeed a bug.
Although there was already a test for Page.extract_words(extra_attrs=[...]), there wasn't yet one for Page.extract_text(extra_attrs=[...]), and the addition of a caching layer caused this error to be thrown, since list kwargs can't be hashed for the cache.
This is now solved in 0bfffc2 by pre-processing the kwargs to convert lists into tuples.
For now (i.e., before the next release), you can solve your problem by defining extra_attrs as a tuple instead of a list:
page.extract_text(
    layout=True,
    use_text_flow=True,
    extra_attrs=("size", "fontname"),
)
Let us know if that doesn't work for you.
Hi @jsvine, thanks for getting back to me! It's solved ;)
Describe the bug
While reading a simple PDF using the method extract_text, passing the list ["size", "fontname"] to extra_attrs raises the error:
Code to reproduce the problem
PDF file
Condioes_Gerais_Abertura_Conta.pdf
Screenshots
Environment