Open j-t-1 opened 1 week ago
I'm not sure this is at will. you may found a way to improve performances...
This has been added/changed in #124 by @mozbugbox.
This has been added/changed in #124 by @mozbugbox.
The first of the commit messages is "Speedup b_() for short string for Python3", which suggests the current functionality is deliberate.
The code tries latin-1 and if there is an exception uses utf-8. Given the rise of utf-8, I think we should just use utf-8.
@j-t-1 I recommend you to further dig into the commit history if unsure about specific cases - the latin-1
workaround has been required for some files. Doing a "blame" shows you that the corresponding change is from #463 and there already has been a request for an example file about two months ago: https://github.com/py-pdf/pypdf/pull/463#issuecomment-2067587535
@pubpub-zz
Line 226 of test_generic.py:
# to test latin-1 aka stdencoding
Is this equivalent to this (and if so we could change it)?
# to test PDFDocEncoding (latin-1)
Aka =also known as Your rewording is correct
It is non-obvious why we are only caching strings of length zero (the empty string) and of length one.
There is a need to add comments to this function.