Open kshpytsya opened 2 years ago
Kind of? This ambiguity is not intended, but it looks to me like it's going to end up doing the right thing anyway.
If the first if
is executed, then we're going to get the same result in the second if
for self._has_text
too, unless there are bugs in either implementation. The former uses pdfminer.six to do a much slower analysis that figures out where all the text is on the page; the latter checks for presence of vector, text and images. So the result of that boolean is going to be equal in both cases.
If you have something that demonstrates a likely issue here please let me know.
I may be missing something but it seems that the value of
self._has_text
set in this section of code:https://github.com/ocrmypdf/OCRmyPDF/blob/5c6030960945fe299291fa134cff35c86a644b9f/src/ocrmypdf/pdfinfo/info.py#L779-L788
is always overwritten here:
https://github.com/ocrmypdf/OCRmyPDF/blob/5c6030960945fe299291fa134cff35c86a644b9f/src/ocrmypdf/pdfinfo/info.py#L804-L822