Open mrelg opened 2 years ago
This bug originates on Google's side, but we should still work around it.
Tesseract is overkill for detecting "Image not available", and it can fail, as demonstrated in this bug (see picture).
Here the placeholder is classified as a regular page.
A better way of detecting "image not available" is through the "flags" field returned by &jscmd=click3.
https://books.google.com/books?id=xxxxxxxxxxxx&hl=en&pg=PAn&jscmd=click3 returns something like this: {"page":[{"pid":"PA...","src":"https://books.google.com/.....","flags":0,"order":...,"uf":"https://..."}, ... { ...
Image not available: "flags":8 (presumably any value != 0)
Image available: "flags":0
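A minimal sketch of that check, assuming the response has the shape shown above (a top-level "page" list whose entries carry "pid" and "flags"). The function name `unavailable_pages` and the sample payload are hypothetical, the meaning of non-zero flags is only inferred from the single observed value 8, and entries without a "flags" key are treated as available purely as a guess:

```python
import json


def unavailable_pages(click3_json: str) -> list:
    """Return the pids of pages whose "flags" value is non-zero.

    Assumption: flags == 0 means the image is available and any other
    value (8 was observed) means "image not available".
    """
    data = json.loads(click3_json)
    missing = []
    for page in data.get("page", []):
        # Entries with no "flags" key are treated as available (a guess).
        if page.get("flags", 0) != 0:
            missing.append(page.get("pid"))
    return missing


# Made-up payload mimicking the structure quoted in this issue.
sample = '{"page": [{"pid": "PA1", "flags": 0}, {"pid": "PA2", "flags": 8}]}'
print(unavailable_pages(sample))  # ['PA2']
```

This only inspects text that has already been downloaded, so it could replace the Tesseract check without changing how pages are fetched.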
That is an interesting observation. Can you make a pull request with the changes so that we can experiment?