wojtekmaj / react-pdf

Display PDFs in your React app as easily as if they were images.
https://projects.wojtekmaj.pl/react-pdf
MIT License
9.24k stars, 877 forks

customTextRenderer is not called #1593

Closed: mixeden closed this issue 11 months ago

mixeden commented 1 year ago

Description

Well, customTextRenderer is not called!

My guess is that this is caused by this particular line: https://github.com/wojtekmaj/react-pdf/blob/v7.3.3/src/Page/TextLayer.tsx#L222

It seems that this text layer has some marked content, but it sits above the text I'm talking about (see "Steps to reproduce"), so it should not affect how that text is rendered in any way.

Steps to reproduce

  1. Load this PDF: https://arxiv.org/pdf/1905.09263.pdf
  2. Navigate to page 4.
  3. Check whether your customTextRenderer is called for the text that starts with "sequence in speech tasks. We evaluate", for example.
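
One way to run that check is a renderer that records every string it receives. This is only a sketch: the argument shape (`str`, `itemIndex`) is assumed from react-pdf's documented customTextRenderer props, and `loggingTextRenderer`, `renderedStrings` and `wasRendered` are hypothetical names made up for this example.

```typescript
// Assumed shape of the customTextRenderer argument; only the fields used
// here are declared. The real object carries more properties.
interface TextRendererArgs {
  str: string;
  itemIndex: number;
}

// Every string the renderer has been asked to render so far.
const renderedStrings: string[] = [];

// Hypothetical renderer: logs the call and returns the text unchanged.
function loggingTextRenderer({ str, itemIndex }: TextRendererArgs): string {
  renderedStrings.push(str);
  console.log(`customTextRenderer called for item ${itemIndex}: ${str}`);
  return str;
}

// True if any recorded text item begins with the given prefix.
function wasRendered(prefix: string): boolean {
  return renderedStrings.some((s) => s.indexOf(prefix) === 0);
}
```

Pass `loggingTextRenderer` as the `customTextRenderer` prop of `<Page pageNumber={4} />`, and once the text layer has rendered, call `wasRendered("sequence in speech tasks")`. Note that the PDF may split this sentence into text items differently than expected, so a shorter prefix can be more reliable.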

Expected behavior

customTextRenderer is called

Actual behavior

customTextRenderer is not called

picasocro1 commented 1 year ago

I have exactly the same problem. For this particular PDF, the customTextRenderer I defined is not called.

The method was executed correctly on the other PDFs I have worked with so far (using exactly the same code).

mixeden commented 1 year ago

What's even funnier is that it works on the majority of pages of the PDF I attached in the reproduction section, except for page 4. I guess this is because that particular page has elements with the class name "markedContent" somewhere near the top. Please check whether your PDF from Volvo also has elements with the same class name in the problematic places.

frontendphil commented 1 year ago

Yup, I can confirm that marked content seems to have something to do with it. AFAICT, the moment this lib sees .markedContent, it assumes that all text of that layer is inside the .markedContent element. However, in our case, these are just empty blocks. https://github.com/wojtekmaj/react-pdf/blob/main/packages/react-pdf/src/Page/TextLayer.tsx#L207
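
To make that failure mode concrete, here is a small model of it. This is not the library's code: `LayerNode`, `collectNarrow` and `collectAll` are hypothetical, and plain objects stand in for DOM elements so the sketch runs without a browser. `collectNarrow` mimics the behaviour described above, and `collectAll` is one possible alternative that walks the whole layer.

```typescript
// Plain-object stand-in for a DOM element in the text layer (hypothetical).
interface LayerNode {
  className: string;
  text: string;
  children: LayerNode[];
}

function node(className: string, text = "", children: LayerNode[] = []): LayerNode {
  return { className, text, children };
}

// Models the behaviour described above: as soon as any .markedContent block
// exists, only the children of those blocks are treated as text items.
function collectNarrow(layer: LayerNode): LayerNode[] {
  const marked = layer.children.filter((c) => c.className === "markedContent");
  if (marked.length === 0) {
    return layer.children.slice();
  }
  const result: LayerNode[] = [];
  for (const block of marked) {
    for (const child of block.children) {
      if (child.className !== "markedContent") result.push(child);
    }
  }
  return result;
}

// Alternative: walk the whole layer and collect every element that is not
// itself a .markedContent wrapper, wherever it sits.
function collectAll(layer: LayerNode): LayerNode[] {
  const result: LayerNode[] = [];
  for (const child of layer.children) {
    if (child.className === "markedContent") {
      for (const nested of collectAll(child)) result.push(nested);
    } else {
      result.push(child);
    }
  }
  return result;
}

// A layer like the one in this issue: an empty .markedContent block followed
// by ordinary text spans that are siblings of it, not children.
const layer = node("textLayer", "", [
  node("markedContent"),
  node("span", "sequence in speech tasks. We evaluate"),
  node("span", "another line"),
]);

console.log(collectNarrow(layer).length); // 0: the spans are never visited
console.log(collectAll(layer).length); // 2: both spans are found
```

With the narrow lookup, no span is ever handed to customTextRenderer on such a page, which matches what is reported here.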

hdwatts commented 11 months ago

Yes, I'm having the same issue with markedContent!