Open mikey21211 opened 6 years ago
Hello @mikey21211
Do you have a use case that would help me understand what exactly you want to achieve or avoid?
I probably did not understand your goal, but you can always select the content of interest in the resulting XML/TEI and ignore some elements during the XML parsing. When processing a PDF, it makes sense to consider the possibility of figures, so that those elements/blocks can be distinguished from the rest of the content body.
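To illustrate the "ignore some elements during the XML parsing" suggestion, here is a minimal sketch in Python using only the standard library. It assumes the TEI output uses the standard TEI namespace and wraps figures in `<figure>` elements inside `<body>`. The function names `text_without_figures` and `body_text` and the sample document are made up for this example and are not part of GROBID. Note that if tables are also encoded as `<figure>` elements in your output, this sketch would skip them too.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the standard TEI namespace.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def text_without_figures(elem):
    """Collect the text under elem, skipping every <figure> subtree."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != TEI_NS + "figure":
            parts.append(text_without_figures(child))
        # The tail is text that follows the child, so keep it even
        # when the child itself is a skipped figure.
        parts.append(child.tail or "")
    return "".join(parts)


def body_text(tei_xml):
    """Return the whitespace-normalized body text of a TEI string, minus figures."""
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{TEI_NS}body")
    if body is None:
        return ""
    return " ".join(text_without_figures(body).split())


# Hypothetical, hand-written TEI fragment used only for illustration.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div>
<head>Results</head>
<p>Accuracy improved.</p>
</div>
<figure><head>Figure 1</head><figDesc>A plot.</figDesc></figure>
<div>
<p>We conclude.</p>
</div>
</body></text></TEI>"""

if __name__ == "__main__":
    print(body_text(sample))
```

This leaves the PDF and the TEI file untouched and simply filters at read time, so the same approach should extend to other elements you want to skip by adding their tag names to the check.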
What you're saying makes sense, and I think the figures might be throwing off my results. Is there a way to ignore the figures? Or rather, to strip the images out of my PDF before parsing?
(Sent by email in reply to Patrice Lopez's comment of Feb 20, 2018, quoted above.)
Is it possible to use the TEI and parse the full text but ignore the figures/images within the PDF? Thanks