Hi, I think I spotted an insidious bug in get_page_output.
On line 625, the code reads:
md_string += output_images(None, tab_rects, None)
whereas I reckon it should be:
md_string += output_images(page, None, vg_clusters)
I got incorrect results when an image on a page (typical of PDFs converted from PPTX) has no text below it: the extracted text didn't include the image tag, and the image wasn't saved to file. Changing the line as shown above fixed the issue.
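A minimal sketch of how the symptom can be checked on the extracted text, assuming the image tag is standard Markdown image syntax (`![alt](path)`). `image_targets` is a hypothetical helper written for illustration, not part of the library:

```python
import re

# Assumption: image tags in the output use Markdown syntax, ![alt](target).
IMAGE_TAG = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def image_targets(md_string: str) -> list[str]:
    """Return the target path of every Markdown image tag found in md_string."""
    return IMAGE_TAG.findall(md_string)


# Illustrative strings only, not real output from the library.
with_image = "# Slide title\n\n![](img-0.png)\n"
without_image = "# Slide title\n"

print(image_targets(with_image))
print(image_targets(without_image))
```

For a page that contains an image, an empty list here would match the behaviour described above; each returned path can then be checked on disk to confirm the image file was actually written.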
Thank you for your code BTW!