opengovsg / pdf2md

A PDF to Markdown converter
https://www.npmjs.com/package/@opendocsg/pdf2md
MIT License
210 stars, 40 forks

Every line is transformed as header element in markdown #74

Open saravmajestic opened 1 year ago

saravmajestic commented 1 year ago

Describe the bug Thanks for this library; it is very helpful. I am seeing a strange issue: when parsing this PDF, every line is converted to a header element instead of a sentence or paragraph. The same issue occurs with the original repo, tested here: https://pdf2md.morethan.io/

To Reproduce Steps to reproduce the behavior:

  1. Run @opendocsg/pdf2md from the command line with the above file as input
  2. Check the output
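
To make step 2 less subjective, one option is to measure what share of the non-empty output lines are Markdown headings. The snippet below is only a rough sketch: `headingRatio` is a hypothetical helper, not part of @opendocsg/pdf2md, and the sample string is made up for illustration rather than taken from the PDF in question.

```javascript
// Hypothetical helper, not part of @opendocsg/pdf2md.
// Returns the fraction of non-empty lines that are ATX headings
// (one to six '#' characters followed by whitespace or end of line).
function headingRatio(markdown) {
  const lines = markdown
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '');
  if (lines.length === 0) return 0;
  const headings = lines.filter((line) => /^ {0,3}#{1,6}(\s|$)/.test(line));
  return headings.length / lines.length;
}

// Made-up sample resembling the reported symptom: every line is a heading.
const sample = [
  '# Title',
  '',
  '## First sentence of a paragraph',
  '### Second sentence of the same paragraph',
].join('\n');

console.log(headingRatio(sample)); // 1
```

A ratio close to 1 on a document that is mostly body text would match the behavior described in this issue.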

Expected behavior Some of the text in the PDF, for example "Selecting the “right” amount of information to include in a summary is a difficult task. A good...", should not be treated as a header.

Additional context Since this issue also exists in the original repo, it would be great if you could point me to how to resolve it. Much appreciated!

graylewis commented 4 months ago

I'm having the same issue on the latest version.