jzillmann / pdf-to-markdown

A PDF to Markdown converter
https://pdf2md.morethan.io
MIT License

Is it possible to detect highlighted sections (annotations) on a pdf and preserve that in md? #16

Open shrvenkataraman opened 4 years ago

jzillmann commented 4 years ago

So you want the whole PDF content with the highlights marked somehow, e.g. as code or italic? Or would you rather extract just the highlights?

Interesting feature anyway, but I haven't looked into it so far.

sslHello commented 2 years ago

Hi, I don't know which issues @shrvenkataraman means. I have just tested your converter, and I do have some issues that go in the same direction. As far as my first tests show, highlighting (e.g. bold via ... ) is broken at the end of a PDF line, especially in lists. This generates output of this kind (`<cr>` added to show the carriage returns of the PDF and md output):

**- element_1_text_1**<cr>
  **element_1_text_2<cr>
- element_2_text_1**<cr>
  **element_2_text2<cr>
- last_element_text_1**<cr>
  **last_element_text_2**

Could you change this to something like this, please:

- **element_1_text_1<cr>
  element_1_text_2**<cr>
- **element_2_text_1<cr>
  element_2_text2**<cr>
- **last_element_text_1<cr>
  last_element_text_2**
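
A minimal sketch of the kind of post-processing this would take, not the converter's actual code. It assumes that each list item is bold in its entirety, that items start with `- `, and that continuation lines are indented. The function name `normalize_bold_list` is hypothetical, and the `<cr>` markers from the examples above are left out:

```python
def normalize_bold_list(lines):
    """Re-wrap fully bold list items so each item has one ** pair.

    Assumption: every item is bold from its first to its last character,
    so all existing ** markers can be dropped and re-added per item.
    """
    items = []  # one entry per list item: its lines with ** removed
    for line in lines:
        plain = line.replace("**", "")
        if plain.startswith("- "):
            # A bullet (possibly preceded by a misplaced **) starts a new item.
            items.append([plain[2:]])
        elif items and plain.startswith(" "):
            # An indented line continues the current item.
            items[-1].append(plain.strip())
        else:
            raise ValueError(f"not part of a list item: {line!r}")

    out = []
    for item in items:
        item[0] = "**" + item[0]    # open bold right after the bullet
        item[-1] = item[-1] + "**"  # close bold at the end of the item
        out.append("- " + item[0])
        out.extend("  " + cont for cont in item[1:])
    return out


broken = [
    "**- element_1_text_1**",
    "  **element_1_text_2",
    "- element_2_text_1**",
    "  **element_2_text2",
    "- last_element_text_1**",
    "  **last_element_text_2**",
]
fixed = normalize_bold_list(broken)
print("\n".join(fixed))
```

Dropping every marker and re-adding one pair per item only works under the "whole item is bold" assumption. Items that mix bold and plain text would need real span tracking instead.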

I do hope this is one of the cases that he may have meant, too.

Thank you so much for your converter! Cheers Torsten

darkcheftar commented 2 years ago

I guess @shrvenkataraman is talking about something like this:

[image]