opengovsg / pdf2md

A PDF to Markdown converter
https://www.npmjs.com/package/@opendocsg/pdf2md
MIT License
195 stars 39 forks

Identify Styles #55

Open flywire opened 3 years ago

flywire commented 3 years ago

Different components of a document can often be identified by text style. For example, the code in the following listing could be identified by attributes such as indentation, font size and style, and spacing. Font and size clearly identify the blocks in the debug output, but the blocks are not marked up in the output file.

Interestingly, I loaded another chapter and the result was very different: code blocks are clearly identified, but all normal lines are marked as headers.


Normal text before code block.

const { expect } = require('chai')

Text following code block.
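
A minimal sketch of the kind of heuristic described above, assuming each extracted text item exposes a font name, a font size and a left x position. The item shape (`text`, `font`, `size`, `x`), the sample values and the function names are hypothetical and only illustrate the idea; they are not pdf2md's actual API or internals.

```javascript
// Hypothetical text items for the listing above. The field names and values
// are assumptions for illustration, not what pdf2md really produces.
const items = [
  { text: 'Normal text before code block.', font: 'Times-Roman', size: 11, x: 72 },
  { text: "const { expect } = require('chai')", font: 'Courier', size: 9, x: 90 },
  { text: 'Text following code block.', font: 'Times-Roman', size: 11, x: 72 },
]

// Font names that usually indicate monospaced (code) text.
const MONOSPACE = /courier|mono|consolas|menlo/i

// Built this way so the example itself contains no literal code fence.
const FENCE = '`'.repeat(3)

// Return the most frequent value in a list (used to guess the body style).
function mostCommon(values) {
  const counts = new Map()
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1)
  const sorted = [...counts.entries()].sort((a, b) => b[1] - a[1])
  return sorted[0][0]
}

// An item is treated as code if its font looks monospaced, or if it is both
// indented and smaller than the dominant body text.
function isCode(item, body) {
  const monospace = MONOSPACE.test(item.font)
  const indented = item.x > body.x
  const smaller = item.size < body.size
  return monospace || (indented && smaller)
}

// Wrap consecutive code items in a fenced block; emit other lines unchanged.
function toMarkdown(lines) {
  const body = {
    x: mostCommon(lines.map(i => i.x)),
    size: mostCommon(lines.map(i => i.size)),
  }
  const out = []
  let inCode = false
  for (const item of lines) {
    const code = isCode(item, body)
    if (code !== inCode) {
      out.push(FENCE)
      inCode = code
    }
    out.push(item.text)
  }
  if (inCode) out.push(FENCE)
  return out.join('\n')
}

console.log(toMarkdown(items))
```

With the sample items, only the `Courier` line is wrapped in a fence. Comparing each item against the most common size and indent, rather than fixed thresholds, is one way to avoid the opposite failure mentioned above, where every normal line is treated as a header.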