sirthias / pegdown

A pure-Java Markdown processor based on a parboiled PEG parser supporting a number of extensions
http://pegdown.org
Apache License 2.0

How to Detect new lines #247

Open omidp opened 7 years ago

omidp commented 7 years ago

Hi,

I'm trying to create a custom pegdown serializer by implementing org.pegdown.ast.Visitor, and I have a problem with parsing new lines. For example, I can't distinguish between these two paragraphs:

This is test.

This is

test.

None of the org.pegdown.ast.Visitor interface methods can detect a new line. Is there any way to catch this?

I'm new to pegdown. Thanks in advance.

vsch commented 7 years ago

@omidp, the pegdown AST is not consistent in how it handles new lines. Sometimes they are part of the node text; sometimes they are excluded, and you need to look for them in the ranges of source text that are not covered by any node. There is no easy solution that I can think of.
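As a rough illustration of looking for new lines in the text that no node covers, here is a minimal, self-contained Java sketch. It does not call pegdown itself: the `Span` class is a hypothetical stand-in for a node's source range, and it assumes you have already collected each node's start and end offsets into the source (treated here as half-open, end exclusive) while walking the tree with your Visitor. It only reports newlines that fall outside every node range; newlines that pegdown keeps inside node text would still have to be found by searching that node's own text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NewlineGaps {

    /** Hypothetical stand-in for a node's source range: [start, end), end exclusive. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns the offsets of every '\n' in source that is not covered by any span,
     * i.e. newlines that sit in the gaps between nodes rather than inside node text.
     */
    static List<Integer> uncoveredNewlines(String source, List<Span> spans) {
        boolean[] covered = new boolean[source.length()];
        for (Span s : spans) {
            // Clamp each range to the source bounds before marking it as covered.
            int from = Math.max(0, s.start);
            int to = Math.min(source.length(), s.end);
            for (int i = from; i < to; i++) {
                covered[i] = true;
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < source.length(); i++) {
            if (source.charAt(i) == '\n' && !covered[i]) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "This is\ntest.";
        // Suppose the parser reported two text nodes that both exclude the newline.
        List<Span> spans = Arrays.asList(new Span(0, 7), new Span(8, 13));
        System.out.println(uncoveredNewlines(source, spans)); // prints [7]
    }
}
```

Marking a boolean array costs time proportional to the total span length; for large documents you could sort the spans and walk only the gaps between them, but the simple version keeps the idea easy to follow.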

If you are new to pegdown, I would recommend trying another Markdown parser. pegdown has a lot of idiosyncrasies, is not actively maintained, has lots of quirks and bugs in the AST, and has some very nasty pathological input parsing issues. Give it this input and watch it take a long time to parse: [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[.

My recommendation: if you don't have any code already invested in pegdown, use something else. commonmark-java (https://github.com/atlassian/commonmark-java) is an excellent choice if you don't need a source-element-based AST (it generates an output-based AST), don't need a lot of extensions, and can work with the CommonMark Markdown format. It is Java 1.7 and Android compatible.

If you need a detailed source-based AST, can handle Java 1.8 language-level libraries, and don't need Android compatibility, you can try the parser I wrote to replace pegdown in my Markdown Navigator plugin for IntelliJ IDEs (https://github.com/vsch/idea-multimarkdown). The parser project is https://github.com/vsch/flexmark-java. It is CommonMark 0.27 compliant but has parser configuration options to emulate the list indentation rules used by markdown.pl, MultiMarkdown (like pegdown's 4-space indents), and kramdown. The only pegdown extensions I have not yet implemented are typographic quotes, smarts, and definition lists. The rest of the extensions are available, along with some extra ones that pegdown does not have.