vsch / flexmark-java

CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.
BSD 2-Clause "Simplified" License

how to parse markdown's task list? #347

Closed NegassaB closed 5 years ago

NegassaB commented 5 years ago

I'm using your library as my markdown parser, but I seem to have hit an issue. It's not parsing the task list feature of markdown properly: it will not render "[ ]" or "[x]" as it should. I've even tried prefixing the item with "-" followed by a space, like "- [ ]", and it still doesn't work. What's the solution?

vsch commented 5 years ago

@NegassaB, task lists are handled by the flexmark-ext-gfm-tasklist extension. Take a look at TaskListAttributeProviderSample.java to see how extensions are declared in the options. The sample also adds custom attributes, which you do not need, so you can drop the SampleExtension from the extension list.

NegassaB commented 5 years ago

@vsch I tried following the sample you sent (with the exception of the SampleExtension) and it still prints the item inside a <p> tag and not as a task list. What am I missing?

vsch commented 5 years ago

@NegassaB, can you please post your source code here, along with the markdown you are using?

NegassaB commented 5 years ago

@vsch sure, it's stated below.

static String convertToMarkdown(String writtenMarkdownText) {
    MutableDataHolder options = new MutableDataSet();
    options.set(Parser.EXTENSIONS, Collections.singletonList(TaskListExtension.create()));
    options.set(TaskListExtension.ITEM_DONE_MARKER, "");
    options.set(TaskListExtension.ITEM_NOT_DONE_MARKER,"");

    Parser parser = Parser.builder(options).build();
    Node document = parser.parse(writtenMarkdownText);
    HtmlRenderer renderer = HtmlRenderer.builder().build();
    return renderer.render(document);
}

And this is the method that displays the output as HTML in a JEditorPane.

static JEditorPane displayMarkdown(String writtenMarkDownText) {
    JEditorPane markdownPane = new JEditorPane("text/html", DisplayMarkdown.convertToMarkdown(writtenMarkDownText));
    markdownPane.setEditable(false);
    return markdownPane;
}

vsch commented 5 years ago

@NegassaB, setting ITEM_DONE_MARKER and ITEM_NOT_DONE_MARKER to "" makes task list items render as plain list items. The sample does this because it displays task list items through CSS list item classes, which are set by a custom attribute provider.

If you leave these two options at their defaults (i.e. don't set them), you will get the default HTML with input checkboxes for the list items.

What HTML do you get for the markdown:

* [ ] Open item
* [x] Closed item

And what HTML do you expect?

vsch commented 5 years ago

@NegassaB, keep in mind that Swing JEditorPane HTML is not 100% browser HTML and has many limitations, so you may need to tweak the HTML generation to get decent results.

The best way to do this is to experiment with the HTML by hand until you get the results you want, then configure the library or custom render some nodes to produce that HTML.
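One way to apply the advice above, sketched under assumptions: post-process the generated HTML before handing it to the JEditorPane, swapping each checkbox `<input>` for a text glyph. The class name `CheckboxFallback` and the method `replaceCheckboxes` are hypothetical and not part of flexmark-java, and the sample HTML is only an assumed shape for the default task list output. Whether the glyphs display properly depends on the font the pane uses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of flexmark-java.
public class CheckboxFallback {
    // Matches any <input ...> tag whose attributes include type="checkbox".
    private static final Pattern CHECKBOX = Pattern.compile(
            "<input\\b[^>]*\\btype=\"checkbox\"[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches a standalone "checked" attribute inside one tag.
    private static final Pattern CHECKED =
            Pattern.compile("\\bchecked\\b", Pattern.CASE_INSENSITIVE);

    // Replaces each checkbox input with a ballot-box character reference.
    public static String replaceCheckboxes(String html) {
        Matcher m = CHECKBOX.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean done = CHECKED.matcher(m.group()).find();
            // &#9745; is a checked ballot box, &#9744; an empty one.
            m.appendReplacement(out, done ? "&#9745;" : "&#9744;");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed shape of the generated HTML, for illustration only.
        String html = "<li><input type=\"checkbox\" disabled=\"disabled\" />&nbsp;Open item</li>\n"
                + "<li><input type=\"checkbox\" checked=\"checked\" disabled=\"disabled\" />&nbsp;Closed item</li>";
        System.out.println(replaceCheckboxes(html));
    }
}
```

Rewriting the HTML string keeps the parser and renderer configuration untouched. A custom node renderer inside the library would be the more robust route once the target HTML is settled.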

NegassaB commented 5 years ago

@vsch so I removed the options.set() calls and it works, but only when I precede the item with *. If I don't precede it with * or -, the output is [ ] Open item [x] Closed item. Is this how it's supposed to be?

NegassaB commented 5 years ago

@vsch about the JEditorPane, I'm working on moving the entire project to JavaFX. It seems like a more reliable option than Swing. But how do I customize your library? I've been trying to understand how it works, and without solid documentation in place it's really difficult.

vsch commented 5 years ago

@NegassaB, GitHub task list items are list items whose text begins with [ ] or [x], so the *, - or + has to be there to mark the text as a list item.

From the flexmark-ext-gfm-tasklist extension's javadoc/overview.html file:

flexmark-java extension for GFM style task list items

task list items from list items whose text begins with [ ], [x] or [X]
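The rule quoted above can be illustrated with a small line classifier. This is a rough sketch and not how the extension is implemented: `TaskItemRule` and `isTaskItem` are hypothetical names, the check works on single lines only, and it covers only the `*`, `-` and `+` bullet markers mentioned in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the rule, not flexmark-java code.
public class TaskItemRule {
    // Up to three spaces of indent, a bullet marker (*, + or -), at least one
    // space or tab, then [ ], [x] or [X], then either end of line or more text.
    private static final Pattern TASK_ITEM =
            Pattern.compile("^ {0,3}[*+-][ \\t]+\\[[ xX]\\]([ \\t].*)?$");

    public static boolean isTaskItem(String line) {
        return TASK_ITEM.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = {"* [ ] Open item", "- [x] Closed item", "[ ] Open item"};
        for (String line : lines) {
            System.out.println(line + " : " + isTaskItem(line));
        }
    }
}
```

The third line has no list marker, so it is treated as ordinary paragraph text, which matches the `<p>` output reported earlier in the thread.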

If you want task lists without the list item prefix, you would need to implement a custom extension that processes those items and creates its own custom nodes to represent them, since the current task list items expect the list prefix.
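A lighter alternative to a custom extension, offered only as a sketch: rewrite the markdown text before parsing so that bare `[ ]` and `[x]` lines get a list marker. The names `BareTaskLines` and `addListMarkers` are hypothetical. The approach is deliberately naive, so it will also rewrite matching lines inside code blocks, and the result is an ordinary bullet task list rather than a distinct node type.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing workaround, not a flexmark-java extension.
public class BareTaskLines {
    // A line that starts directly with [ ], [x] or [X] followed by a space or
    // tab, with no list marker in front. (?m) lets ^ match at every line start.
    private static final Pattern BARE_TASK =
            Pattern.compile("(?m)^(\\[[ xX]\\][ \\t])");

    // Prepends "* " to every bare task line so a parser sees a list item.
    public static String addListMarkers(String markdown) {
        return BARE_TASK.matcher(markdown).replaceAll("* $1");
    }

    public static void main(String[] args) {
        String markdown = "[ ] Open item\n[x] Closed item\n* [ ] Already a list item";
        System.out.println(addListMarkers(markdown));
    }
}
```

The returned string can then be passed to the parser in place of the original text. Lines that already start with a list marker are left alone.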

As for documentation, the wiki has an overview of the parsing process at https://github.com/vsch/flexmark-java/wiki/Writing-Extensions and a description of the extensions at https://github.com/vsch/flexmark-java/wiki/Extensions

The flexmark-java-samples module also has a fair number of samples showing various types of customization.

The rest requires working with the source code.

This is an open source project, so maintenance is done on my own time after earning a living, which means I have to prioritize my time across all maintenance tasks, including documentation.

If you have any ideas on how to improve the documentation for new users, I welcome any PRs with documentation improvements.

NegassaB commented 5 years ago

@vsch I will take a look at the wiki pages and try to come up with some ideas. Thank you for all your help. You're doing great, keep at it.