gaofeiseu opened this issue 5 years ago (status: open).
@gaofeiseu, I tried the HTML you gave, and the converted Markdown seems to be correct. The Markdown is on the first line and the HTML on the last:
![](//img.alicdn.com/tfscom/TB1mR4xPpXXXXXvapXXXXXXXXXX.jpg)
<img src="//img.alicdn.com/tfscom/TB1mR4xPpXXXXXvapXXXXXXXXXX.jpg" >
Can you create a small test, with the options you use, that shows what does not work for you?
You can use the sample as a starting point and add the configuration you use in your code:
@gaofeiseu, sorry, I just realized that what you actually wanted was to add an "https:" prefix to image URLs that lack a scheme.
The easiest way to do this with the current implementation is to use the standard HTML parser to convert the HTML to Markdown, then parse that Markdown, replace the URLs in the AST, and pass the AST document node to the formatter, which outputs the modified Markdown.
The sample FormatterWithMods.java shows how to change the URLs in the AST so that the formatted Markdown contains the replaced URLs.
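To make the parse, modify, format idea concrete, here is a toy model in plain Java. The types `MdNode`, `TextNode`, `ImageNode` and `AstUrlRewriter` are hypothetical stand-ins, not flexmark classes, and the parse step is skipped by building the tree by hand. The sketch only shows the order of operations: URLs are changed on the tree, and the Markdown text is produced afterwards from the modified tree.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model only: hypothetical stand-ins for an AST, NOT flexmark's classes.
// A "document" is a flat list of nodes; only plain text and images are modeled.
interface MdNode {
    String toMarkdown();
}

final class TextNode implements MdNode {
    final String text;
    TextNode(String text) { this.text = text; }
    @Override public String toMarkdown() { return text; }
}

final class ImageNode implements MdNode {
    final String alt;
    String url; // mutable, so the URL can be replaced in place on the tree
    ImageNode(String alt, String url) { this.alt = alt; this.url = url; }
    @Override public String toMarkdown() { return "![" + alt + "](" + url + ")"; }
}

final class AstUrlRewriter {
    // Walk the tree and replace every image URL using the supplied rule.
    static void rewriteImageUrls(List<MdNode> document, UnaryOperator<String> rewrite) {
        for (MdNode node : document) {
            if (node instanceof ImageNode) {
                ImageNode image = (ImageNode) node;
                image.url = rewrite.apply(image.url);
            }
        }
    }

    // "Format" the modified tree back into Markdown text.
    static String render(List<MdNode> document) {
        StringBuilder out = new StringBuilder();
        for (MdNode node : document) {
            out.append(node.toMarkdown());
        }
        return out.toString();
    }
}
```

For example, a document holding the text `Logo: ` followed by an image with URL `//img.alicdn.com/tfscom/TB1mR4xPpXXXXXvapXXXXXXXXXX.jpg`, rewritten with `url -> url.startsWith("//") ? "https:" + url : url`, renders as `Logo: ![](https://img.alicdn.com/tfscom/TB1mR4xPpXXXXXvapXXXXXXXXXX.jpg)`.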
All you need to do is replace the logic at lines 68-71 of FormatterWithMods.java with:
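// Only protocol-relative URLs such as //img.alicdn.com/... get the scheme;
// checking for a single "/" would also turn root-relative paths into "https:/path".
if (node.getPageRef().startsWith("//")) {
    // Replace the URL characters with "https:" + the original page reference
    node.setUrlChars(PrefixedSubSequence.of("https:", node.getPageRef()));
    // Rebuild the node's full character sequence from its updated segments
    node.setChars(SegmentedSequence.of(Arrays.asList(node.getSegmentsForChars())));
}
This prefixes every protocol-relative URL (one starting with //) with https:.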
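The prefix condition can be checked on its own, outside the AST code. The helper below, `UrlPrefixer.addHttpsToProtocolRelative`, is hypothetical (not part of flexmark) and is a sketch assuming the goal is to add `https:` only to protocol-relative URLs. A test for a single leading `/` would also match root-relative paths such as `/img/a.jpg` and produce the broken `https:/img/a.jpg`, which is why this sketch tests for `//`.

```java
// Hypothetical helper, not part of flexmark: adds "https:" only to
// protocol-relative URLs, i.e. those starting with "//".
final class UrlPrefixer {
    static String addHttpsToProtocolRelative(String url) {
        if (url.startsWith("//")) {
            return "https:" + url;
        }
        // Absolute URLs ("http://...") and root-relative paths ("/img/a.jpg")
        // are returned unchanged.
        return url;
    }
}
```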
@vsch thanks a lot for your patience! Using the standard HTML parser to get Markdown from the HTML content is what I have already done. Do you mean I need to go on to parse the Markdown to HTML, replace the URLs with a method similar to the one in your demo code FormatterWithMods.java, and then convert the resulting HTML back to Markdown? I agree this would work, but as you can see, it requires too many conversions between HTML and Markdown. Is there a lighter-weight solution with fewer conversions, going directly from HTML to Markdown?
@gaofeiseu, you do not need to go back to HTML. Convert the HTML to Markdown, parse the Markdown into an AST, replace the URLs in the AST, and render the AST as Markdown with the formatter. This combines the two samples I mentioned into a single process.
The modified sample, FormatterWithMods2.java, shows the needed steps.
The current HTML-to-Markdown implementation is not extensible, so there is no easy way to modify the Markdown it generates. I am working on a new version that supports extensions, similar to the HTML Renderer and the Markdown Formatter, which will allow some customization of the generated Markdown without having to re-parse it, but this is not yet available.
@gaofeiseu, a new module with an extension API for HTML-to-Markdown conversion is now implemented.
See #313; its last comment links to a sample that modifies some link URLs during conversion.
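The idea behind the extension API, rewriting a URL while the Markdown is being generated instead of re-parsing the output afterwards, can be sketched with a toy converter. `ToyHtmlToMarkdown` and its `urlResolver` hook are hypothetical and do not reflect the actual API of the new module referenced in #313. The regex handles only `<img>` tags with a double-quoted `src` attribute, leaves all other text untouched, and is not a substitute for a real HTML parser.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy converter: a hypothetical stand-in for an extensible HTML-to-Markdown converter.
// Only <img src="..."> tags are converted; everything else is copied through unchanged.
final class ToyHtmlToMarkdown {
    // Matches <img ... src="..." ...> and captures the double-quoted src value.
    private static final Pattern IMG = Pattern.compile("<img\\b[^>]*?\\bsrc=\"([^\"]*)\"[^>]*>");

    private final UnaryOperator<String> urlResolver; // hypothetical extension hook

    ToyHtmlToMarkdown(UnaryOperator<String> urlResolver) {
        this.urlResolver = urlResolver;
    }

    String convert(String html) {
        Matcher m = IMG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // The hook rewrites the URL while the Markdown is generated,
            // so the output never has to be parsed a second time.
            String url = urlResolver.apply(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement("![](" + url + ")"));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the hook `url -> url.startsWith("//") ? "https:" + url : url`, the input `<img src="//abc.com/cde/efg.jpg" >` converts to `![](https://abc.com/cde/efg.jpg)`, the result requested in the issue.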
Is your feature request related to a problem? Please describe.
Hi, I'm from China. flexmark is a really good tool, but during development I ran into a problem. I need to convert HTML to Markdown, but some tags in the HTML have an unusual src, like this:
<img src="//img.alicdn.com/tfscom/TB1mR4xPpXXXXXvapXXXXXXXXXX.jpg" >
Such a src is not converted to Markdown that works correctly.
Describe the solution you'd like
How can I change how the src of an img tag is handled, through an extension or option, to get a result like this:
<img src="//abc.com/cde/efg.jpg" >
converted to ![](https://abc.com/cde/efg.jpg)
Describe alternatives you've considered
Are there extension options for this, or does flexmark already have options that I have overlooked?
Additional context