mysticmind / reversemarkdown-net

ReverseMarkdown.Net is an HTML-to-Markdown converter library in C#. Conversion is very reliable because the HtmlAgilityPack (HAP) library is used to traverse the HTML DOM.
MIT License

Many HTML tags preserve unnecessary spaces #383

Closed: doggy8088 closed this issue 2 months ago

doggy8088 commented 5 months ago

Many HTML tags, such as p, span, and div, preserve unnecessary whitespace from the source HTML, which leads to an incorrect Markdown document.

Here is my code snippet:

void Main()
{
    var html = """
<h1>Announcing SQL Server Data Tools (SSDT) for ARM64 Architecture in Visual Studio 17.10 Preview 2</h1>
<p>
            March 20th, 2024</p>
""";

    var config = new ReverseMarkdown.Config
    {
        // Drop unknown tags (and their content) from the result; PassThrough is the default
        UnknownTags = ReverseMarkdown.Config.UnknownTagsOption.Drop,
        // generate GitHub flavoured markdown, supported for BR, PRE and table tags
        GithubFlavored = true,
        // will ignore all comments
        RemoveComments = true,
        // remove markdown output for links where appropriate
        SmartHrefHandling = true,
        ListBulletChar = '-',
        SuppressDivNewlines = true,
    };

    (new ReverseMarkdown.Converter(config)).Convert(html).Dump();
}

The output:

# Announcing SQL Server Data Tools (SSDT) for ARM64 Architecture in Visual Studio 17.10 Preview 2

             March 20th, 2024

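I think the spaces before "March 20th, 2024" should be removed: a line indented by four or more spaces after a blank line is rendered as an indented code block in Markdown, not as a paragraph.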

The LINQPad query: https://share.linqpad.net/626vojje.linq
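For versions that still preserve this whitespace, one possible workaround is to post-process the converter's output. The following is a minimal sketch, not part of ReverseMarkdown: the class `MarkdownIndentFix` and method `TrimParagraphIndent` are hypothetical names, and it assumes `GithubFlavored = true` so that `<pre>` blocks come out as ``` fences rather than as indented code. It strips leading whitespace from the first line of each paragraph while leaving fenced code and list items untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MarkdownIndentFix
{
    // Matches an indented list item marker such as "- ", "* ", "+ " or "1. ".
    private static readonly Regex ListMarker = new Regex(@"^\s*([-*+]|\d+[.)])\s");

    // Hypothetical helper (not a ReverseMarkdown API): removes the stray indentation
    // at the start of each paragraph so it is not parsed as an indented code block.
    // Assumes code blocks are fenced with ``` (GithubFlavored = true); ~~~ fences are not handled.
    public static string TrimParagraphIndent(string markdown)
    {
        // Normalize line endings; the result is joined with "\n".
        var lines = markdown.Replace("\r\n", "\n").Split('\n');
        var result = new List<string>(lines.Length);
        bool inFence = false;
        bool previousBlank = true; // the start of the document behaves like a blank line

        foreach (var line in lines)
        {
            string trimmed = line.TrimStart();

            if (trimmed.StartsWith("```", StringComparison.Ordinal))
            {
                inFence = !inFence; // toggle on each opening/closing fence
                result.Add(line);
                previousBlank = false;
                continue;
            }

            if (!inFence && previousBlank && trimmed.Length > 0 && !ListMarker.IsMatch(line))
            {
                result.Add(trimmed); // first line of a paragraph: drop the indentation
            }
            else
            {
                result.Add(line); // fenced code, list items and continuation lines stay as-is
            }

            previousBlank = trimmed.Length == 0;
        }

        return string.Join("\n", result);
    }
}
```

Usage would look like `var markdown = MarkdownIndentFix.TrimParagraphIndent(new ReverseMarkdown.Converter(config).Convert(html));`. Because it only looks at line starts, it will also un-indent continuation paragraphs inside list items, so it is a stopgap rather than a replacement for fixing the whitespace handling in the converter itself.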

mysticmind commented 5 months ago

I think this has to be made configurable; I will take a look. Earlier, someone posted an issue asking for whitespace to be kept intact, so some changes were made to stop trimming it.

mysticmind commented 2 months ago

Fixed via 8329d3ea4b0e6456c3bc9d777f56cc77cedc4345