Add support for normalize - markdown document rountrip

xoofx commented 7 years ago

Followup of #17 and #32 and #154

This is an issue to track the remaining work to handle full markdown document round-trip

[x] Support all CommonMark core components (#154)
- [x] Handle escape characters (by 6717be5)
- [x] Handle link reference definitions
[ ] Add normalize options (limit text width....etc.) - wip
- [x] Add options for basic elements (by de5ed119633b029d083c2e4a112722e4a03a84aa)
- [ ] Add support for text width limit
[ ] Support all extensions (first focus on Github-GFM markdown, then the rest)
- [ ] Abbreviations
- [ ] AutoIdentifiers
- [x] AutoLinks
- [ ] Bootstrap
- [ ] Citations
- [ ] CustomContainers
- [ ] DefinitionLists
- [ ] Diagrams
- [ ] Emoji
- [ ] EmphasisExtras
- [ ] Figures
- [ ] Footers
- [ ] Footnotes
- [ ] GenericAttributes
- [ ] Hardlines
- [x] JiraLinks
- [ ] ListExtras
- [ ] Mathematics
- [ ] MediaLinks
- [ ] NonAsciiNoEscape
- [ ] NoRefLinks
- [ ] PragmaLines
- [ ] SelfPipeline
- [ ] SmartyPants
- [ ] Table - Pipe (GFM)
- [ ] Table - Grid
- [x] TaskLists (GFM)
- [ ] Yaml

cc: @tthiery

xoofx commented 7 years ago

So I have fixed a few cases with list for better handling loose list. I have added support for escape characters. I have also started to add NormalizeOptions to control a bit the markdown output.

The limit text width will require a bit more infrastructure work on the TextRendererBase, still nothing done for this (wip)

xoofx commented 7 years ago

I have added support for HtmlBlock renderer. The normalize is able to process the CommonMark specs and the output is almost identical to the original which is already a good start.

duncanawoods commented 5 years ago

TaskLists are marked as done but normalise actually strips the task syntax:

    static class Program
    {
        public static MarkdownDocument Parse(string markdown) =>
            Markdown.Parse(
                markdown, 
                new MarkdownPipelineBuilder()
                    .UseTaskLists()
                    .Build());

        public static string ToText(this MarkdownObject o)
        {
            using var stringWriter = new StringWriter();

            new NormalizeRenderer(stringWriter)
                .Render(o);

            return stringWriter.ToString();
        }

        static void Main(string[] args) =>
            Console.WriteLine(
                Program.Parse("- [ ] example task")
                    .ToText());
    }

prints

-  example task

duncanawoods commented 5 years ago

Ah, I see, have to use the same pipeline:

    static class Program
    {
        public static MarkdownPipeline Pipeline =>
            new MarkdownPipelineBuilder()
                .UseTaskLists()
                .Build();

        public static MarkdownDocument Parse(string markdown) =>
            Markdown.Parse(
                markdown, 
                Program.Pipeline);

        public static string ToText(this MarkdownObject o)
        {
            using var stringWriter = new StringWriter();

            var renderer = new NormalizeRenderer(stringWriter);

            Program.Pipeline.Setup(renderer);

            renderer.Render(o);

            return stringWriter.ToString();
        }

        static void Main(string[] args) =>
            Console.WriteLine(
                Program.Parse("- [ ] example task")
                    .ToText());
    }

tthiery commented 5 years ago

@duncanawoods so your issue is resolved?

duncanawoods commented 5 years ago

Yep - thanks :)

My feedback is - the renderer API could be clearer e.g. taking the pipeline as an argument. The only way I found out I need to setup a pipeline is reading the internals looking to create a PR!

tthiery commented 5 years ago

I do not know how @xoofx generally handles it here, maybe discuss the API change ahead of time in a separate issue. I think PRs are generally welcome, also here in the normalizer topic.

MihaZupan commented 5 years ago

I think this issue was meant for tracking the progress of the Normalisation rendering implementation.

Issue proposals and PRs are of course welcome!

raffaeler commented 4 years ago

I add this case here as it looks similar to the one posted in the comments:

Code:

using var stringWriter = new StringWriter();
var renderer = new NormalizeRenderer(stringWriter);
GetPipeline().Setup(renderer);

renderer.Render(doc);
Console.WriteLine(stringWriter.ToString());

Input string:

# File number 2

Hello, world

<!-- comment 1 -->
<!-- 
{
    "key0":[
        "A"
    ],
    "key1":[
        "x", "y", "z", "t"
    ],

    "Key3": "key 3"
}

-->

Output string:

# File number 2

Hello, world

[]:

The error is of course the []: at the end of the output string. My goal is:

parse a markdown (done)
extract the comment containing a json (done)
remove the comment containing a json from the markdown document (done)
save back the markdown string (nope)

Since I just started working with this library, I may be doing something wrong ...

TIA

xoofx commented 4 years ago

The error is of course the []: at the end of the output string.

Yeah... the problem is that the NormalizeRenderer has never been finished. It's probably several hours of work to fix the remaining errors just for core CommonMark (without extensions). Don't have time nor a personal interest for this feature at the moment, so unless someone is willing to do that work, the NormalizeRenderer is not really usable.

raffaeler commented 4 years ago

Hi @xoofx nice to see you here as well instead of some random city around the globe! :)

I am not really interested in NormalizeRenderer specifically, but just getting back the text for the markdown that I have modified by removing a block. How can I get back the text of the modified document?

Thank you

xoofx commented 4 years ago

Hi @xoofx nice to see you here as well instead of some random city around the globe! :)

Oops, sorry, should have started by saying hello Raffaele! 😉

How can I get back the text of the modified document?

That's what the NormalizeRenderer was supposed to do (not only normalize but "roundtrip" even if this is a much longer way to support full rountrip)

raffaeler commented 4 years ago

:)

That's what the NormalizeRenderer was supposed to do ...

argh, this was unexpected. I gave the roundtrip for granted when I started looking for a markdown library :-/

Do you have any suggestion to get rid of the problem for "just" going back to the text? Looping on blocks and their descendants recursively? TIA

xoofx commented 4 years ago

argh, this was unexpected. I gave the roundtrip for granted when I started looking for a markdown library :-/

Do you have any suggestion to get rid of the problem for "just" going back to the text? Looping on blocks and their descendants recursively?

That's exactly what NormalizeRenderer is supposed to do: go through the nodes recursively and print the equivalent Markdown, there is no magic in it.

For the specific error you have, it's probably a small fix, but I can't have a look at it.

tthiery commented 4 years ago

@xoofx sorry that I left this project mid-status done. Work life increased a lot and my private project use case vanished.

raffaeler commented 4 years ago

/cc @xoofx I just found the bug is on the Parser and not on the Normalizer. The parser incorrectly add a LinkReferenceDefinitionGroup at the end of the document

Later on, the Normalizer renders (empty) the []: tags which stand for the link and do not derive from the html comments.

Given the instability of the parser, unfortunately I have no choice but searching for a new library :(

xoofx commented 4 years ago

I just found the bug is on the Parser and not on the Normalizer.

I don't think it's a bug in the parser. I believe it is because the LinkReferenceDefinitionGroupRenderer was incorrectly added in the first place (while there is no renderer like this for the Html renderer).

raffaeler commented 4 years ago

@xoofx I don't get why the renderer should be involved at that time. The screenshot is what I see on the document right after parsing the markdown. I didn't render anything at that moment.

xoofx commented 4 years ago

The screenshot is what I see on the document right after parsing the markdown.

I know and it is "correct". The LinkReferenceDefinitionGroup is added automatically by the system. Don't know why there is an item in it (that part is maybe a bug here), but LinkReferenceDefinitionGroup is used for book-keeping links (backward and forward) and made accessible through the AST. In the Html renderer, this element is not rendered and it should not be rendered also in normalize. That's more the actual bug.

xoofx commented 4 years ago

As a workaround, without modifying the renderer, you could remove LinkReferenceDefinitionGroup after the document is parsed..

raffaeler commented 4 years ago

Thanks anyway @xoofx I was already trying another solution.

I have to test more cases, but apparently the following if that I added in LinkReferenceDefinitionRenderer resolves the problem:

        protected override void Write(MdRenderer renderer, LinkReferenceDefinition linkDef)
        {
            if (string.IsNullOrEmpty(linkDef.Url))
            {
                return;
            }
        // ...

Note: MdRenderer is the class I used to replace the renderer.

xoofx commented 4 years ago

Note: MdRenderer is the class I used to replace the renderer.

It won't work when you will have links in your document, because LinkReferenceDefinitionGroup is still rendered and the links in it, while it should not.

You can remove the LinkReferenceDefinitionGroupRenderer from the normalize renderer directly which is probably a more correct workaround: normalizeRenderer.ObjectRenderers.TryRemove<LinkReferenceDefinitionGroupRenderer>();

raffaeler commented 4 years ago

You can remove the LinkReferenceDefinitionGroupRenderer

You mean LinkReferenceDefinitionRenderer, right? ... I did it and it works. Not sure if this change may have side-effects.

BTW LinkReferenceDefinitionGroupRenderer didn't work.

xoofx commented 4 years ago

You mean LinkReferenceDefinitionRenderer, right? ... I did it and it works. Not sure if this change may have side-effects. BTW LinkReferenceDefinitionGroupRenderer didn't work.

Oh right, actually both LinkReferenceDefinitionRenderer and LinkReferenceDefinitionGroupRenderer should be removed, otherwise RendererBase is still visiting the children of LinkReferenceDefinitionGroup (which is a ContainerBlock) Forgot that LinkReferenceDefinition is also not used for rendering, I added them by mistake in the NormalizeRenderer back in the days.

xoofx commented 4 years ago

I added them by mistake in the NormalizeRenderer back in the days.

Never mind, it seems that it was added on purpose for cases like this:

This is a [link][MyLink]

[MyLink]: http://company.com

In HTML they are not rendered, but in Markdown, they should be rendered. So the problem is likely the empty link that is put in the LinkReferenceDefinitionGroup

I haven't touched this code for the past 3 years, so yeah... it's old, and my memory cannot keep up at some point 😅

raffaeler commented 4 years ago

So, at the end, the if statement I added as for this comment should solve the problem.

xoofx commented 4 years ago

So, at the end, the if statement I added as for this comment should solve the problem.

Yeah, it's enough to workaround.

xoofx / markdig

Add support for normalize - markdown document rountrip #155