Closed DDAndyChen closed 5 years ago
Yep, good catch, likely a bug
Hi @xoofx,
I'm interested in fixing this bug. I have cloned the repository and using Visual Studio 2017.
I'm getting the following Build error: "The specified language targets for uap10.0 is missing. Ensure correct tooling is installed.\r\nMissing File: C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\MSBuild\Microsoft\WindowsXaml\v15.0\Microsoft.Windows.UI.Xaml.CSharp.targets Markdig C:\Users\XXX\.nuget\packages\msbuild.sdk.extras\1.0.9\build\netstandard1.0\MSBuildSdkExtras.Common.targets"
Does the Solution absolutely need the UWP tooling enabled to be able to Build successfully?
Does the Solution absolutely need the UWP tooling enabled to be able to Build successfully?
Yes
Hi @xoofx,
Thanks for earlier.
After spending some time and having to manually install the UAP version required by the project using the Visual Studio Installer, finally managed to build the solution successfully.
Spent some time looking at the code and common mark spec too.
[Removed the test scenario from the comment as I have realised that it was incorrect]
Any plans on fixing this bug? :) Or is there a work-around?
Any plans on fixing this bug? :) Or is there a work-around?
No, I don't have any spare time left, PR welcome
Sorry to hear that.
It's easily reproducible using the following test (the top one for HTML is already there, I just replicated it for Markdown.ToPlainText()
to reproduce the issue):
[TestFixture]
public class TestConfigureNewLine
{
[Test]
[TestCase(/* newLineForWriting: */ "\n", /* markdownText: */ "*1*\n*2*\n", /* expected: */ "<p><em>1</em>\n<em>2</em></p>\n")]
[TestCase(/* newLineForWriting: */ "\n", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "<p><em>1</em>\n<em>2</em></p>\n")]
[TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\n*2*\n", /* expected: */ "<p><em>1</em>\r\n<em>2</em></p>\r\n")]
[TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "<p><em>1</em>\r\n<em>2</em></p>\r\n")]
[TestCase(/* newLineForWriting: */ "!!!" , /* markdownText: */ "*1*\n*2*\n", /* expected: */ "<p><em>1</em>!!!<em>2</em></p>!!!")]
[TestCase(/* newLineForWriting: */ "!!!" , /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "<p><em>1</em>!!!<em>2</em></p>!!!")]
public void TestHtmlOutputWhenConfiguringNewLine(string newLineForWriting, string markdownText, string expected)
{
var pipeline = new MarkdownPipelineBuilder()
.ConfigureNewLine(newLineForWriting)
.Build();
var actual = Markdown.ToHtml(markdownText, pipeline);
Assert.AreEqual(expected, actual);
}
[Test]
[TestCase(/* newLineForWriting: */ "\n", /* markdownText: */ "*1*\n*2*\n", /* expected: */ "1\n2\n")]
[TestCase(/* newLineForWriting: */ "\n", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "1\n2\n")]
[TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\n*2*\n", /* expected: */ "1\r\n2\r\n")]
[TestCase(/* newLineForWriting: */ "\r\n", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "1\r\n2\r\n")]
[TestCase(/* newLineForWriting: */ "!!!", /* markdownText: */ "*1*\n*2*\n", /* expected: */ "1!!!2!!!")]
[TestCase(/* newLineForWriting: */ "!!!", /* markdownText: */ "*1*\r\n*2*\r\n", /* expected: */ "1!!!2!!!")]
public void TestPlainTextOutputWhenConfiguringNewLine(string newLineForWriting, string markdownText, string expected)
{
var pipeline = new MarkdownPipelineBuilder()
.ConfigureNewLine(newLineForWriting)
.Build();
var actual = Markdown.ToPlainText(markdownText, pipeline);
Assert.AreEqual(expected, actual);
}
}
You'll notice this new test fails.
I noticed it's parsing a single ParagraphBlock
, and the LineReader
parses 2 lines (*1*
and *2*
), whilst dropping/ignoring the CRLF characters at the end. (LineReader.cs
ln 55)
I've no idea why the last occurrence of the new line character is discarded when parsing to plain text, whereas it's not when parsing to html... Trying to spot the difference.
If you have any idea/pointers where to look at, please share :)
The problem is not in the parser but in the renderer. The HTML renderer is used today to render as a text. There are likely a few places in the code where we don't output newline, while the HTML doesn't care, the text version would care. I don't like the idea of using the HTML renderer to output plain text, but for some that was the quickest solution. I don't think it is much work in the codebase, but you need to dig into it (HtmlRenderer is quite simple, so it should not be difficult)
Hi @xoofx - thanks for merging pull request. Do you have any timelines for publishing a new version to nuget? Thanks.
Wondering if this fix has been published to nuget yet? On a project that definitely needs it. :) Thanks.
Last NuGet is from 3 months ago so definitely yes.
By using Markdown.ToPlainText method, the following markdown
is converted to
some line breaks are dropped, so "Smith" and "Start" become one word, the same as "2018-02-07" and "Position". I think it the output would be better in this:
that is