mity / md4c

C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.
MIT License
756 stars 138 forks source link

Add mention links extension #171

Open niblo opened 2 years ago

niblo commented 2 years ago

This adds support for mention links.

codecov[bot] commented 2 years ago

Codecov Report

Merging #171 (3f355de) into master (7f05330) will increase coverage by 0.04%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #171      +/-   ##
==========================================
+ Coverage   94.33%   94.37%   +0.04%     
==========================================
  Files           3        3              
  Lines        3088     3112      +24     
==========================================
+ Hits         2913     2937      +24     
  Misses        175      175              
Impacted Files Coverage Δ
src/md4c-html.c 95.31% <100.00%> (+0.11%) :arrow_up:
src/md4c.c 94.27% <100.00%> (+0.03%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 7f05330...3f355de. Read the comment docs.

niblo commented 2 years ago

I'm getting a Segmentation fault: 11 when integrating this in an application.

niblo commented 2 years ago

I'm no longer getting a segmentation fault. It was something downstream.

mity commented 2 years ago

Hello, sorry for the delayed answer.

Will take a look in upcoming days, this looks as a something bigger so I don't want to rush this too much.

mity commented 2 years ago

Just tried it. Seems to crash more or less with any input containing @ now (when the extension is enabled).

Here for the input @x:

(gdb) bt
#0  0x0040d0f9 in md_process_inlines (ctx=0x64faa8, lines=0x671510, n_lines=1) at ../src/md4c.c:4318
#1  0x0040e2a8 in md_process_normal_block_contents (ctx=0x64faa8, lines=0x671510, n_lines=1) at ../src/md4c.c:4639
#2  0x0040ea42 in md_process_leaf_block (ctx=0x64faa8, block=0x671508) at ../src/md4c.c:4818
#3  0x0040ed86 in md_process_all_blocks (ctx=0x64faa8) at ../src/md4c.c:4900
#4  0x0041246e in md_process_doc (ctx=0x64faa8) at ../src/md4c.c:6333
#5  0x0041262d in md_parse (
    text=0x672dc8 "@x­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş"..., size=2, parser=0x64fce4, userdata=0x64fd08) at ../src/md4c.c:6401
#6  0x00403cef in md_html (
    input=0x672dc8 "@x­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş"..., input_size=2, process_output=0x401e59 <process_output>, userdata=0x64fe60,
    parser_flags=32768, renderer_flags=5) at ../src/md4c-html.c:580
#7  0x00401f87 in process_file (in=0x760c4660 <msvcrt!_iob+96>, out=0x760c4620 <msvcrt!_iob+32>)
    at ../md2html/md2html.c:144
#8  0x004026fe in main (argc=3, argv=0x672c78) at ../md2html/md2html.c:380
niblo commented 2 years ago

Just tried it. Seems to crash more or less with any input containing @ now (when the extension is enabled).

Here for the input @x:

(gdb) bt
#0  0x0040d0f9 in md_process_inlines (ctx=0x64faa8, lines=0x671510, n_lines=1) at ../src/md4c.c:4318
#1  0x0040e2a8 in md_process_normal_block_contents (ctx=0x64faa8, lines=0x671510, n_lines=1) at ../src/md4c.c:4639
#2  0x0040ea42 in md_process_leaf_block (ctx=0x64faa8, block=0x671508) at ../src/md4c.c:4818
#3  0x0040ed86 in md_process_all_blocks (ctx=0x64faa8) at ../src/md4c.c:4900
#4  0x0041246e in md_process_doc (ctx=0x64faa8) at ../src/md4c.c:6333
#5  0x0041262d in md_parse (
    text=0x672dc8 "@x­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş"..., size=2, parser=0x64fce4, userdata=0x64fd08) at ../src/md4c.c:6401
#6  0x00403cef in md_html (
    input=0x672dc8 "@x­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş"..., input_size=2, process_output=0x401e59 <process_output>, userdata=0x64fe60,
    parser_flags=32768, renderer_flags=5) at ../src/md4c-html.c:580
#7  0x00401f87 in process_file (in=0x760c4660 <msvcrt!_iob+96>, out=0x760c4620 <msvcrt!_iob+32>)
    at ../md2html/md2html.c:144
#8  0x004026fe in main (argc=3, argv=0x672c78) at ../md2html/md2html.c:380

Rebased on the latest master, I'm not getting any errors when running the tests. This is on OS X. I see you're running on Windows. Do any of the checks (Travis CI, codecov, ...) run the test suite?

mity commented 2 years ago

Rebased on the latest master, I'm not getting any errors when running the tests. This is on OS X. I see you're running on Windows.

Still crashing on Windows.

But I can see that if I move the block of lines 4315 - 4318 after that following mention-related if, the crash goes away. That opener/closer arithmetics likely makes no sense in the mention case (it has no real closer, right?).

I have quite old gcc version on windows, so I can imagine newer compiler versions reorder that code as an optimization because the mention branch block does not depend on those variables and thus hide the bug effectively.

Do any of the checks (Travis CI, codecov, ...) run the test suite?

At this time, Travis only (linux) does the tests. Not sure how easy it would be to enable them on AppVeyor (windows) as it requires python but I hope it should be possible.

mity commented 2 years ago

Update: On Linux the problem is there too, only it does not manifest by the crash for some reason. Whenever running it under valgrind, enabling mentions generates invalid reads from unallocated memory.

niblo commented 2 years ago

But I can see that if I move the block of lines 4315 - 4318 after that following mention-related if, the crash goes away. That opener/closer arithmetics likely makes no sense in the mention case (it has no real closer, right?).

Which lines exactly?

mity commented 2 years ago

Which lines exactly?

These 4 lines make no sense to me in the case of input made of only mentions. That pointer arithmetic leads to the problem. There must be some logic added makeing that code work for links and auto-links but not to be executed for the case of mention which seem to have no closer (or alternatively you need to change the logic that some virtual empty closer marks are created even for the mentions).

https://github.com/niblo/md4c/blob/3f355de536520791c844c2fa95041b897ae7bc44/src/md4c.c#L4315-L4318

niblo commented 2 years ago

Which lines exactly?

These 4 lines make no sense to me in the case of input made of only mentions. That pointer arithmetic leads to the problem. There must be some logic added makeing that code work for links and auto-links but not to be executed for the case of mention which seem to have no closer (or alternatively you need to change the logic that some virtual empty closer marks are created even for the mentions).

https://github.com/niblo/md4c/blob/3f355de536520791c844c2fa95041b897ae7bc44/src/md4c.c#L4315-L4318

I'm having some issues compiling on Windows. How do you do it?

mity commented 2 years ago

I'm having some issues compiling on Windows. How do you do it?

I'm generally using mingw-w64, or more specifically very old MSYS (not MSYS2) environment with gcc-toolchain built from https://github.com/niXman/mingw-builds/ and I'm build with Cmake+Ninja. Unfortunately that can be hard-to-replicate environment.

That said though, you should be able to build with MSYS 2 (can be downloaded from https://mingw-w64.org) or directly with any recent version of MS Visual Studio (afaik, it now supports building CMake-based projects directly). Or you can generate MSVC solution manually by CMake as appveyor.xml does for the purposes of CI.

However note I don't think that making some cosmetic change preventing the most obvious crashes is good enough approach here. The pointed piece of code shows this PR makes a fundamental difference between how the mentions are implemented and how other permissive links are.

For the permissive e-mail auto-links we add an extra dummy mark for every potential valid @ (when handling @ in md_collect_marks()), and in case we decide later to resolve it as an actual link we turn that dummy mark into a virtual (zero-length) closer mark. That means all the later link processing works quite independently on whether it was a full markdown link or permissive auto-link because they are encoded similarly in the MD_CTX::marks[].

I don't think it's a good idea to depart from this approach for mention links. It could lead to difficult-to-solve collisions of such radically different approaches in all phases of the processing. So please take a look whether mentions could work the same way: In ideal case you would then need to distinguish only at the last moment what constants to use when calling the callback functions for one or the other. I think it should be possible.

ec1oud commented 2 years ago

x-mention-data-target is an html tag already in use?

mity commented 2 years ago

x-mention-data-target is an html tag already in use?

No. Only in this PR. And even when/if it gets merged you would see it only when enabled with an extension flag.