ije / md4w

A Markdown renderer written in Zig & C, compiled to WebAssembly.
MIT License

Exposing parse utils #3

Closed pi0 closed 5 months ago

pi0 commented 5 months ago

Hi. I quickly made this tracker issue while writing https://github.com/unjs/automd/issues/32 to see if you are interested in also exposing a simple parse util (it could either stream or return the whole AST). This could be used as the parser core in unjs/omark ❤️

ije commented 5 months ago

the md4c parser is pretty simple, it just receives 5 hooks:

basically we can create these hooks in the host. calling js functions from the wasm module is not ideal, but yes, i think i can do it. i just don't know whether these hooks are enough for omark's goal?
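
As a rough illustration of what host-side hooks could look like: md4c's parser reports block enter/leave, span enter/leave, and text events, and a host can turn that event stream into a tree with a stack of open nodes. The `ParserHooks` interface, the `MdNode` type, and `createTreeBuilder` below are hypothetical names for this sketch only, not md4w's API, and the events are simulated by hand rather than coming from the wasm module.

```typescript
// Hypothetical host-side shape of the five parser callbacks (not md4w's API).
interface ParserHooks {
  enterBlock(type: string): void;
  leaveBlock(type: string): void;
  enterSpan(type: string): void;
  leaveSpan(type: string): void;
  text(content: string): void;
}

type MdNode = { type: string; children: (MdNode | string)[] };

// Builds a nested tree by keeping a stack of currently open nodes.
function createTreeBuilder(): { hooks: ParserHooks; root: MdNode } {
  const root: MdNode = { type: "doc", children: [] };
  const stack: MdNode[] = [root];

  // Open a node: attach it to the innermost open node, then descend into it.
  const enter = (type: string): void => {
    const node: MdNode = { type, children: [] };
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  };
  // Close the innermost open node (the root is never popped).
  const leave = (): void => {
    if (stack.length > 1) stack.pop();
  };

  return {
    root,
    hooks: {
      enterBlock: enter,
      leaveBlock: leave,
      enterSpan: enter,
      leaveSpan: leave,
      text: (content: string): void => {
        stack[stack.length - 1].children.push(content);
      },
    },
  };
}

// Simulated event sequence for the paragraph "Stay _foolish_".
const { hooks, root } = createTreeBuilder();
hooks.enterBlock("p");
hooks.text("Stay ");
hooks.enterSpan("em");
hooks.text("foolish");
hooks.leaveSpan("em");
hooks.leaveBlock("p");
```

Each hook call here would cross the wasm/JS boundary in a real integration, which is the overhead mentioned above.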

pi0 commented 5 months ago

I am thinking of the fastest method to resolve the traversed MD tree so omark can make a simplified interface on top of it.

We might try to benchmark two methods:

Please let me know if you would like me to try, or would like to compare them yourself 👍🏼

ije commented 5 months ago

i prefer constructing a tree. how about md to a jsx-like tree?

# Jobs
Stay _foolish_, stay **hungry**!
[Apple](https://apple.com)
<a href="https://apple.com">Apple</a>
[
  {type: 'h1', children: ['Jobs']},
  {type: 'p', children: [
    'Stay ',
    {type: "em", children: ["foolish"]},
    ', stay ',
    {type: "strong", children: ["hungry"]},
    '!',
    {type: 'a', props: {href: 'https://apple.com'}, children: ['Apple']},
    {type: 'html', props: {html: '<a href="https://apple.com">Apple</a>'}, children: []}
  ]}
]
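
One way to see that this jsx-like shape carries enough information is to render it back to HTML. The following is a minimal sketch, assuming the node shape shown above (`type`, optional `props`, `children`) and treating the `html` node's `props.html` as raw passthrough; `MdNode`, `escapeHtml`, and `render` are hypothetical names, not part of md4w.

```typescript
// Assumed node shape, taken from the proposal above.
type MdNode = {
  type: string;
  props?: Record<string, string>;
  children: (MdNode | string)[];
};

// Escape text and attribute values so they are safe to embed in HTML.
const escapeHtml = (s: string): string =>
  s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");

function render(node: MdNode | string): string {
  if (typeof node === "string") return escapeHtml(node);
  // Raw HTML nodes are emitted verbatim (assumption: props.html is trusted).
  if (node.type === "html") return node.props?.html ?? "";
  const props = node.props ?? {};
  const attrs = Object.keys(props)
    .map((k) => ` ${k}="${escapeHtml(props[k])}"`)
    .join("");
  const inner = node.children.map((c) => render(c)).join("");
  return `<${node.type}${attrs}>${inner}</${node.type}>`;
}

const tree: MdNode[] = [
  { type: "h1", children: ["Jobs"] },
  {
    type: "p",
    children: [
      "Stay ",
      { type: "em", children: ["foolish"] },
      "!",
      { type: "a", props: { href: "https://apple.com" }, children: ["Apple"] },
    ],
  },
];

const html: string = tree.map((n) => render(n)).join("");
```
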
pi0 commented 5 months ago

Honestly, for omark, I am considering a flattened array of streamable data (to make markdown ASTs as simple as possible), plus some alternative ways of nesting.

If you prefer a nested tree like other parsers, that is no problem; we can always convert 👍🏼

ije commented 5 months ago

what does the flattened array look like?

ije commented 5 months ago

how about splitting by blocks? this should work as streamable data

--- chunk 1
{type: 'h1', children: ['Jobs']}
--- chunk 2
{type: 'p', children: [
  'Stay ',
  {type: "em", children: ["foolish"]},
  ', stay ',
  {type: "strong", children: ["hungry"]},
  '!',
  {type: 'a', props: {href: 'https://apple.com'}, children: ['Apple']},
  {type: 'html', props: {html: '<a href="https://apple.com">Apple</a>'}, children: []}
]}

or use array instead of object:

--- chunk 1
['h1', ['Jobs']]
--- chunk 2
['p', [
  'Stay ',
  ["em", ["foolish"]],
  ', stay ',
  ["strong", ["hungry"]],
  '!',
  ['a', {href: 'https://apple.com'}, ['Apple']],
  ['html', {html: '<a href="https://apple.com">Apple</a>'}, []]
]]
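
The two shapes above carry the same information, so converting between them is mechanical. A minimal sketch of the object-to-array direction, assuming the props object is simply omitted from the tuple when a node has none (as in the `h1` and `em` examples); `ObjNode`, `TupleNode`, and `toTuple` are hypothetical names for illustration.

```typescript
// Object form: {type, props?, children}
type ObjNode = {
  type: string;
  props?: Record<string, string>;
  children: (ObjNode | string)[];
};

// Array form: [type, children] or [type, props, children]
type TupleNode =
  | [string, (TupleNode | string)[]]
  | [string, Record<string, string>, (TupleNode | string)[]];

function toTuple(node: ObjNode): TupleNode {
  // Strings stay as-is; nested nodes are converted recursively.
  const children: (TupleNode | string)[] = node.children.map((c) =>
    typeof c === "string" ? c : toTuple(c)
  );
  return node.props
    ? [node.type, node.props, children]
    : [node.type, children];
}

const link = toTuple({
  type: "a",
  props: { href: "https://apple.com" },
  children: ["Apple"],
});
const heading = toTuple({ type: "h1", children: ["Jobs"] });
```

The array form is more compact to serialize, at the cost of consumers having to check whether the second element is a props object or the children array.
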
pi0 commented 5 months ago

Yes, exactly, I am thinking about splitting by logical blocks. But it is tricky to represent (still thinking about how). Mainly I am considering using a Proxy that lets each block be accessed either as a stringified value or traversed individually. (Why? Because many tools only need the high-level representation of the markdown AST, not the details.) Something like this:

[
  "Jobs", // .{ type: 'h1', contents: <Proxy>[p:stay foolish..a:apple] }
  "Stay foolish, stay hungry!", // .{ type: 'p', contents: <Proxy>[.stay, em: ...] }
  "Apple" // .{ type: 'a', contents: <Proxy>[apple] }
]
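
A minimal sketch of how such a Proxy could behave, assuming each block is a plain node object and that "stringified value" means the concatenated text of its descendants. `MdNode`, `textOf`, and `asBlock` are hypothetical names for illustration, not omark's or md4w's API.

```typescript
type MdNode = { type: string; children: (MdNode | string)[] };

// Concatenate all text found under a node.
const textOf = (n: MdNode | string): string =>
  typeof n === "string" ? n : n.children.map((c) => textOf(c)).join("");

// Wrap a node so it stringifies to its plain text while every other
// property access (type, children, ...) passes through to the real node.
function asBlock(node: MdNode): MdNode {
  return new Proxy(node, {
    get(target, key, receiver) {
      if (key === "toString") return () => textOf(target);
      return Reflect.get(target, key, receiver);
    },
  });
}

const blocks: MdNode[] = [
  { type: "h1", children: ["Jobs"] },
  {
    type: "p",
    children: [
      "Stay ",
      { type: "em", children: ["foolish"] },
      ", stay ",
      { type: "strong", children: ["hungry"] },
      "!",
    ],
  },
].map((b) => asBlock(b));

// High-level view: just the text of each block.
const summary: string[] = blocks.map((b) => String(b));
// Detailed view: the same block can still be traversed.
const firstSpan = blocks[1].children[1];
```
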

I would love to brainstorm on this possibility together once we are there! I think as a first step we need the parsed AST, and I have high hopes of relying on md4w, as mentioned before, since it is native and minimal! If you are good with the first proposal, https://github.com/ije/md4w/issues/3#issuecomment-1946257737, I think we can do it from there.

ije commented 5 months ago

sounds cool! i will try to implement a mdToJson function for a start.

pi0 commented 5 months ago

I just made a quick wrapper in omark that produces (almost) the same object as your proposal, so we can work in parallel.

The object is meant for internal purposes only and I can happily adjust to what you finally provide, but I would also love to have your 👍🏼 on https://github.com/unjs/omark/pull/15 if you have a few minutes to check, so we are safe to go.

ije commented 5 months ago

thanks

ije commented 5 months ago

@pi0 https://github.com/ije/md4w/pull/4 the first test has passed (not finished, it can't handle nested blocks/spans yet)