I want to help (opendevise/textlint-plugin-asciidoc, issue #1)

azu commented 6 years ago

Hi, I'm interested in an AsciiDoc parser for textlint, because I am the owner of textlint and I've written a book in Asciidoctor. However, I don't have domain knowledge of AsciiDoc/Asciidoctor. Previously, I tried to create textlint-plugin-asciidoc-loose, but it failed.

Is there anything that I can help with?

mojavelinux commented 6 years ago

I'd love your help! I plan on making this a fully-compliant AsciiDoc parser that is geared exclusively for validation. After studying validation for AsciiDoc, I've come to realize that it doesn't really make sense to have the same parser for conversion and validation because they have very different goals and needs. Therefore, it makes sense to develop a full parser dedicated for validation. textlint is a perfect fit.

What I need the most is assistance with the mapping of the model. As I began to work on this plugin, I realized I needed a more complete model than what textlint was providing by default (or providing for Markdown). If we could define a more complete model, then I can map the parser to that model directly instead of creating one just for this plugin.

I don't have a complete list on hand, but here are some of the nodes I know I'll need:

- TitleNode
- SectionNode
- DelimitedBlockNode
- ParagraphNode
- AttributeEntryNode
- DocumentNode
- HeaderNode
- LineNode

There may be others.

Some of these already exist in textlint. But I found that they weren't quite flexible enough to hold the data for AsciiDoc. Perhaps we can resolve that. Let's think about the model that we want. I can handle the parser part as I've already written a parser for AsciiDoc in Asciidoctor.

mojavelinux commented 6 years ago

...and the reason this really matters is that if I don't use the model in textlint, then existing plugins won't work with parsed AsciiDoc documents. I'd really like to be able to tap into the existing plugin ecosystem.

azu commented 6 years ago

Thanks for the reply!

> it makes sense to develop a full parser dedicated for validation

I agree. textlint's built-in Markdown plugin uses markdown-to-ast, which is a subset of remark.

textlint requires a superset or subset of the textlint AST (TxtAST). If textlint-plugin-asciidoc returns a superset of TxtAST, textlint's rules will simply ignore the unknown nodes.
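
A minimal sketch of why a superset is safe, assuming a simplified traversal that dispatches on `node.type`. This is not textlint's actual traversal code; the `traverse` helper and the `AttributeEntry` node type are hypothetical and only illustrate the idea that a rule never sees node types it did not register for.

```javascript
// Hypothetical, simplified traversal. NOT textlint's real implementation.
// It calls a handler only when the rule registered one for the node type.
function traverse(node, handlers) {
  const handler = handlers[node.type];
  if (handler) {
    handler(node);
  }
  for (const child of node.children || []) {
    traverse(child, handlers);
  }
}

// A rule that only knows about the standard Str node type.
const seenValues = [];
const strOnlyRule = {
  Str(node) {
    seenValues.push(node.value);
  },
};

// An AST mixing standard node types with an AsciiDoc-only one.
// 'AttributeEntry' is an assumed type name, used here for illustration.
const supersetAst = {
  type: 'Document',
  children: [
    { type: 'AttributeEntry', raw: ':toc: left' },
    {
      type: 'Paragraph',
      children: [{ type: 'Str', value: 'Hello', raw: 'Hello' }],
    },
  ],
};

traverse(supersetAst, strOnlyRule);
console.log(seenValues); // [ 'Hello' ]
```

The rule still receives the `Str` node, while the unknown `AttributeEntry` node is skipped without an error.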

In my experience, we should probably take minimal steps:

1. Parse any AsciiDoc/Asciidoctor document without error
   - For example, the Markdown plugin has a stress test using fixtures.
   - For example, the current implementation throws an error for macros.
2. Add missing nodes
   - TitleNode, SectionNode ...
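
The first step could be sketched roughly as follows. Everything here is an assumption for illustration: `toyParse` is a stand-in for the plugin's parser (it deliberately fails on `::` to mimic the macro error mentioned above), and the fixtures are kept in memory, whereas a real suite would read them from a fixtures directory.

```javascript
// Hypothetical stand-in for the plugin's parser. It fails on block macros
// to mimic the kind of error a stress test is meant to surface.
function toyParse(text) {
  if (text.includes('::')) {
    throw new Error('macro not supported');
  }
  return { type: 'Document', raw: text, children: [] };
}

// In-memory fixtures (assumed names and contents).
const fixtureDocs = {
  'title.adoc': '= Document Title',
  'paragraph.adoc': 'Just a paragraph.',
  'macro.adoc': 'image::sunset.jpg[]',
};

// Parse every fixture and collect failures instead of stopping at the first one.
function stressTest(parseFn, docs) {
  const failures = [];
  for (const [name, text] of Object.entries(docs)) {
    try {
      parseFn(text);
    } catch (error) {
      failures.push({ name, message: error.message });
    }
  }
  return failures;
}

const parseFailures = stressTest(toyParse, fixtureDocs);
console.log(parseFailures); // [ { name: 'macro.adoc', message: 'macro not supported' } ]
```

Collecting all failures in one run gives a list of unsupported syntax to work through before adding new node types.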

> But I found that they weren't quite flexible enough to hold the data for AsciiDoc. Perhaps we can resolve that.

OK

ntgussoni commented 4 years ago

Hello @mojavelinux @azu, I wonder whether any further development has been made on this subject. I'm working closely with a TW team and have been thinking about linting for Asciidoctor.

How can I help?

ntgussoni commented 4 years ago

I somehow always end up coming back to this "issue". @mojavelinux, what do you think is missing from the example parser you were working on? I'm more than happy to continue working on it; I believe a full-blown AST could lead to great tools.

ggrossetie commented 3 years ago

> In my experience, we should probably take minimal steps:
>
> 1. Parse any AsciiDoc/Asciidoctor document without error
>    - For example, the Markdown plugin has a stress test using fixtures.
>    - For example, the current implementation throws an error for macros.
> 2. Add missing nodes
>    - TitleNode, SectionNode ...

@azu Sounds good!

> textlint requires a superset or subset of the textlint AST (TxtAST). If textlint-plugin-asciidoc returns a superset of TxtAST, textlint's rules will simply ignore the unknown nodes.

I noticed that Strong, Emphasis and Monospaced (Inline code) don't have a value field. I think it would be necessary if we want to support both Markdown and AsciiDoc.

Let's take a concrete example:

Markdown

# This *Is a* Title

node {
  type: 'Header',
  depth: 1,
  children: [
    {
      type: 'Str',
      value: 'This ',
      loc: [Object],
      range: [Array],
      raw: 'This '
    },
    {
      type: 'Emphasis',
      children: [Array],
      loc: [Object],
      range: [Array],
      raw: '*Is a*'
    },
    {
      type: 'Str',
      value: ' Title',
      loc: [Object],
      range: [Array],
      raw: ' Title'
    }
  ],
  loc: { start: { line: 1, column: 0 }, end: { line: 1, column: 19 } },
  range: [ 0, 19 ],
  raw: '# This *Is a* Title'
}

AsciiDoc

And here's the same document in AsciiDoc

= This _Is a_ Title

node {
  type: 'Header',
  depth: 1,
  children: [
    {
      type: 'Str',
      value: 'This ',
      loc: [Object],
      range: [Array],
      raw: 'This '
    },
    {
      type: 'Emphasis',
      children: [Array],
      loc: [Object],
      range: [Array],
      raw: '_Is a_'
    },
    {
      type: 'Str',
      value: ' Title',
      loc: [Object],
      range: [Array],
      raw: ' Title'
    }
  ],
  loc: { start: { line: 1, column: 0 }, end: { line: 1, column: 19 } },
  range: [ 0, 19 ],
  raw: '= This _Is a_ Title'
}

As you can see, it would be beneficial to have a value field on the Emphasis node; otherwise the AST is not markup-agnostic (i.e., we cannot extract the text from the markup without knowing the markup language).

@azu Should I open an issue at https://github.com/textlint/textlint?

azu commented 3 years ago

I always use textlint-util-to-string to extract text content from a TxtParentNode. It picks the value of each child node and joins them. (It aims to pick the values that are displayed in the rendered result.)

> it would be beneficial to have a value field on the Emphasis

TxtAST is basically based on remark. The Emphasis node of remark's AST (mdast) has no value property, but I don't know the reason…

Nested markup like **__1__** may be the reason.

I agree that a universal AST like TxtAST has ambiguities.

ggrossetie commented 3 years ago

> The Emphasis node of remark's AST (mdast) has no value property.

My bad, I didn't see that the Markdown-to-TxtAST plugin uses a Str child (as described in https://github.com/syntax-tree/mdast#emphasis). So the value is effectively available on the Str child:

node {
  type: 'Emphasis',
  loc: [Object],
  range: [Array],
  raw: '_Is a_',
  children: [{
    type: 'Str',
    value: 'Is a',
    loc: [Object],
    range: [Array],
    raw: 'Is a'
  }],
}

I will update the AST produced by the AsciiDoc plugin.
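
A minimal sketch of markup-agnostic text extraction, assuming both plugins emit a Str child inside Emphasis as shown above. The `extractText` and `buildHeader` helpers are hypothetical and are not the implementation of textlint-util-to-string; `loc` and `range` are omitted for brevity.

```javascript
// Hypothetical helper: return a node's own value if it has one,
// otherwise join the text of its children recursively.
function extractText(node) {
  if (typeof node.value === 'string') {
    return node.value;
  }
  return (node.children || []).map(extractText).join('');
}

// Build the Header node from the examples above; only the raw markup differs.
function buildHeader(raw, emphasisRaw) {
  return {
    type: 'Header',
    depth: 1,
    raw,
    children: [
      { type: 'Str', value: 'This ', raw: 'This ' },
      {
        type: 'Emphasis',
        raw: emphasisRaw,
        children: [{ type: 'Str', value: 'Is a', raw: 'Is a' }],
      },
      { type: 'Str', value: ' Title', raw: ' Title' },
    ],
  };
}

const markdownHeader = buildHeader('# This *Is a* Title', '*Is a*');
const asciidocHeader = buildHeader('= This _Is a_ Title', '_Is a_');

console.log(extractText(markdownHeader)); // This Is a Title
console.log(extractText(asciidocHeader)); // This Is a Title
```

With the Str child in place, the same extraction works for both documents without inspecting `raw` or knowing which markup language produced the node.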

mojavelinux commented 1 year ago

The reason this project is stalled is because we don't yet have a clear definition of the formal grammar for AsciiDoc. That is something that the AsciiDoc Language project is working on. Once we have those rules nailed down, we can implement them in a lint project like this one. As I have said elsewhere, I don't think an official parser for AsciiDoc is going to be able to do all the things a linter will need to do (since the parser is focused on parsing a valid AsciiDoc document). However, the two tools will still need to be working off the same playbook, so to speak. That's what the formal grammar part of the specification will provide (and it's no small task).

Building off of work started by Guillaume, I have developed a prototype of an AsciiDoc parser for the formal grammar we are developing as part of the AsciiDoc Language. It's not yet complete, but can handle a good bulk of the syntax already. You can find it here: https://github.com/opendevise/asciidoc-parsing-lab

