Open azu opened 6 years ago
I'd love your help! I plan on making this a fully-compliant AsciiDoc parser that is geared exclusively for validation. After studying validation for AsciiDoc, I've come to realize that it doesn't really make sense to have the same parser for conversion and validation because they have very different goals and needs. Therefore, it makes sense to develop a full parser dedicated for validation. textlint is a perfect fit.
What I need the most is assistance with the mapping of the model. As I began to work on this plugin, I realized I needed a more complete model than what textlint was providing by default (or providing for Markdown). If we could define a more complete model, then I can map the parser to that model directly instead of creating one just for this plugin.
I don't have a complete list on hand, but here are some of the nodes I know I'll need:
There may be others.
Some of these already exist in textlint. But I found that they weren't quite flexible enough to hold the data for AsciiDoc. Perhaps we can resolve that. Let's think about the model that we want. I can handle the parser part as I've already written a parser for AsciiDoc in Asciidoctor.
...and the reason this really matters is that if I don't use the model in textlint, then existing plugins won't work with parsed AsciiDoc documents. I'd really like to be able to tap into the existing plugin ecosystem.
Thanks for the reply!
it makes sense to develop a full parser dedicated for validation
I agree. textlint's built-in Markdown plugin uses markdown-to-ast, which is a subset of remark.
textlint requires a superset or subset of the textlint AST. If textlint-plugin-asciidoc returns a superset of the textlint AST, textlint rules will just ignore the unknown nodes.
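To illustrate why a superset AST is safe, here is a minimal sketch of type-keyed dispatch. It is not textlint's actual traversal code: `traverse` and the `Admonition` node type are hypothetical names used only for this example, and the sketch assumes the traversal still descends into the children of unknown nodes.

```javascript
// Hypothetical mini-traversal: handlers are keyed by node type, in the style of textlint rules.
// Node types without a handler (e.g. an AsciiDoc-only "Admonition") are simply skipped.
function traverse(node, handlers) {
  const handler = handlers[node.type];
  if (handler) {
    handler(node);
  }
  // Assumption: children of unknown nodes are still visited.
  for (const child of node.children || []) {
    traverse(child, handlers);
  }
}

// A superset AST: "Admonition" is an assumed AsciiDoc-specific type, not part of the textlint AST.
const ast = {
  type: "Document",
  children: [
    { type: "Admonition", children: [{ type: "Str", value: "Be careful." }] },
    { type: "Paragraph", children: [{ type: "Str", value: "Hello" }] },
  ],
};

// A rule that only knows about Str nodes still sees all the text.
const seen = [];
traverse(ast, {
  Str(node) {
    seen.push(node.value);
  },
});
console.log(seen);
```

The rule never registers a handler for `Admonition`, so that node is passed over without an error while its `Str` child is still reported.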
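In my experience, we should take minimal steps:
1. Parse any asciidoc/asciidoctor document without error. For example, the Markdown plugin has a stress test using fixtures, whereas the current implementation throws an error for a macro.
2. Add missing nodes: TitleNode, SectionNode ...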
But I found that they weren't quite flexible enough to hold the data for AsciiDoc. Perhaps we can resolve that.
OK
Hello @mojavelinux @azu, I wonder whether there has been any further development on this subject. I'm working closely with a TW team and have been thinking about linting for Asciidoctor.
How can I help?
I somehow always end up coming back to this "issue". @mojavelinux, what do you think is missing from the example parser you were working on? I'm more than happy to continue working on it; I believe a full-blown AST could lead to great tools.
In my experience, we should take minimal steps:
1. Parse any asciidoc/asciidoctor document without error. For example, the Markdown plugin has a stress test using fixtures, whereas the current implementation throws an error for a macro.
2. Add missing nodes: TitleNode, SectionNode ...
@azu Sounds good!
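A minimal sketch of the first step quoted above (parse every fixture without throwing) might look like the following. `findFailingFixtures` and `toyParse` are hypothetical names; `toyParse` is a stand-in that only mimics the reported macro failure and is not the plugin's parser.

```javascript
// Hypothetical helper: run a parser over every fixture and collect the ones that throw.
function findFailingFixtures(parse, fixtures) {
  const failures = [];
  for (const [name, source] of Object.entries(fixtures)) {
    try {
      parse(source);
    } catch (error) {
      failures.push({ name, message: error.message });
    }
  }
  return failures;
}

// Stand-in parser (NOT the real plugin): it rejects block macros to mimic the reported error.
function toyParse(source) {
  if (/^\w+::/m.test(source)) {
    throw new Error("unsupported macro");
  }
  return { type: "Document", raw: source, children: [] };
}

// Inline fixtures for illustration; a real suite would read .adoc files from disk.
const fixtures = {
  "title.adoc": "= This _Is a_ Title",
  "image-macro.adoc": "image::sunset.jpg[Sunset]",
};

const failures = findFailingFixtures(toyParse, fixtures);
console.log(failures);
```

Collecting failures instead of stopping at the first one gives a to-do list of unsupported syntax, which fits the "minimal steps" approach.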
textlint requires a superset or subset of the textlint AST. If textlint-plugin-asciidoc returns a superset of the textlint AST, textlint rules will just ignore the unknown nodes.
I noticed that Strong, Emphasis and Monospaced (inline code) don't have a value field. I think it would be necessary if we want to support both Markdown and AsciiDoc.
Let's take a concrete example:
# This *Is a* Title

node {
  type: 'Header',
  depth: 1,
  children: [
    {
      type: 'Str',
      value: 'This ',
      loc: [Object],
      range: [Array],
      raw: 'This '
    },
    {
      type: 'Emphasis',
      children: [Array],
      loc: [Object],
      range: [Array],
      raw: '*Is a*'
    },
    {
      type: 'Str',
      value: ' Title',
      loc: [Object],
      range: [Array],
      raw: ' Title'
    }
  ],
  loc: { start: { line: 1, column: 0 }, end: { line: 1, column: 19 } },
  range: [ 0, 19 ],
  raw: '# This *Is a* Title'
}
And here's the same document in AsciiDoc:
= This _Is a_ Title

node {
  type: 'Header',
  depth: 1,
  children: [
    {
      type: 'Str',
      value: 'This ',
      loc: [Object],
      range: [Array],
      raw: 'This '
    },
    {
      type: 'Emphasis',
      children: [Array],
      loc: [Object],
      range: [Array],
      raw: '_Is a_'
    },
    {
      type: 'Str',
      value: ' Title',
      loc: [Object],
      range: [Array],
      raw: ' Title'
    }
  ],
  loc: { start: { line: 1, column: 0 }, end: { line: 1, column: 19 } },
  range: [ 0, 19 ],
  raw: '= This _Is a_ Title'
}
As you can see, it would be beneficial to have a value field on the Emphasis node; otherwise the AST is not markup-agnostic (i.e., we cannot extract the text from the markup without knowing the markup language).
@azu Should I open an issue at https://github.com/textlint/textlint?
I always use textlint-util-to-string to extract text content from a TxtParentNode.
It picks the value of each child node and joins them.
(It aims to pick the values that are displayed in the rendered result.)
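The idea of joining child values can be sketched as follows. This is not the textlint-util-to-string API: `toPlainText` and `str` are hypothetical helpers, and the sketch assumes Emphasis carries its text in a Str child, as mdast does.

```javascript
// Hypothetical helper, NOT the textlint-util-to-string API:
// collect the text a reader would see by joining the values of leaf nodes.
function toPlainText(node) {
  if (typeof node.value === "string") {
    return node.value;
  }
  return (node.children || []).map(toPlainText).join("");
}

// Small factory for Str leaves (loc and range omitted for brevity).
const str = (value) => ({ type: "Str", value, raw: value });

const markdownHeader = {
  type: "Header",
  depth: 1,
  raw: "# This *Is a* Title",
  children: [
    str("This "),
    { type: "Emphasis", raw: "*Is a*", children: [str("Is a")] },
    str(" Title"),
  ],
};

const asciidocHeader = {
  type: "Header",
  depth: 1,
  raw: "= This _Is a_ Title",
  children: [
    str("This "),
    { type: "Emphasis", raw: "_Is a_", children: [str("Is a")] },
    str(" Title"),
  ],
};

// Both trees yield the same text even though their raw markup differs.
const markdownText = toPlainText(markdownHeader);
const asciidocText = toPlainText(asciidocHeader);
console.log(markdownText, asciidocText);
```

Because only `value` and `children` are read, the helper never needs to know whether the emphasis was written with `*` or `_`.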
it would be beneficial to have a value field on the Emphasis
The basic TxtAST is based on remark.
The Emphasis node in remark's AST (mdast) has no value property.
I do not know the reason, but something like **__1__** may be it.
I agree that a universal AST like TxtAST has ambiguities.
Emphasis node of remark's AST(mdast) has not value property.
My bad, I didn't see that the Markdown to TxtAST plugin uses a Str child (as described in https://github.com/syntax-tree/mdast#emphasis).
So the value is effectively available on the Str child:
node {
  type: 'Emphasis',
  loc: [Object],
  range: [Array],
  raw: '_Is a_',
  children: [{
    type: 'Str',
    value: 'Is a',
    loc: [Object],
    range: [Array],
    raw: 'Is a'
  }],
}
I will update the AST produced by the AsciiDoc plugin.
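As a rough sketch of what the updated output could look like for the title example, the helper below builds an Emphasis node with a Str child. `emphasisNode` is a hypothetical name, not the plugin's code, and it assumes a single-line source where the column equals the character offset.

```javascript
// Hypothetical helper: build an Emphasis node for a single-line _..._ span.
// `start` and `end` are 0-based offsets (end exclusive); `line` is 1-based.
// Assumption: the span sits on the first line, so column === offset.
function emphasisNode(source, start, end, line = 1) {
  const raw = source.slice(start, end);
  const inner = raw.slice(1, -1); // drop the surrounding underscores
  const loc = (from, to) => ({
    start: { line, column: from },
    end: { line, column: to },
  });
  return {
    type: "Emphasis",
    raw,
    range: [start, end],
    loc: loc(start, end),
    children: [
      {
        type: "Str",
        value: inner,
        raw: inner,
        range: [start + 1, end - 1],
        loc: loc(start + 1, end - 1),
      },
    ],
  };
}

const source = "= This _Is a_ Title";
const start = source.indexOf("_");
const end = source.indexOf("_", start + 1) + 1;
const emphasis = emphasisNode(source, start, end);
console.log(JSON.stringify(emphasis, null, 2));
```

The child's range is the parent's range shrunk by one character on each side, which keeps `raw` on the Emphasis node and `value` on the Str node consistent with the source text.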
This project is stalled because we don't yet have a clear definition of the formal grammar for AsciiDoc. That is something that the AsciiDoc Language project is working on. Once we have those rules nailed down, we can implement them in a lint project like this one. As I have said elsewhere, I don't think an official parser for AsciiDoc is going to be able to do all the things a linter will need to do (since the parser is focused on parsing a valid AsciiDoc document). However, the two tools will still need to be working off the same playbook, so to speak. That's what the formal grammar part of the specification will provide (and it's no small task).
Building off of work started by Guillaume, I have developed a prototype of an AsciiDoc parser for the formal grammar we are developing as part of the AsciiDoc Language. It's not yet complete, but can handle a good bulk of the syntax already. You can find it here: https://github.com/opendevise/asciidoc-parsing-lab
Hi, I'm interested in an AsciiDoc parser and textlint, because I am the owner of textlint and I've written a book in Asciidoctor. However, I don't have domain knowledge about AsciiDoc/Asciidoctor. Previously, I tried to create textlint-plugin-asciidoc-loose, but it was a failure.
Is there anything that I can help with?