Closed SerafimArts closed 5 years ago
Hello :-),
Either the token is skipped, and it will not appear in the token sequence, or the token is kept, and then it appears in the token sequence. Having a skip token in the token sequence is “not possible” (actually, it is possible, but this is not how it works).
Everything happens here: https://github.com/hoaproject/Compiler/blob/c86ccfbce9b9cad17cf84ffdf5c505c695d83d7a/Llk/Lexer.php#L205
and here:
It is still possible to introduce a new pragma to control the presence of the skip tokens inside the token sequence, but this is difficult, because we should modify both the lexer (easy part) and the parser (more difficult: We should add conditions everywhere to jumb over skip tokens, but still place them in the AST, not simple at all).
A %skip
token is just a shortcut for %token skip
. You can use them inside token namespaces.
What's your usecase exactly?
What's your usecase exactly?
I did not want to go into details =( OK
Grammar here: https://github.com/SerafimArts/Railgun/blob/master/resources/grammar.pp (so big)
In real life, I can write files like this:
type User {
id: ID!
login: String!
friends(count: Int = 10): [User]
createdAt(format: String = RFC3339): String
updatedAt(format: String = RFC3339): String
}
The GraphQL standard implies that all comments that are before the declaration should be used as a description:
# This is description of user
type User {
# This is identifier of the user type object
# Every user contain one unique identifier
id: ID!
# This is a name of the user
login: String!
# Will returns user friends
friends(count: Int = 10): [User]
# Created date of user
createdAt(format: String = RFC3339): String
# Last user activity time
updatedAt(format: String = RFC3339): String
}
Each of these comments must fall into the type introspection. But GraphQL admits and such syntax:
# This is description of user
type User # TODO add avatar and email
{
# This is identifier of the user type object
# Every user contain one unique identifier
id: ID! # TODO Change to UUID
# This is a name of the user
login: String!
...etc
In this case, the comments that are not placed where you want (# TODO ...
) are simply ignored. I do not know how to implement this without problems
A %skip token is just a shortcut for %token skip. You can use them inside token namespaces.
Little did not understand this thesis =\
Before I forget, cc @Grummfy (https://github.com/Grummfy/PHPGraphQL/). He has the same idea to integrate hoa/compiler
inside webonyx/graphql-php
. I am not saying one should replace another, just making a connection here :-).
if I remember well I have finally find a way to deal with comments... but unfortunately due to time I can't dig into your code to help you now, so just look at my repo if it can helps ;)
@SerafimArts OK, I understand your problem. Side note: Really interesting project, congrats! The grammar is a piece of art, I love the coding style.
Naive approach:
It sounds like two different tokens:
%skip “newline . blanks* . (something that is not a newline)+ . blanks? #”
, and%token comment “newline . blanks* . #”
It's pseudo-code-ish because I don't know how you manage your newlines.
Do you understand my approach?
@Hywan Spaces have no meaning. For the first time I cited an example of a PHP code for the reason that it is possible to read Dobblock from Reflection (http://php.net/manual/en/reflectionclass.getdoccomment.php). And its behavior for obtaining documentation information are very similar to those that I have to implement.
If briefly description docblock is the comments that are:
1) Are in front of the tokens: type
, interface
, union
, enum
, scalar
, input
, directive
. Like:
# This is docblock type description
type Example {}
# This is docblock interface description
interface Example {}
2) Which are before the description of fields, like:
type Example {
# This is docblock description of field named "first"
first: FieldType
# This is another description of field named "second"
second: Some
}
3) All other comments should be ignore, like:
# + This is a readable description block
# + This is a readable description block
type Example
# x This is ignored comment
{
# + This is a readable description block
field: Some # x This is ignored comment
# x This is ignored comment
} # x This is ignored comment
I perfectly understand your issue. I am trying something.
This example is working:
%token word \w+
%token comment #[^\n]*
%skip ignored \h*(#[^\n]*)?\n?
#root:
line()*
#line:
( <comment> #annotated_line )? <word>
Given the following data:
# c1
foo
bar
# c2
baz
qux # c3
eol
We obtain this:
$ hoa compiler:pp grammar.pp data -s -v dump
# namespace token name token value offset
-------------------------------------------------------------------------------
0 default comment # c1 0
1 default word foo 5
2 default word bar 12
3 default comment # c2 16
4 default word baz 21
5 default word qux 28
6 default word eol 37
7 default EOF 40
> #root
> > #annotated_line
> > > token(comment, # c1)
> > > token(word, foo)
> > #line
> > > token(word, bar)
> > #annotated_line
> > > token(comment, # c2)
> > > token(word, baz)
> > #line
> > > token(word, qux)
> > #line
> > > token(word, eol)
If a comment
token is explicitly specified, then we have an #annotated_line
node in the AST. Else, it fallbacks into the skip token.
There is one caveat though: Comments must start at column 0. I am working on this.
@Hywan In this case, there is an intersection of tokens. I've already tried this.
I updated the circuit, now the tests will pass - you will see an error: https://github.com/SerafimArts/Railgun/commit/c54671653cc51b37d211a6ab33822c51b77ef91f https://travis-ci.org/SerafimArts/Railgun/jobs/267208140
Gotcha!
- %skip ignored \h*(#[^\n]*)?\n?
+ %skip ignored \h*(#[^\n]*)?[\n\h]*
I am trying with this data:
# c1
foo
bar
# c2
# c2b
baz
qux # c3
eol
And this grammar:
%token word \w+
%token comment #[^\n]*
%skip ignored \h*(#[^\n]*)?[\n\h]*
#root:
line()*
#line:
( comment() #annotated_line )? <word>
#comment:
<comment>+
(note the special comment
rule just to group many annotations)
And here is the result:
$ hoa compiler:pp grammar.pp data -s -v dump
# namespace token name token value offset
-------------------------------------------------------------------------------
0 default comment # c1 0
1 default word foo 5
2 default word bar 12
3 default comment # c2 20
4 default comment # c2b 26
5 default word baz 33
6 default word qux 40
7 default word eol 49
8 default EOF 52
> #root
> > #annotated_line
> > > #comment
> > > > token(comment, # c1)
> > > token(word, foo)
> > #line
> > > token(word, bar)
> > #annotated_line
> > > #comment
> > > > token(comment, # c2)
> > > > token(comment, # c2b)
> > > token(word, baz)
> > #line
> > > token(word, qux)
> > #line
> > > token(word, eol)
I get an error in last ur example: Unexpected token "# c2b" (comment) at line 8 and column 1: # c2b
grammar:
%token foo foo\b
%token bar bar\b
%token baz baz\b
%token qux qux\b
%token eol eol\b
%token comment #[^\n]*
%skip ignored \h*(#[^\n]*)?[\n\h]*
#root:
line()*
#line:
(<comment> #annotated_line)? word()
#word:
::foo:: | ::bar:: | ::baz:: | ::qux:: | ::eol::
data:
# c1
foo
bar
# c2
# c2b
baz
qux # c3
eol
- (<comment> #annotated_line)? word()
+ (<comment> #annotated_line)* word()
Oops :D
Such a regex should match any comment that starts on a separate line: ^(?:\h*#[^\n]*)|(?:\n\h*#[^\n]*)+
:
https://regex101.com/r/bppmDa/11/
I have absolutely no idea how to implement this, so that the tests are finally passed. It's easier to abandon the idea of supporting all the functionality of GraphQL =\
Grammar here: https://github.com/railt/parser/blob/3df7977f7ba968aa623284cd331dcacef49b1667/resources/grammar.pp
I think the problem is that it indicated the validity of the comment in the root and it has a higher priority: https://github.com/railt/parser/blob/3df7977f7ba968aa623284cd331dcacef49b1667/resources/grammar.pp#L163
Document:
Definition()*
Definition:
# ...
| UnionDefinition()
| Comment() # <= here
Regex: (?:(?:\A|\n)\h*#[^\n]*)+\n\h*\b
didnt works too: https://regex101.com/r/ox2BCl/1
I would not add Comment
inside Definition
directly. I would add it where you can have actual annotations. Did you try this?
An alternative is to add <comment>
everywhere. But everywhere is not in a lot of places actually: Only before a possible newline.
That's strange because the spec of GraphQL says that # …
is a comment and should be ignored, but at the same time, it says it can serve as annotations, but only on specific structures of the language, right? That's really ambiguous… and so, I suggest to add comment
everywhere where it is possible. That's not really convinient, but the GraphQL grammar is designed this way…
An alternative is to add
everywhere. But everywhere is not in a lot of places actually: Only before a possible newline.
I tried - it's incredibly slows down parsing code like this: https://github.com/railt/parser/blob/3df7977f7ba968aa623284cd331dcacef49b1667/tests/.resources/ast-ab-spec-tests/%2Bfull-comments.graphqls
That's strange because the spec of GraphQL says that # … is a comment and should be ignored, but at the same time, it says it can serve as annotations, but only on specific structures of the language, right? That's really ambiguous… and so, I suggest to add comment everywhere where it is possible. That's not really convinient, but the GraphQL grammar is designed this way…
Like PHP, JS+Flow and other languages ;)
PHP:
/** Test */
class /* asdasd */ Example {} // some
(new \ReflectionClass(Example::class))->getDocComment(); // -> "/** Test */"
JS:
/** @flow */
type A = String | Number; // Ignored comment
let a: Function = (example: A) => {};
And this feature of the GraphQL is explained by the fact that the very idea of such a declaration of descriptions of types and fields was implemented already after the release of the first draft of the standard.
It began with Apollo client (https://github.com/apollographql/GitHunt-API/blob/01f91419f16c0eaf93ca74e64f270df8d9195f1b/api/sql/schema.js#L54), implemened here: https://github.com/graphql/graphql-js. And a proposal appeared after. Discussion here: https://github.com/facebook/graphql/pull/90#issuecomment-257002595 and below.
In fact this possibility is not documented, but it is supported by all popular implementations. Because ¯\_(ツ)_/¯
I tried - it's incredibly slows down parsing code like this: https://github.com/railt/parser/blob/3df7977f7ba968aa623284cd331dcacef49b1667/tests/.resources/ast-ab-spec-tests/%2Bfull-comments.graphqls
It is slow if you use the tokens we have defined. If you use a simple token like %token comment #[^\n]+\n
, it should be fast.
It is slow if you use the tokens we have defined. If you use a simple token like %token comment #[^\n]+\n, it should be fast.
Let's assume. But in any case, such a grammar file will turn out to be unsupported (about 800 code lines currently)
So at the moment the best and simplest option is the usual preprocessing of the source code (for example, converting #
to ##
in those places where such sections are allowed by the regex (?:(?:\A|\n)\h*#[^\n]*)+\n\h*\b
and the definition of the rule %token T_DOCBLOCK ##.*
)
Let's assume. But in any case, such a grammar file will turn out to be unsupported (about 800 code lines currently)
Sorry, but I think I don't understand your point of view. Why most parts of the grammar will be unsupported?
I don't like this situation. It is what happens when the grammar is too weird unfortunately. In PHP, comments are marked as “skip”, except for docblock comment (/** … */
), so there is a special kind of comments. Same for Javascript if I remind correctly.
The only alternative I see is: Either insert <comment>
everywhere manually (I reckon it will not cover a lot of places), or ignore them for now, and add the support later. If I understand the problem correctly, GraphQL might add an annotation syntax in a near close future, right? So don't bother supporting a strange grammar if it changes in few months. From my own experience, implementing a whole grammar in one shot is difficult. You can postpone this task since it is not required to recognize valid GraphQL messages.
A pre-processor is a good idea, despite the performance impact. However, GraphQL payloads are generally of small sizes, so the impact should be minimal. Therefore, I think it is a good trade-off.
How can I help you now? I don't have all the time I would like to dig into your problem deeply unfortunately. I will do my best though if you have more questions!
Sorry, but I think I don't understand your point of view. Why most parts of the grammar will be unsupported?
Large amount of code is always hard to read. And its increase and littering can lead to problems.
If I understand the problem correctly, GraphQL might add an annotation syntax in a near close future, right?
It is already implemented in the JS library, so it is unlikely =\
How can I help you now? I don't have all the time I would like to dig into your problem deeply unfortunately. I will do my best though if you have more questions!
I can suggest a solution to the problem - this is an opportunity to overload the "skip tokens" in the rules (group nodes), but as you said - it's huge problem:
%skip some ...
%token any ...
example_rule:
<some>? <any>
So I think I think it's no more, thank you. =)
An opportunity to overload the "skip tokens" in the rules (group nodes)
Or provide an opportunity to declare a token that can be located anywhere.
@SerafimArts is there anything you like to see changed in the hoa/compiler library? If so could you summarize/formulate this change? Or do you have more questions?
@flip111 What I would like to see violates backward compatibility with existing code.
That's why I made a fork and completely rewrote almost all the code from scratch =)))
@SerafimArts ok let's close this issue then. By the way i'm interested to see your fork. Do you have a link?
You could find this fork here: https://github.com/hoaproject/Compiler/network
But it's not difficult for me to add links: FS Abstraction Layer: https://github.com/railt/io Lexer: https://github.com/railt/lexer Parser: https://github.com/railt/parser Compiler: https://github.com/railt/compiler
@SerafimArts maybe you can mention hoa/compiler in the readme of railt/parser for saying thanks
@flip111 Like this? https://github.com/railt/parser/blob/1.4.x/composer.json#L22-L29
Please create a pull request if you want more or some other format for mentioning Hoa. But it should be borne in mind that there is almost no code from the original.
This is the only place where you can find similarities: 1) Original 2) Fork
In my task I need to know information about some tokens that have been marked as "%skip".
For example, here is the code:
Output:
But Im need something like this:
I can replace the
%skip
with a%token
. But I will have to make it (%token
) optional after every active token in the grammar file.Any ideas how implement this?