pboyer / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org

Performance tips #92

Open ashishnegi opened 7 years ago

ashishnegi commented 7 years ago

First of all, thanks for your contribution to the ANTLR4 Go target. I have read in the ANTLR4 Go target issues that you are using it for non-trivial production work. We are also considering it for parsing our graph database language spec. :) Our language spec is a variant of GraphQL.

We started benchmarking with the simplest subset grammar. Benchmarks:

dgraph/antlr4go/graphqlpm$ gotb -test.run=XXX -benchtime=2s -v
BenchmarkQuery/q1-4                 5000       1221967 ns/op
BenchmarkQuery/q2-4                 2000       1710850 ns/op
BenchmarkQuery/q3-4                 3000       1230518 ns/op
BenchmarkQuery/q4-4                 3000       1568564 ns/op
BenchmarkQuery/q5-4                 2000       1803392 ns/op
PASS
ok      github.com/dgraph-io/dgraph/antlr4go/graphqlpm    22.179s

We expected these numbers to be under 0.05 ms. They are currently around 1.5 ms.

Here are comparisons of our handwritten parser and the ANTLR Go parser over practical queries. Benchmarks:

query$ gotb -test.run=XXX -benchtime=2s -v
BenchmarkQueryParse/spielberg:handwitten:-4               100000         46918 ns/op
BenchmarkQueryParse/spielberg:antlr:-4                      2000       1741364 ns/op
BenchmarkQueryParse/tomhanks:handwitten:-4                100000         25982 ns/op
BenchmarkQueryParse/tomhanks:antlr:-4                       2000       1654579 ns/op
BenchmarkQueryParse/nestedquery:handwritten:-4             30000         73053 ns/op
BenchmarkQueryParse/nestedquery:antlr:-4                    1000       3385005 ns/op
PASS
ok      github.com/dgraph-io/dgraph/query    21.922s

ANTLR4 is around 40x slower. Is this expected, or are we doing something wrong? Can you give some performance tips? How can we benchmark the lexer and parser steps separately?

willfaught commented 7 years ago

My understanding is that there's a way to seed the parser with prior parser state that "warms" it up and speeds up the adaptation. Are you doing that before measuring? If not, that doesn't seem like a fair comparison against hand-written code. I've never done that before, however, so can't help you there, unfortunately.

ashishnegi commented 7 years ago

Just to be clear, the handwritten code is not optimized for a particular query. It is a parser for our language spec. We are benchmarking with the code that we expect to run for any user-specified query string. If this is not what you meant, I am interested in understanding more about seeding the parser a priori for any random query.

I am currently benchmarking with the code below. Is there anything I am doing that could be moved out of the for loop?

    for i := 0; i < b.N; i++ {
        // q is the query; b.N is a million.
        input := antlr.NewInputStream(q)
        lexer := parser.NewGraphQLPMLexer(input)
        stream := antlr.NewCommonTokenStream(lexer, 0)
        p := parser.NewGraphQLPMParser(stream)
        p.AddErrorListener(antlr.NewDiagnosticErrorListener(true))
        p.BuildParseTrees = true
        // Up to here the cost is about 15000 ns/op for q1.
        // The next call makes it ~100x more costly: ~1800000 ns/op.
        _ = p.Document()
    }

I have tried p.BuildParseTrees = false and attaching a dummy listener, but it seems to give the same performance results.

willfaught commented 7 years ago

Check out https://github.com/antlr/antlr4/blob/master/doc/faq/general.md; it discusses some aspects of performance, although there is no mention of seeding. Sorry, I can't recall where I saw that mentioned.

It seems from that document that your example is missing the Go equivalent of this Java:

parser.getInterpreter().setPredictionMode(PredictionMode.SLL);

Does that help at all?
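
In the Go runtime the equivalent appears to be the following (a sketch; these are the method and constant names that show up later in this thread, applied to the parser p from your snippet):

    // Switch prediction from the default ALL(*) mode to the faster SLL mode.
    p.GetInterpreter().SetPredictionMode(antlr.PredictionModeSLL)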


willfaught commented 7 years ago

Also note that that document mentions you should be warming up the parser to get the best perf. I assume they mean by hand, just have it parse a corpus beforehand. That's one way to "seed" the parser. ;) Does that make a difference?
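
For example, in the benchmark setup you could parse a few queries before starting the timer (a sketch; aq1 through aq3 and the antlrQueryParse helper are placeholders, mirroring the benchmark code shown later in this thread):

    // Parse a small warm-up corpus so the prediction caches are populated
    // before the timed loop runs.
    for _, aq := range []string{aq1, aq2, aq3} {
        antlrQueryParse(aq) // placeholder helper: lex + parse one query
    }
    b.ResetTimer()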


willfaught commented 7 years ago

Check out ATNDeserializer/ATNSerializer. Not sure if that's it exactly, but they look interesting. I see them in the Java runtime doc, at least, and I believe there's at least an ATNDeserializer in the Go runtime.


ashishnegi commented 7 years ago

Thanks. I had read that document, but missed the SLL part.

Q. For seeding and warm-up to work, will all query parsing have to go through the same parser object? Where does ANTLR4 store all that information?

Following your suggestions, my benchmark function now looks like this:

func runAntlrParser(q string, b *testing.B) {
    // some seeding with queries.
    aqs := []string{aq1, aq2, aq3, aq4, aq5, aq6, aq7, aq8, aq9, aq10, aq11}
    for _, aq := range aqs {
        antlrQueryParse(aq)
    }

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        input := antlr.NewInputStream(q)
        lexer := parser.NewGraphQLPMLexer(input)
        stream := antlr.NewCommonTokenStream(lexer, 0)
        p := parser.NewGraphQLPMParser(stream)
        p.AddErrorListener(antlr.NewDiagnosticErrorListener(true))
        p.BuildParseTrees = false
        p.AddParseListener(newDummyListerner())
        // try out SLL mode first.
        p.GetInterpreter().SetPredictionMode(antlr.PredictionModeSLL)
        _ = p.Document()
    }
}

Also, I think the Reset functions on CommonTokenStream and Parser are not exported.

Luckily for me, p.Document did not panic; I put a log statement in its recover to check. I ran the benchmarks with the -v option.

Assuming everything I tried is correct, I still do not see much improvement.

/query$ gotb -test.run=XXX -v  -benchtime=2s
BenchmarkQueryParse/spielberg:handwitten:-4                50000         49344 ns/op
BenchmarkQueryParse/spielberg:antlr:-4                      2000       1450628 ns/op
BenchmarkQueryParse/tomhanks:handwitten:-4                200000         28737 ns/op
BenchmarkQueryParse/tomhanks:antlr:-4                       2000       1674292 ns/op
BenchmarkQueryParse/nestedquery:handwritten:-4             50000         70716 ns/op
BenchmarkQueryParse/nestedquery:antlr:-4                    1000       3009771 ns/op
PASS
ok      github.com/dgraph-io/dgraph/query   23.522s
ashishnegi@ashish:~/work/golang/src/github.com/dgraph-io/dgraph/query$ gotb -test.run=XXX -v  -benchtime=2s
BenchmarkQueryParse/spielberg:handwitten:-4               100000         35152 ns/op
BenchmarkQueryParse/spielberg:antlr:-4                      2000       1326918 ns/op
BenchmarkQueryParse/tomhanks:handwitten:-4                200000         25908 ns/op
BenchmarkQueryParse/tomhanks:antlr:-4                       2000       1409008 ns/op
BenchmarkQueryParse/nestedquery:handwritten:-4             30000         87302 ns/op
BenchmarkQueryParse/nestedquery:antlr:-4                    2000       3406949 ns/op
PASS
ok      github.com/dgraph-io/dgraph/query   25.542s
ashishnegi commented 7 years ago

I will look into ATNDeserializer. Thanks for your suggestions. We will definitely use ANTLR if we can bring the numbers down to 2-3x our current implementation.

willfaught commented 7 years ago

It doesn't hurt to ask in antlr/antlr4 too. Sometimes grammars themselves can be inefficient. @parrt or @sharwell might be able to help, and would definitely know whether ATNSerializer/ATNDeserializer are what you're looking for.


ashishnegi commented 7 years ago

I asked this on the antlr/antlr4 page. We found that lexing is taking most of the time. However, I am not sure whether to continue the discussion there, as the antlr4 GitHub page does not seem to be the right place for performance problems.

With only lexing:

func runAntlrParser(q string, b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        input := antlr.NewInputStream(q)
        lexer := parser.NewGraphQLPMLexer(input)
        // for only lexer benchmark
        t := lexer.NextToken()
        for t.GetTokenType() != antlr.TokenEOF {
            t = lexer.NextToken()
        }
    }
}

I found that lexing is taking most of the time.

Benchmark with only the lexer for ANTLR:

/query$ gotb -test.run=XXX -v  -benchtime=5s
BenchmarkQueryParse/spielberg:handwitten:-4               200000         45724 ns/op
BenchmarkQueryParse/spielberg:antlr:-4                      5000       1468218 ns/op
BenchmarkQueryParse/tomhanks:handwitten:-4                500000         28649 ns/op
BenchmarkQueryParse/tomhanks:antlr:-4                       5000       1538988 ns/op
BenchmarkQueryParse/nestedquery:handwritten:-4            100000         80210 ns/op
BenchmarkQueryParse/nestedquery:antlr:-4                    5000       3029668 ns/op
PASS
ok      github.com/dgraph-io/dgraph/query   63.546s

With both lexer and parser:

ashishnegi@ashish:~/work/golang/src/github.com/dgraph-io/dgraph/query$ gotb -test.run=XXX -v  -benchtime=5s
BenchmarkQueryParse/spielberg:handwitten:-4               300000         47772 ns/op
BenchmarkQueryParse/spielberg:antlr:-4                      3000       1868297 ns/op
BenchmarkQueryParse/tomhanks:handwitten:-4                500000         27980 ns/op
BenchmarkQueryParse/tomhanks:antlr:-4                       5000       1616518 ns/op
BenchmarkQueryParse/nestedquery:handwritten:-4            100000         74961 ns/op
BenchmarkQueryParse/nestedquery:antlr:-4                    2000       3312977 ns/op
PASS
ok      github.com/dgraph-io/dgraph/query   58.056s

I think it is a well-known issue that lexing takes most of the time. Where can I read more about bringing down the lexing time?

willfaught commented 7 years ago

No idea. I think you know more than I do at this point. Can you link to your issue on antlr/antlr4?


ashishnegi commented 7 years ago

This is the issue: antlr/antlr4 issue

willfaught commented 7 years ago

It very well could be a performance issue with the Go ANTLR runtime. Have you tried benchmarking with another language target like Java or Python? If it is indeed just a Go runtime issue, it would be immensely helpful if you could profile your program to pinpoint the hotspot(s).
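
For example, with the standard Go tooling you can capture a CPU profile from the benchmark and inspect it (a sketch; the benchmark name and test binary name are placeholders):

    query$ go test -run=XXX -bench=BenchmarkQueryParse -benchtime=2s -cpuprofile=cpu.out
    query$ go tool pprof query.test cpu.out
    (pprof) top --cum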

ashishnegi commented 7 years ago

I have not yet benchmarked other targets, but here are the pprof top-10 results for 100 iterations of lexing only on a query:

ashishnegi@ashish:~/work/golang/src/github.com/dgraph-io/dgraph/query$ go tool  pprof query.test cpu4.out
Entering interactive mode (type "help" for commands)
(pprof) top --cum
10ms of 30ms total (33.33%)
Showing top 10 nodes out of 30 (cum >= 30ms)
      flat  flat%   sum%        cum   cum%
         0     0%     0%       30ms   100%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseLexer).GetAllTokens
         0     0%     0%       30ms   100%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseLexer).NextToken
         0     0%     0%       30ms   100%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseLexer).safeMatch
         0     0%     0%       30ms   100%  github.com/antlr/antlr4/runtime/Go/antlr.(*LexerATNSimulator).Match
         0     0%     0%       30ms   100%  github.com/antlr/antlr4/runtime/Go/antlr.(*LexerATNSimulator).computeTargetState
         0     0%     0%       30ms   100%  github.com/antlr/antlr4/runtime/Go/antlr.(*LexerATNSimulator).execATN
      10ms 33.33% 33.33%       30ms   100%  github.com/antlr/antlr4/runtime/Go/antlr.(*LexerATNSimulator).getReachableConfigSet
         0     0% 33.33%       30ms   100%  github.com/dgraph-io/dgraph/query.TestProfileQueryParse
         0     0% 33.33%       30ms   100%  runtime.goexit
         0     0% 33.33%       30ms   100%  testing.tRunner
(pprof) top10
30ms of 30ms total (  100%)
Showing top 10 nodes out of 30 (cum >= 30ms)
      flat  flat%   sum%        cum   cum%
      10ms 33.33% 33.33%       10ms 33.33%  fmt.(*pp).doPrintf
      10ms 33.33% 66.67%       30ms   100%  github.com/antlr/antlr4/runtime/Go/antlr.(*LexerATNSimulator).getReachableConfigSet
      10ms 33.33%   100%       10ms 33.33%  runtime.memclr
         0     0%   100%       10ms 33.33%  fmt.Sprintf
         0     0%   100%       10ms 33.33%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseATNConfigSet).Add
         0     0%   100%       30ms   100%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseLexer).GetAllTokens
         0     0%   100%       30ms   100%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseLexer).NextToken
         0     0%   100%       30ms   100%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseLexer).safeMatch
         0     0%   100%       10ms 33.33%  github.com/antlr/antlr4/runtime/Go/antlr.(*LexerATNConfig).Hash
         0     0%   100%       30ms   100%  github.com/antlr/antlr4/runtime/Go/antlr.(*LexerATNSimulator).Match
(pprof) web
(pprof) exit
ashishnegi commented 7 years ago

After benchmarking with the C++ target:

Lexing only, on the nestedquery query, 1 million times: the average time is ~42 microseconds per iteration. The same query in the Go target takes on average ~3.3 milliseconds to lex, i.e. around 80 times slower.

/graphqlpmcpp$ time ./graphqlexe 

real    0m42.380s
user    0m42.368s
sys 0m0.000s

And with both lexing and parsing: ~107 microseconds per iteration. [Results are for 1 million iterations.]

/graphqlpmcpp$ time ./graphqlexe 

real    1m46.194s
user    1m46.172s
sys 0m0.004s

So this can be fixed in the Go target itself, which is not bad news.

Here is the output of the token stream and parse tree from the C++ version, just to be sure it works properly:

/graphqlpmcpp$ ./graphqlexe 
[@0,0:0='{',<10>,1:0]
[@1,1:5='debug',<14>,1:1]
[@2,6:6='(',<7>,1:6]
[@3,7:11='_xid_',<14>,1:7]
[@4,12:12=':',<12>,1:12]
[@5,14:22='"m.06pj8"',<13>,1:14]
[@6,23:23=')',<2>,1:23]
[@7,25:25='{',<10>,1:25]
[@8,26:62='type.object.name.enfilm.director.film',<14>,1:26]
[@9,64:64='(',<7>,1:64]
[@10,65:69='first',<14>,1:65]
[@11,70:70=':',<12>,1:70]
[@12,72:74='"2"',<13>,1:72]
[@13,75:75=',',<8>,1:75]
[@14,77:82='offset',<14>,1:77]
[@15,83:83=':',<12>,1:83]
[@16,84:87='"10"',<13>,1:84]
[@17,88:88=')',<2>,1:88]
[@18,90:97='@filter(',<1>,1:90]
[@19,98:102='anyof',<5>,1:98]
[@20,103:103='(',<7>,1:103]
[@21,104:124='"type.object.name.en"',<13>,1:104]
[@22,126:126=',',<8>,1:126]
[@23,128:138='"war spies"',<13>,1:128]
[@24,139:139=')',<2>,1:139]
[@25,141:142='&&',<4>,1:141]
[@26,144:148='allof',<6>,1:144]
[@27,149:149='(',<7>,1:149]
[@28,150:170='"type.object.name.en"',<13>,1:150]
[@29,171:171=',',<8>,1:171]
[@30,173:185='"hello world"',<13>,1:173]
[@31,186:186=')',<2>,1:186]
[@32,187:188='||',<3>,1:187]
[@33,190:194='allof',<6>,1:190]
[@34,195:195='(',<7>,1:195]
[@35,196:216='"type.object.name.en"',<13>,1:196]
[@36,217:217=',',<8>,1:217]
[@37,219:231='"wonder land"',<13>,1:219]
[@38,232:232=')',<2>,1:232]
[@39,233:233=')',<2>,1:233]
[@40,236:236='{',<10>,1:236]
[@41,237:325='_uid_type.object.name.enfilm.film.initial_release_datefilm.film.countryfilm.film.starring',<14>,1:237]
[@42,327:327='{',<10>,1:327]
[@43,328:349='film.performance.actor',<14>,1:328]
[@44,351:351='{',<10>,1:351]
[@45,352:370='type.object.name.en',<14>,1:352]
[@46,371:371='}',<11>,1:371]
[@47,372:397='film.performance.character',<14>,1:372]
[@48,399:399='{',<10>,1:399]
[@49,400:418='type.object.name.en',<14>,1:400]
[@50,419:419='}',<11>,1:419]
[@51,420:420='}',<11>,1:420]
[@52,421:435='film.film.genre',<14>,1:421]
[@53,437:437='{',<10>,1:437]
[@54,438:456='type.object.name.en',<14>,1:438]
[@55,457:457='}',<11>,1:457]
[@56,458:458='}',<11>,1:458]
[@57,459:459='}',<11>,1:459]
[@58,460:460='}',<11>,1:460]
[@59,461:460='<EOF>',<-1>,1:461]
(document (definition (selectionSet { (field debug (arguments ( (argument _xid_ : (value "m.06pj8")) )) (selectionSet { (field type.object.name.enfilm.director.film (arguments ( (argument first : (value "2")) , (argument offset : (value "10")) )) (filters @filter( (pair (funcName anyof) ( (fieldName "type.object.name.en") , (value "war spies") )) (filterOperation &&) (pair (funcName allof) ( (fieldName "type.object.name.en") , (value "hello world") )) (filterOperation ||) (pair (funcName allof) ( (fieldName "type.object.name.en") , (value "wonder land") )) )) (selectionSet { (field _uid_type.object.name.enfilm.film.initial_release_datefilm.film.countryfilm.film.starring (selectionSet { (field film.performance.actor (selectionSet { (field type.object.name.en) })) (field film.performance.character (selectionSet { (field type.object.name.en) })) })) (field film.film.genre (selectionSet { (field type.object.name.en) })) })) })) })))
willfaught commented 7 years ago

Interesting. So there probably are perf gains to be had in the Go lexer.

I don't have the time right now to investigate, but hopefully someone does!
