ruby-rdf / rdf-turtle

Turtle reader/writer for Ruby
http://rubygems.org/gems/rdf-turtle
The Unlicense

Parser is slow #12

Closed jcoyne closed 9 years ago

jcoyne commented 9 years ago
puts Benchmark.measure { RDF::Graph.load('out.ttl', format: :ttl) }
  1.020000   0.010000   1.030000 (  1.032824)

where out.ttl is: https://gist.github.com/jcoyne/5d153463e1379ac2324b

gkellogg commented 9 years ago

It's certainly slower than N-Triples, as the LL(1) nature of the parser introduces much more overhead. A hand-built parser would certainly be faster, but the consensus was that sticking to LL(1) was worth it.

If you have some thoughts on specific speed-ups, let me know. A custom parser isn't out of the question, but would be a fair amount of work. Note that there is a Freebase-style reader which is quite fast, but uses a constrained syntax.
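For readers unfamiliar with the term, here is a minimal sketch of what "LL(1)" means in practice: the parser decides what to do next by looking at exactly one upcoming token. The grammar, the `ToyLL1Parser` class, and the tokenizer below are hypothetical and cover only a tiny Turtle-like subset; this is not the rdf-turtle implementation.

```ruby
# Toy LL(1) parser for a tiny Turtle-like subset (illustration only).
# Hypothetical grammar:
#   doc       ::= statement*
#   statement ::= term predObjs '.'
#   predObjs  ::= term objects (';' term objects)*
#   objects   ::= term (',' term)*
class ToyLL1Parser
  PUNCT = %w[; , .].freeze

  def initialize(input)
    # A term is any run of characters other than whitespace and punctuation.
    @tokens = input.scan(/[;,.]|[^\s;,.]+/)
    @pos = 0
  end

  # Returns an array of [subject, predicate, object] string triples.
  def parse
    triples = []
    statement(triples) while peek
    triples
  end

  private

  # The single token of lookahead that drives every parsing decision.
  def peek
    @tokens[@pos]
  end

  def advance
    token = @tokens[@pos]
    @pos += 1
    token
  end

  def expect(punct)
    token = advance
    raise ArgumentError, "expected #{punct.inspect}, got #{token.inspect}" unless token == punct
  end

  def term
    token = advance
    raise ArgumentError, "expected a term, got #{token.inspect}" if token.nil? || PUNCT.include?(token)
    token
  end

  def statement(triples)
    subject = term
    loop do
      predicate = term
      loop do
        triples << [subject, predicate, term]
        break unless peek == ','  # one-token lookahead: another object follows?
        advance
      end
      break unless peek == ';'    # one-token lookahead: another predicate follows?
      advance
    end
    expect('.')
  end
end

triples = ToyLL1Parser.new(':a :p :b , :c ; :q :d .').parse
```

Each decision here costs a method call and a lookahead check per token, which is one plausible source of the overhead described above compared with a line-oriented format such as N-Triples.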

gkellogg commented 9 years ago

FYI, I did a PEG parser (using Treetop) for N3 some time ago. It performs reasonably well, although the details of working with that parser aren't ideal IIRC. The main problem is that it works by constructing a parse tree and only providing that tree once parsing is complete, at which point it can be navigated to generate triples. This only works for relatively bounded input files. The LL(1) parser will handle input of arbitrary size, albeit at a slower parse rate.

If someone were interested in creating a more optimal parser, it would be reasonable to use it as the default, and provide an option for running the LL(1) parser if necessary.
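The trade-off between building a complete result first and emitting triples as they are read can be sketched with a toy line-based format (one `s p o .` per line). The helper names `parse_all` and `parse_each` are hypothetical; neither reflects how Treetop or rdf-turtle is actually implemented.

```ruby
require 'stringio'

# Tree-style: consume the whole input and return the complete result at the end.
# Memory use grows with the size of the input.
def parse_all(io)
  io.each_line.map { |line| line.split.first(3) }
end

# Streaming-style: yield each triple as soon as its line has been read.
# Memory use stays roughly constant, so input size is not bounded by RAM.
def parse_each(io)
  return enum_for(:parse_each, io) unless block_given?

  io.each_line { |line| yield line.split.first(3) }
end

data = ":a :p :b .\n:a :q :c .\n"

all = parse_all(StringIO.new(data))

count = 0
parse_each(StringIO.new(data)) { |_s, _p, _o| count += 1 }
```

With the streaming form, a caller can process or discard each triple immediately, which is why that style copes with arbitrarily large files even if each individual step is slower.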

gkellogg commented 9 years ago

So, this is on my radar now.

gkellogg commented 9 years ago

The updated parser in the new-parser branch is about 3x faster when running examples/sp2b.ttl (~50K triples). Let me know what you think. I'm not sure how to make it much faster without using a limited syntax, similar to the Freebase parser.

Before it can be merged to develop, the TriG parser will need to be updated too.
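For anyone wanting to compare branches themselves, a minimal sketch of a throughput measurement follows. The `parse_rate` helper is hypothetical, and the workload is a stand-in so the snippet runs without any gems; swap in a real parse that returns the number of statements read.

```ruby
require 'benchmark'

# Runs the block, which must return the number of triples it parsed,
# and reports the count, elapsed wall-clock seconds, and triples per second.
def parse_rate
  count = nil
  seconds = Benchmark.realtime { count = yield }
  rate = seconds.zero? ? Float::INFINITY : count / seconds
  { triples: count, seconds: seconds, per_second: rate }
end

# Stand-in workload (assumption): replace with a real parse, for example
# loading examples/sp2b.ttl and returning the number of statements read.
result = parse_rate { 50_000.times.count }

puts format('%<triples>d triples in %<seconds>.3fs (%<per_second>.0f/s)', result)
```

Running the same measurement on both branches against the same file gives a like-for-like ratio, which is more useful than absolute timings that vary by machine.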

gkellogg commented 9 years ago

Released in 1.1.7.

jcoyne commented 9 years ago

Huge improvement! :clap: :rocket: