relevance / edn-ruby

Ruby implementation of Extensible Data Notation as defined by Rich Hickey
MIT License

Improve parsing performance #12

Closed: pangloss closed this issue 10 years ago

pangloss commented 11 years ago

EDN is roughly 1000x slower than JSON at deserializing the following trivial string.

I've used Parslet in a project before, so I know exactly where your speed problems are coming from. Please consider using a faster parser. For instance, http://rsec.heroku.com/ looks promising (via http://www.ruby-forum.com/topic/3444183 ).

s = "[{\"x\" {\"id\" \"/model/952\", \"model_name\" \"person\", \"ancestors\" [\"record\" \"asset\"], \"format\" \"edn\"}, \"id\" 952, \"name\" nil, \"model_name\" \"person\", \"rel\" {}, \"description\" nil, \"age\" nil, \"updated_at\" nil, \"created_at\" nil, \"anniversary\" nil, \"job\" nil, \"start_date\" nil, \"username\" nil, \"vacation_start\" nil, \"vacation_end\" nil, \"expenses\" nil, \"rate\" nil, \"display_name\" nil, \"gross_profit_per_month\" nil}]"

require 'benchmark'
require 'edn'

# Time 100 reads of the EDN string with the Parslet-based reader
Benchmark.realtime { 100.times { EDN::Reader.new(s).read } }
#=> 2.713356

Compare this to JSON performance:

require 'json'

# Convert the same data to JSON, then time parsing it
j = EDN::Reader.new(s).read.to_json
Benchmark.realtime { 100.times { JSON.parse(j) } }
#=> 0.00223
Benchmark.realtime { 100000.times { JSON.parse(j) } }
#=> 2.51108

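For anyone reproducing the comparison, here is a minimal sketch that normalizes both timings to seconds per call before computing the ratio. The helpers `seconds_per_call` and `slowdown` are hypothetical names introduced for illustration; they are not part of edn-ruby or the JSON library. The EDN timing is left as a comment because it assumes the edn gem is installed.

```ruby
require 'benchmark'
require 'json'

# Hypothetical helper (illustration only): mean wall-clock seconds
# per call of the given block over `iterations` runs.
def seconds_per_call(iterations)
  total = Benchmark.realtime { iterations.times { yield } }
  total / iterations
end

# Hypothetical helper: how many times slower one per-call time is
# than another.
def slowdown(slow_seconds, fast_seconds)
  slow_seconds / fast_seconds
end

json_text = '[{"id": 952, "name": null, "ancestors": ["record", "asset"]}]'
json_per_call = seconds_per_call(10_000) { JSON.parse(json_text) }

# With the edn gem installed, the same helper could time the EDN reader:
#   edn_per_call = seconds_per_call(100) { EDN::Reader.new(s).read }
#   slowdown(edn_per_call, json_per_call)

# Ratio computed from the two 100-iteration figures reported above.
reported_ratio = slowdown(2.713356 / 100, 0.00223 / 100)
puts reported_ratio.round
```

Dividing by the iteration count keeps runs of different lengths comparable, which matters here because the two JSON measurements use 100 and 100000 iterations.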
cndreisbach commented 11 years ago

This is definitely a known issue: Parslet is slow. However, it is extremely easy to prototype in. The best solution would be to rewrite the parser in C, but doing that takes some time. I think this problem is one to keep in mind for the future, but it really needs the right developer who is excited about taking it on.

kschiess commented 11 years ago

Using rsec instead of parslet will only gain you a factor of about 2-4, not 1000 as pangloss suggests. The way to go here would really be a hand-written parser. parslet's performance has been improving over the past two years. A lot of the performance comparisons out there were not done with the newer versions, but with really ancient ones.
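To make the hand-written option concrete, here is a minimal sketch of a recursive-descent reader built on Ruby's standard `StringScanner`. It is an illustration under stated assumptions, not the parser used by edn-ruby: the class name `TinyEdnReader` is hypothetical, and it only handles nil, true, false, integers, strings, vectors, and maps.

```ruby
require 'strscan'

# Hypothetical class for illustration; not the parser shipped in edn-ruby.
# Supports only nil, true, false, integers, strings, vectors, and maps.
class TinyEdnReader
  ESCAPES = { 'n' => "\n", 't' => "\t", 'r' => "\r" }.freeze

  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  # Reads one value starting at the current scanner position.
  def read
    skip_whitespace
    if @scanner.scan(/\[/)
      read_until(/\]/)
    elsif @scanner.scan(/\{/)
      items = read_until(/\}/)
      raise ArgumentError, 'map needs an even number of forms' if items.size.odd?
      items.each_slice(2).to_h
    elsif @scanner.scan(/"((?:[^"\\]|\\.)*)"/m)
      unescape(@scanner[1])
    elsif @scanner.scan(/-?\d+/)
      @scanner.matched.to_i
    elsif @scanner.scan(/nil\b/)
      nil
    elsif @scanner.scan(/true\b/)
      true
    elsif @scanner.scan(/false\b/)
      false
    else
      raise ArgumentError, "unexpected input at position #{@scanner.pos}"
    end
  end

  private

  # EDN treats commas as whitespace.
  def skip_whitespace
    @scanner.skip(/[\s,]+/)
  end

  # Collects values until the closing delimiter is consumed.
  def read_until(closer)
    items = []
    loop do
      skip_whitespace
      return items if @scanner.scan(closer)
      raise ArgumentError, 'unexpected end of input' if @scanner.eos?
      items << read
    end
  end

  # Replaces backslash escapes; unknown escapes keep the escaped character.
  def unescape(raw)
    raw.gsub(/\\(.)/m) { ESCAPES.fetch($1, $1) }
  end
end

p TinyEdnReader.new('[1 "two" {"three" nil}]').read
```

Each value is recognized by a single anchored regex match at the current position, so the reader makes one pass over the input without building an intermediate parse tree.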

pangloss commented 11 years ago

To clarify, the 1000x number is just the comparison to JSON, which I think is relevant since it's the obvious alternative to EDN.

I had gotten the impression rsec might be faster than that, so that's disappointing news. By the way, are you talking about plain rsec or rsec-ext? Do you have any benchmarks that you could link?

As a side note, I should have pointed out that generating EDN with this gem is fast; comparable in speed to JSON.

kschiess commented 11 years ago

You can run bin/compare from https://github.com/kschiess/parslet-benchmarks. It shows rsec (Ruby version) to have 16x the performance of parslet and roughly 4x that of treetop. Note that rsec does half the work, so that explains a lot. Further discussion of relative speeds in private? ;) (I don't think it helps the trouble ticket.)

kschiess commented 11 years ago

/me still thinks that the title of this trouble ticket is extremely poorly chosen.

cldwalker commented 11 years ago

@kschiess Agreed. Ticket updated.

@pangloss A pull request that improves parsing performance while keeping the existing readability and the test suite green would be welcome.

pangloss commented 11 years ago

@cldwalker I agree, but will have to leave the task to someone else.

russolsen commented 10 years ago

We switched to a hand-coded parser in 1.0.5. Performance is dramatically better.