ohler55 / oj

Optimized JSON
http://www.ohler.com/oj
MIT License
3.16k stars 252 forks source link

Benchmarking Oj Parsers against each other #840

Closed copiousfreetime closed 1 year ago

copiousfreetime commented 1 year ago

I was testing benchmarking of Oj on my particular type of json documents to see which one works best for my use case. I'm currently using the Oj::Doc parser and wanted to test it against the new Oj::Parser oriented items and came to the surprising find that Oj::Parser.new(:usual) is almost the same speed as Oj::Doc - although Oj::Doc has has a high variability in its runs, and I don't see a compelling case to use saj or scp parsers for my use case.

I wanted to make sure of a couple of things am i:

benchmark script

#!/usr/bin/env ruby

require 'oj'
require 'benchmark/ips'

event = <<JSON
{"t":"2023-01-23 23:00:09.000","ms": "012345", "act":"BE30","cs":"AKN183","alic":"AKN","alna":"Alkan Air Ltd","aporgla":49.055,"aporglo":-123.87,"aporgic":"CYCD","aporgia":"YCD","aporgna":"Nanaimo Airport","aporgci":"Nanaimo","aporgst":"British Columbia","aporgco":"Canada","aporgtz":-8.0,"aporgtzns":"PST","aporgtznl":"Pacific Standard Time","apdstla":54.4685,"apdstlo":-128.576,"apdstic":"CYXT","apdstia":"YXT","apdstna":"Terrace Airport","apdstci":"Terrace","apdstst":"British Columbia","apdstco":"Canada","apdsttz":-8.0,"apdsttzns":"PST","apdsttznl":"Pacific Standard Time","depsu":"2023-01-23 21:45:00","depau":"2023-01-23 22:04:00","depeu":"2023-01-23 22:00:00","deppu":"2023-01-23 21:45:00","arreu":"2023-01-24 00:02:00","la":51.55,"lo":-126.083333,"alt":28000,"hd":337.4855040049144,"gs":210,"tkot":"2023-01-23 22:04:42","aptkoci":"Nanaimo","aptkost":"British Columbia","aptkoco":"Canada","aptkola":49.055,"aptkolo":-123.87,"aptkoic":"CYCD","aptkoia":"YCD","aptkona":"Nanaimo Airport","aptkotz":-8.0,"aptkotzns":"PST","aptkotznl":"Pacific Standard Time","so":"ASDI","wp":"49.05,-123.87;49.13,-123.87;49.15,-123.92;49.2,-124.0;49.25,-124.09;49.35,-124.26;49.46,-124.45;49.67,-124.81;49.71,-124.89;49.77,-124.93;49.89,-125.01;50.1,-125.13;50.23,-125.23;51.06,-125.74;51.38,-125.97;52.03,-126.38;52.18,-126.48;52.77,-126.9;52.83,-126.94;53.57,-127.47;53.79,-127.63;54.05,-127.82;54.46,-128.14;54.62,-128.26;54.81,-128.41;54.83,-128.42;54.68,-128.49;54.63,-128.51;54.6,-128.52;54.52,-128.56;54.47,-128.58;","route":"CYCD..YCD..UQQ..RESIN..CYXT","onground":false,"fullroute":"CYCD..YCD..UQQ..RESIN..CYXT"}
JSON

class GCSuite
  def warming(*)
    run_gc
  end

  def running(*)
    run_gc
  end

  def warmup_stats(*)
  end

  def add_report(*)
  end

  private

  def run_gc
    GC.enable
    GC.start
    GC.disable
  end
end

class LimitedSaj < ::Oj::Saj
  attr_reader :t
  attr_reader :ms

  def initialize
    @t = nil
    @ms = nil
  end

  def add_value(value, key)
    case key
    when "t"
      @t = value
    when "ms"
      @ms = value
    end
    self
  end
end
oj_parser_saj = Oj::Parser.new(:saj)
oj_parser_saj_reuse_handler = Oj::Parser.new(:saj)

oj_saj_parse_limited_handler = LimitedSaj.new

limited_saj_handler = LimitedSaj.new
oj_parser_saj_reuse_handler.handler = limited_saj_handler

class LimitedScp
  def hash_start()
    {}
  end

  def hash_set(h,key,value)
    case key
    when "t"
      h[key] = value
    when "ms"
      h[key] = value
    end
  end

  def add_value(v)
    v
  end
end
limited_scp = LimitedScp.new

oj_parser_usual = Oj::Parser.new(:usual)

Benchmark.ips do |x|
  # disable gc during a run
  x.config(:time => 5, :warmup => 2, :suite => GCSuite.new)

  ## Oj.load

  x.report("load") do
    e = Oj.load(event)
    timestamp = e['t']
    hexid = e['ms'] || ""
  end

  ## Oj::Doc

  x.report('Oj::Doc') do
    Oj::Doc.open(event) do |doc|
      timestamp = doc.fetch("/t")
      hexid = doc.fetch("/ms") || ""
    end
  end

  ## Oj.saj

  x.report("Oj.saj alloc") do
    handler = LimitedSaj.new
    e = Oj.saj_parse(handler, event)
    timestamp = handler.t
    hexid = handler.ms || ""
  end

  x.report("Oj.saj reuse") do
    e = Oj.saj_parse(oj_saj_parse_limited_handler, event)
    timestamp = oj_saj_parse_limited_handler.t
    hexid = oj_saj_parse_limited_handler.ms || ""
  end

  ## Oj.sc

  x.report("Oj.sc alloc") do
    p = LimitedScp.new
    e = Oj.sc_parse(p, event)
    timestamp = e['t']
    hexid = e['ms'] || ""
  end

  x.report("Oj.sc reuse") do
    e = Oj.sc_parse(limited_scp, event)
    timestamp = e['t']
    hexid = e['ms'] || ""
  end

  ## Oj::Parser :usual

  x.report("Parser.usual alloc") do
    p = Oj::Parser.new(:usual)
    e = p.parse(event)
    timestamp = e['t']
    hexid = e['ms'] || ""
  end

  x.report("Parser.usual reuse") do
    e = oj_parser_usual.parse(event)
    timestamp = e['t']
    hexid = e['ms'] || ""
  end

  ## Oj::Parser :saj

  x.report("Parser.saj alloc") do
    p = LimitedSaj.new
    oj_parser_saj.handler = p
    oj_parser_saj.parse(event)
    timestamp = p.t
    hexid = p.ms || ""
  end

  x.report("Parser.saj reuse") do
    oj_parser_saj_reuse_handler.parse(event)
    timestamp = oj_parser_saj_reuse_handler.handler.t
    hexid = oj_parser_saj_reuse_handler.handler.ms || ""
  end

  x.compare!
end

results

% bundle exec ruby bench-oj.rb
Warming up --------------------------------------
                load     5.343k i/100ms
             Oj::Doc    12.719k i/100ms
        Oj.saj alloc     5.757k i/100ms
        Oj.saj reuse     6.194k i/100ms
         Oj.sc alloc     6.011k i/100ms
         Oj.sc reuse     6.164k i/100ms
  Parser.usual alloc     3.126k i/100ms
  Parser.usual reuse    12.311k i/100ms
    Parser.saj alloc     8.072k i/100ms
    Parser.saj reuse     8.242k i/100ms
Calculating -------------------------------------
                load     59.642k (± 8.3%) i/s -    299.208k in   5.050765s
             Oj::Doc    149.220k (±12.5%) i/s -    737.702k in   5.015231s
        Oj.saj alloc     55.559k (± 6.5%) i/s -    282.093k in   5.102834s
        Oj.saj reuse     59.600k (± 5.1%) i/s -    303.506k in   5.107395s
         Oj.sc alloc     60.764k (± 4.8%) i/s -    306.561k in   5.059862s
         Oj.sc reuse     61.245k (± 2.0%) i/s -    308.200k in   5.034380s
  Parser.usual alloc     30.396k (±13.6%) i/s -    150.048k in   5.012783s
  Parser.usual reuse    112.162k (± 7.8%) i/s -    566.306k in   5.079224s
    Parser.saj alloc     77.487k (± 8.4%) i/s -    387.456k in   5.047394s
    Parser.saj reuse     79.032k (± 6.9%) i/s -    395.616k in   5.036396s

Comparison:
             Oj::Doc:   149219.9 i/s
  Parser.usual reuse:   112162.4 i/s - 1.33x  (± 0.00) slower
    Parser.saj reuse:    79031.8 i/s - 1.89x  (± 0.00) slower
    Parser.saj alloc:    77486.5 i/s - 1.93x  (± 0.00) slower
         Oj.sc reuse:    61245.3 i/s - 2.44x  (± 0.00) slower
         Oj.sc alloc:    60764.0 i/s - 2.46x  (± 0.00) slower
                load:    59641.5 i/s - 2.50x  (± 0.00) slower
        Oj.saj reuse:    59599.6 i/s - 2.50x  (± 0.00) slower
        Oj.saj alloc:    55558.8 i/s - 2.69x  (± 0.00) slower
  Parser.usual alloc:    30396.5 i/s - 4.91x  (± 0.00) slower
ohler55 commented 1 year ago

Those results are in line with what I would expect. I am pretty happy with the way the new Parser series worked out. The Oj::Doc defers creating Ruby objects until needed so depending on what it is used for it may much faster or not that impressive.

ohler55 commented 1 year ago

You might try some of the options on the parsers such as cache_keys and cache_strings.