halogenandtoast / alchemist

A scientific conversion library.
155 stars 28 forks source link

Unexpected performance degradation with alchemist #13

Closed GUI closed 11 years ago

GUI commented 11 years ago

The way alchemist utilizes method_missing seems to result in a fairly significant performance hit when numbers are used in certain ways. The problems can crop up even when you're not intending to use alchemist, so this can be quite unexpected. For example, time parsing can become 10x slower just by having alchemist installed.

Without alchemist, parsing 100,000 time strings takes 3.7 seconds

irb(main):001:0> require "time"; require "benchmark"
=> true
irb(main):002:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
  3.740000   0.010000   3.750000 (  3.738171)

With alchemist, parsing 100,000 time strings takes 43 seconds

irb(main):001:0> require "time"; require "benchmark"; require "alchemist"
=> true
irb(main):002:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
 42.890000   0.430000  43.320000 ( 43.322304)

This seems to largely stem from something in Time.parse calling to_str on a bunch of numeric objects. This method doesn't exist, but with alchemist's method_missing implementation, each to_str call incurs more of an overhead.

I didn't do further debugging to see if there are similar performance consequences outside of this time parsing example, but it seems like a possibility. After removing alchemist from our app, we saw pretty significant speed increases throughout the computationally heavy parts of our app (Rails requests that were taking 30 seconds to process now take 5 seconds).

I'm unsure of any simple fixes given the current dependency on method_missing. But I saw you recently started a rewrite branch, so maybe this is something to consider as you rewrite? Sorry I don't have any fixes, but I wanted to file this as an FYI, since anyone dealing with lots of data might be having similar performance problems in their apps without even realizing it.

If anyone's interested, we ended up switching to m9t for our conversions. It isn't quite as syntactically elegant as alchemist, but it doesn't interfere and slow down unrelated aspects of our application, and it also seems to be the fastest conversion library I could find.

willcosgrove commented 11 years ago

This seems like a great place to refactor with Ruby 2.0's refinements!

tsagadar commented 11 years ago

We face exactly the same issue: exporting 50'000 results to a CSV takes about 10 minutes - most of which is spent in Alchemist. Hopefully this issue can be addressed soon, switching to a different library is not our favorite option.

reger commented 11 years ago

Let's not make it a Ruby 2.0-only fix though, please fix for 1.9 as well.

halogenandtoast commented 11 years ago

Definitely going to look into this today and try to make sure it's a backported fix for all ruby versions.

halogenandtoast commented 11 years ago

One possible solution to this would be to not use method_missing and extend Numerics interface to include all of the alchemist methods. This would make the method table for Numeric quite large. I think the direction I was planning on moving to was to have NumericExt only load when you explicitly requested it.

halogenandtoast commented 11 years ago

After walking away for a bit, here's what I've decided. Alchemist will provide a method unit for getting an Alchemist unit

Alchemist.unit('meter')

This will be defined in a module Alchemist::Unit and can be mixed into Numeric separate from any other part of alchemist providing you with

1.unit('meter')

This module will not be mixed in by default.

I will still provide a numeric_ext but it will not be mixed in by default and documentation will explicitly mention that mixing this module in can have impacts on performance. I will most likely deprecate this in favor of what follows.

I will also provide a way to dynamically generate modules for mixing in measurement categories like distance into the interface of an object. It might look something like this:

class Numeric
  include Alchemist.category('distance')
end

This would only mix in the distance methods and not the entire giant table. If you choose to, you can include everything, but I think most users of this library are only using it for very specific measurement categories. Doing this will also allow you to define your own categories and mix them in "safely".

Let me know if you have any thoughts.

halogenandtoast commented 11 years ago

Here is what I have currently in terms of what is on master for speed:

irb(main):002:0> require "time"; require "benchmark"
=> true
irb(main):003:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
  3.600000   0.010000   3.610000 (  3.606855)
=> nil
irb(main):004:0> require "alchemist"
=> true
irb(main):005:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
  3.600000   0.010000   3.610000 (  3.602660)
=> nil
irb(main):006:0> Alchemist.setup
=> Numeric
irb(main):007:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
  9.630000   0.060000   9.690000 (  9.690889)
=> nil

So if you manually include Alchemist into Numeric it will still slow down, but not nearly as much as before.

halogenandtoast commented 11 years ago

Okay, I've added some methods to a list of methods to ignore and brought the speed up a little

irb(main):002:0> require "time"; require "benchmark"
=> true
irb(main):003:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
  3.530000   0.010000   3.540000 (  3.541371)
=> nil
irb(main):004:0> require "alchemist"; Alchemist.setup
=> Numeric
irb(main):005:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
  4.260000   0.010000   4.270000 (  4.274634)
=> nil
halogenandtoast commented 11 years ago

Ruby 1.9 doesn't benefit quite as much

irb(main):002:0> require "time"; require "benchmark"
=> true
irb(main):003:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
  3.510000   0.010000   3.520000 (  3.522377)
=> nil
irb(main):004:0> require "alchemist"; Alchemist.setup
=> Numeric
irb(main):005:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
  6.210000   0.150000   6.360000 (  6.358271)
=> nil
halogenandtoast commented 11 years ago

I think now that the module won't be mixed in by default, the speed decrease won't be as unexpected. If you want speed, don't mix in the module, just use the Alchemist.measure method instead.

halogenandtoast commented 11 years ago

As a final note, master now will dynamically generate modules for inclusion into Numeric rather than using method_missing which keeps everything speedy both on Ruby 2.0 and Ruby 1.9

irb(main):003:0> require "time"; require "benchmark"
=> true
irb(main):004:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
  3.530000   0.010000   3.540000 (  3.538493)
=> nil
irb(main):005:0> require "alchemist"
=> true
irb(main):006:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
  3.550000   0.010000   3.560000 (  3.561823)
=> nil
irb(main):007:0> Alchemist.setup
=> [:absorbed_radiation_dose, :angles, :area, :capacitance, :distance, :dose_equivalent, :electric_charge, :electric_conductance, :electrical_impedance, :electromotive_force, :energy, :frequency, :force, :illuminance, :inductance, :information_storage, :luminous_flux, :luminous_intensity, :magnetic_flux, :magnetic_inductance, :mass, :power, :pressure, :radioactivity, :time, :volume, :density, :temperature]
irb(main):008:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
  3.520000   0.010000   3.530000 (  3.529478)
=> nil