Closed GUI closed 11 years ago
This seems like a great place to refactor with Ruby 2.0's refinements!
We face exactly the same issue: exporting 50'000 results to a CSV takes about 10 minutes - most of which is spent in Alchemist. Hopefully this issue can be addressed soon, switching to a different library is not our favorite option.
Let's not make it a Ruby 2.0-only fix though, please fix for 1.9 as well.
Definitely going to look into this today and try to make sure it's a backported fix for all ruby versions.
One possible solution to this would be to not use method_missing and extend Numerics interface to include all of the alchemist methods. This would make the method table for Numeric quite large. I think the direction I was planning on moving to was to have NumericExt only load when you explicitly requested it.
After walking away for a bit, here's what I've decided. Alchemist will provide a method unit for getting an Alchemist unit
Alchemist.unit('meter')
This will be defined in a module Alchemist::Unit and can be mixed into Numeric separate from any other part of alchemist providing you with
1.unit('meter')
This module will not be mixed in by default.
I will still provide a numeric_ext but it will not be mixed in by default and documentation will explicitly mention that mixing this module in can have impacts on performance. I will most likely deprecate this in favor of what follows.
I will also provide a way to dynamically generate modules for mixing in measurement categories like distance into the interface of an object. It might look something like this:
class Numeric
include Alchemist.category('distance')
end
This would only mix in the distance methods and not the entire giant table. If you choose to, you can include everything, but I think most users of this library are only using it for very specific measurement categories. Doing this will also allow you to define your own categories and mix them in "safely".
Let me know if you have any thoughts.
Here is what I have currently in terms of what is on master for speed:
irb(main):002:0> require "time"; require "benchmark"
=> true
irb(main):003:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
3.600000 0.010000 3.610000 ( 3.606855)
=> nil
irb(main):004:0> require "alchemist"
=> true
irb(main):005:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
3.600000 0.010000 3.610000 ( 3.602660)
=> nil
irb(main):006:0> Alchemist.setup
=> Numeric
irb(main):007:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
9.630000 0.060000 9.690000 ( 9.690889)
=> nil
So if you manually include Alchemist into Numeric it will still slow down, but not nearly as much as before.
Okay, I've added some methods to a list of methods to ignore and brought the speed up a little
irb(main):002:0> require "time"; require "benchmark"
=> true
irb(main):003:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
3.530000 0.010000 3.540000 ( 3.541371)
=> nil
irb(main):004:0> require "alchemist"; Alchemist.setup
=> Numeric
irb(main):005:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
4.260000 0.010000 4.270000 ( 4.274634)
=> nil
Ruby 1.9 doesn't benefit quite as much
irb(main):002:0> require "time"; require "benchmark"
=> true
irb(main):003:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
3.510000 0.010000 3.520000 ( 3.522377)
=> nil
irb(main):004:0> require "alchemist"; Alchemist.setup
=> Numeric
irb(main):005:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
6.210000 0.150000 6.360000 ( 6.358271)
=> nil
I think now that the module won't be mixed in by default, the speed decrease won't be as unexpected. If you want speed, don't mix in the module, just use the Alchemist.measure method instead.
As a final note, master now will dynamically generate modules for inclusion into Numeric rather than using method_missing which keeps everything speedy both on Ruby 2.0 and Ruby 1.9
irb(main):003:0> require "time"; require "benchmark"
=> true
irb(main):004:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
3.530000 0.010000 3.540000 ( 3.538493)
=> nil
irb(main):005:0> require "alchemist"
=> true
irb(main):006:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
3.550000 0.010000 3.560000 ( 3.561823)
=> nil
irb(main):007:0> Alchemist.setup
=> [:absorbed_radiation_dose, :angles, :area, :capacitance, :distance, :dose_equivalent, :electric_charge, :electric_conductance, :electrical_impedance, :electromotive_force, :energy, :frequency, :force, :illuminance, :inductance, :information_storage, :luminous_flux, :luminous_intensity, :magnetic_flux, :magnetic_inductance, :mass, :power, :pressure, :radioactivity, :time, :volume, :density, :temperature]
irb(main):008:0> puts Benchmark.measure { 100000.times { Time.parse("2012-01-01 01:01:01") } }
3.520000 0.010000 3.530000 ( 3.529478)
=> nil
The way alchemist utilizes
method_missing
seems to result in a fairly significant performance hit when numbers are used in certain ways. The problems can crop up even when you're not intending to use alchemist, so this can be quite unexpected. For example, time parsing can become 10x slower just by having alchemist installed.Without alchemist, parsing 100,000 time strings takes 3.7 seconds
With alchemist, parsing 100,000 time strings takes 43 seconds
This seems to largely stem from something in Time.parse calling
to_str
on a bunch of numeric objects. This method doesn't exist, but with alchemist'smethod_missing
implementation, eachto_str
call incurs more of an overhead.I didn't do further debugging to see if there are similar performance consequences outside of this time parsing example, but it seems like a possibility. After removing alchemist from our app, we saw pretty significant speed increases throughout the computationally heavy parts of our app (Rails requests that were taking 30 seconds to process now take 5 seconds).
I'm unsure of any simple fixes given the current dependency on
method_missing
. But I saw you recently started a rewrite branch, so maybe this is something to consider as you rewrite? Sorry I don't have any fixes, but I wanted to file this as an FYI, since anyone dealing with lots of data might be having similar performance problems in their apps without even realizing it.If anyone's interested, we ended up switching to m9t for our conversions. It isn't quite as syntactically elegant as alchemist, but it doesn't interfere and slow down unrelated aspects of our application, and it also seems to be the fastest conversion library I could find.