impactlab / eemeter

‼️ MOVED TO https://github.com/openeemeter/eemeter - Core computation engine for the Open Energy Efficiency Meter
https://eemeter.readthedocs.io/
MIT License

Reading from cache is fast, but writing to cache is slow. #51

Closed philngo closed 9 years ago

philngo commented 9 years ago

This may make for annoyingly (even prohibitively) slow first experiences with cached weather sources.

potash commented 9 years ago

Yeah, inserting via the sqlalchemy ORM is slow (for good reasons). I didn't realize we had enough data that this would be noticeable though. Two alternatives to commit():

1) Use the insert() method: http://docs.sqlalchemy.org/en/rel_0_8/faq.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow

This still requires reading every line of the csv in python and instantiating all the objects, which isn't ideal.

2) Avoid reading the CSV into Python objects and load the source CSV into PostgreSQL via the \copy command. This would require writing new code to transform the CSV into the appropriate schema and to check what's already in the db.

Neither solution is especially elegant to me. Option 2 is more performant, while option 1 allows reuse of more of your existing SQLAlchemy code. Guessing you prefer 1) - a sketch of that approach is below.
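
For reference, a minimal sketch of option 1: a single SQLAlchemy Core `insert()` executed with a list of parameter dicts instead of ORM objects and per-row `commit()`. The connection string, table, and column names here are hypothetical, not eemeter's actual cache schema.

```python
import csv

from sqlalchemy import (Column, Float, Integer, MetaData, String, Table,
                        create_engine)

engine = create_engine("postgresql://localhost/eemeter_cache")  # hypothetical DSN
metadata = MetaData()

# Hypothetical cache table; the real schema lives in eemeter's caching code.
weather_table = Table(
    "weather_cache", metadata,
    Column("id", Integer, primary_key=True),
    Column("station", String),
    Column("date", String),
    Column("temperature", Float),
)
metadata.create_all(engine)


def bulk_insert(csv_path):
    # Read the CSV once into plain dicts -- no ORM objects are instantiated.
    with open(csv_path) as f:
        rows = [
            {
                "station": r["station"],
                "date": r["date"],
                "temperature": float(r["temperature"]),
            }
            for r in csv.DictReader(f)
        ]
    # One Core insert() with a list of parameter dicts lets the DBAPI use
    # executemany, avoiding the unit-of-work overhead of session.add()/commit().
    with engine.begin() as conn:
        conn.execute(weather_table.insert(), rows)
```

Rows still pass through Python, but skipping ORM object construction and the session bookkeeping is usually a large win over committing row by row.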

potash commented 9 years ago

A third option is to just cache the files, parse them into memory at runtime, and forget about the database cache. This might be less ideal, though, if there is so much weather data that it doesn't fit nicely into memory.
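
A rough sketch of what that could look like, assuming the raw files are small enough to hold in memory. The cache directory and the `fetch_csv` callable are hypothetical, not part of eemeter.

```python
import os

import pandas as pd

CACHE_DIR = os.path.expanduser("~/.eemeter/weather")  # hypothetical cache location


def load_station(station_id, fetch_csv):
    """Cache the raw CSV on disk, then parse it into memory on demand.

    ``fetch_csv`` is a hypothetical callable that returns the raw CSV text
    for a station; nothing is written to a database.
    """
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, "{}.csv".format(station_id))
    if not os.path.exists(path):
        with open(path, "w") as f:
            f.write(fetch_csv(station_id))
    # Parsing happens at runtime; the DataFrame lives only in memory.
    return pd.read_csv(path, parse_dates=["date"], index_col="date")
```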

philngo commented 9 years ago

There's some logic right now to handle overwriting (or not overwriting) previously cached data, which may slow things down quite a bit. At the very least we should rewrite that logic.

bryangoodrich commented 9 years ago

A consistent interface is the most important thing. If you use bulk copy processing, it's going to need to be customized for each backend (I'm using SQL Server, not Postgres). Writing for TMY3WeatherSource took a little bit of time, but not a big deal (2012 to now).

philngo commented 9 years ago

Let us know if you run into problems with SQL Server - we've only tested on postgres.

Another thought regarding TMY3WeatherSource caching: it currently caches hourly normals, but for now we really only need daily. Maybe we're sitting on a relatively easy 24x speed increase.
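
If the hourly normals come back as a pandas Series indexed by timestamp (an assumption here, not necessarily the shape eemeter uses), collapsing them to daily values before caching is a one-liner:

```python
import pandas as pd

# Hypothetical hourly normals: one value per hour for a year.
hourly_index = pd.date_range("2015-01-01", periods=24 * 365, freq="H")
hourly_normals = pd.Series(10.0, index=hourly_index)

# Cache 365 daily means instead of 8760 hourly values -- roughly the 24x
# reduction in rows written that the comment above is pointing at.
daily_normals = hourly_normals.resample("D").mean()
```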

philngo commented 9 years ago

This was addressed by 85d654d81febfaa60b01fc814e4f862ac59d8c3d (and a couple of subsequent bug-squashing commits); closing. We ended up removing the ORM but sticking with SQLAlchemy. The tables used in the current version of the caching mechanism are not consistent with those of the old version, so caches will need to be reset (and tables rebuilt) after this update.
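
Resetting the cache amounts to dropping the old tables and letting the updated code recreate them on first use. A minimal sketch with SQLAlchemy, assuming the cache database contains only eemeter cache tables (the actual table names are defined in the commit above):

```python
from sqlalchemy import MetaData, create_engine

engine = create_engine("postgresql://localhost/eemeter_cache")  # hypothetical DSN

# Reflect whatever tables currently exist in the cache database and drop them;
# the updated caching code will rebuild tables with the new schema.
old_metadata = MetaData()
old_metadata.reflect(bind=engine)
old_metadata.drop_all(bind=engine)
```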