precog / platform

Advanced Analytics Engine for NoSQL Data
http://www.slamdata.com
GNU Affero General Public License v3.0

Release open files during inactive periods #495

Closed: tixxit closed this 11 years ago

tixxit commented 11 years ago

This adds the ability for NIHDB to enter an inactive (quiescent) state. The DB remains open, but it releases most of its resources (open files, the raw log, etc.). This state is entered explicitly by calling the NIHDB#quiesce method. Any subsequent method call automatically reacquires the released resources.
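A minimal sketch of the quiesce/reopen pattern described above. It assumes a generic AutoCloseable resource rather than NIHDB's real internals. The Quiescable class and its use/quiesce/openCount methods are hypothetical illustrations, not part of the Precog codebase. The resource is opened lazily, released by quiesce(), and transparently reopened by the next use().

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch (not NIHDB's actual code): holds an expensive resource,
// such as an open file, that can be released and lazily reacquired.
final class Quiescable<R extends AutoCloseable> {
    private final Supplier<R> opener; // how to (re)acquire the resource
    private R resource;               // null while quiescent
    private int opens = 0;            // how many times the resource has been opened

    Quiescable(Supplier<R> opener) {
        this.opener = opener;
    }

    // Run an operation against the resource, reopening it first if quiescent.
    synchronized <T> T use(Function<R, T> op) {
        if (resource == null) {
            resource = opener.get();
            opens++;
        }
        return op.apply(resource);
    }

    // Release the resource; the next call to use() reopens it.
    // Calling quiesce() while already quiescent is a no-op.
    synchronized void quiesce() {
        if (resource == null) {
            return;
        }
        try {
            resource.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to release resource", e);
        } finally {
            resource = null;
        }
    }

    synchronized boolean isQuiescent() {
        return resource == null;
    }

    synchronized int openCount() {
        return opens;
    }
}
```

Making quiesce() idempotent matters if it is driven by a timer, because the timer may fire again while the resource is already released.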

Additionally, the VFS PathManagingActor now has an inactivity timeout: after a configurable period with no activity (storage.quiescence_timeout, default 300 seconds), it calls NIHDB#quiesce on all NIHDB versions. This is implemented as a plain receive timeout, a built-in Akka feature that sends the actor a ReceiveTimeout message after a given period of inactivity.
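In Akka itself this amounts to calling context.setReceiveTimeout(...) in the actor and handling the ReceiveTimeout message. To keep the example self-contained without Akka, the sketch below is a hypothetical plain-Java stand-in for those semantics. The IdleTimer class and its touch/check methods are illustrative assumptions, not Akka API. Every incoming message resets the inactivity window, and once the window elapses the callback (e.g. quiescing every NIHDB version) runs once per idle period. The clock is injected, which also makes the behavior unit-testable without waiting five minutes.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for an actor receive timeout: tracks the time of the
// last message and fires a callback once per idle period.
final class IdleTimer {
    private final LongSupplier clockMillis; // injected clock so tests control time
    private final long timeoutMillis;       // e.g. 300 s, as with storage.quiescence_timeout
    private final Runnable onIdle;          // e.g. quiesce all NIHDB versions
    private long lastActivity;
    private boolean firedThisIdlePeriod = false;

    IdleTimer(LongSupplier clockMillis, long timeoutMillis, Runnable onIdle) {
        this.clockMillis = clockMillis;
        this.timeoutMillis = timeoutMillis;
        this.onIdle = onIdle;
        this.lastActivity = clockMillis.getAsLong();
    }

    // Call on every incoming message: restarts the inactivity window.
    void touch() {
        lastActivity = clockMillis.getAsLong();
        firedThisIdlePeriod = false;
    }

    // Call periodically (e.g. from a scheduler tick): fires onIdle at most
    // once per idle period, after timeoutMillis without any touch().
    void check() {
        long idleFor = clockMillis.getAsLong() - lastActivity;
        if (!firedThisIdlePeriod && idleFor >= timeoutMillis) {
            firedThisIdlePeriod = true;
            onIdle.run();
        }
    }
}
```

Because every message calls touch(), check() never fires while traffic is continuous. A timeout like this bounds resource usage during idle periods, not peak usage under sustained load.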

I'm not sure how to unit test this. However, I did fire up the shard and put a resource through several cycles of quiescence and work, and everything behaved correctly: after 5 minutes (the default timeout) the NIHDB would quiesce, subsequent queries and ingests worked fine, and then it would quiesce again, and so on.

tixxit commented 11 years ago

Review by @nuttycom

dcsobral commented 11 years ago

This looks useful, though as far as I can see it would be completely ineffective at preventing the "too many open files" problem we had recently: those errors occurred during prolonged periods of continuous activity.

nuttycom commented 11 years ago

So, I think it's going to be necessary to add a bounded LRU cache so that we have a fixed upper bound on the number of open file handles. Otherwise, especially in reingest scenarios (which is where we encountered the problem), we're likely to see the same issue again.
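A minimal sketch of the bounded LRU idea, assuming handles are keyed by some identifier (a path, say) and can be reopened on demand. HandleCache and its methods are hypothetical, not Precog code. It relies on java.util.LinkedHashMap in access-order mode: removeEldestEntry closes and evicts the least recently used handle whenever more than maxOpen are open, so the number of open handles never exceeds maxOpen regardless of how busy the system is.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a bounded LRU cache of open handles.
final class HandleCache<K, H extends AutoCloseable> {
    private final int maxOpen;
    private final Function<K, H> opener;
    private final LinkedHashMap<K, H> open;

    HandleCache(int maxOpen, Function<K, H> opener) {
        if (maxOpen < 1) {
            throw new IllegalArgumentException("maxOpen must be >= 1");
        }
        this.maxOpen = maxOpen;
        this.opener = opener;
        // accessOrder = true: iteration runs from least to most recently accessed.
        this.open = new LinkedHashMap<K, H>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, H> eldest) {
                if (size() > maxOpen) {
                    closeQuietly(eldest.getValue()); // release the LRU handle
                    return true;                     // and evict it from the map
                }
                return false;
            }
        };
    }

    // Return the open handle for key, opening it if needed. Opening may
    // evict (and close) the least recently used handle.
    synchronized H get(K key) {
        H handle = open.get(key); // a hit also marks the key as most recently used
        if (handle == null) {
            handle = opener.apply(key);
            open.put(key, handle);
        }
        return handle;
    }

    synchronized int openCount() {
        return open.size();
    }

    // containsKey does not change access order in a LinkedHashMap.
    synchronized boolean isOpen(K key) {
        return open.containsKey(key);
    }

    private static void closeQuietly(AutoCloseable c) {
        try {
            c.close();
        } catch (Exception e) {
            // Sketch only: a real implementation should at least log this.
        }
    }
}
```

A production version would also have to avoid closing a handle that another caller is still reading from (e.g. by reference counting or pinning in-use entries). This sketch ignores that.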