radical-cybertools / radical.analytics

Analytics for RADICAL-Cybertools

sessions are now cached #97

Closed andre-merzky closed 5 years ago

andre-merzky commented 5 years ago

Sessions are now cached in $HOME/.radical/analytics/cache, which reduces session loading time by about 80%. The cache is used when a session gets created via a ra.Session.create(...) call (which takes the same parameters as __init__(...)). Any subsequent call to create() (in the same or a different application instance) will then load the session from the cache.

This PR currently targets feature/session_plots as it was branched off that (to simplify review). Once the parent branch is merged this will be re-routed to devel.

mturilli commented 5 years ago

Should we have the notion of lifetime for the cache?

andre-merzky commented 5 years ago

Should we have the notion of lifetime for the cache?

I don't think so, really - the session is a recording of an application run, and as such should never change once the application has completed, so the cache should never become invalid. One could consider this an on-the-fly pre-processing step, which mostly avoids re-reading and re-parsing the (static) profiles.

Having said that: create() has a cache parameter which defaults to True - when set to False, one can enforce a re-parsing of the profiles.
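To illustrate the pattern described above, here is a minimal, self-contained sketch of a cache-aware `create()` factory. It is not the radical.analytics implementation: the `Session` stand-in, `parse_profiles`, the `cache_dir` parameter, the hashed file name and the use of `pickle` are all assumptions made for illustration. Only the `create()` name, the `cache` flag defaulting to `True`, and the cache location come from this thread.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

PARSE_CALLS = []  # records every expensive parse, for demonstration only


def parse_profiles(src):
    # Hypothetical placeholder for the slow step (reading and parsing profiles).
    PARSE_CALLS.append(src)
    return {"src": src, "events": [("task.000000", "DONE")]}


class Session:
    """Stand-in for an analytics session; NOT the radical.analytics class."""

    def __init__(self, src, stype, _data=None):
        self.src = src
        self.stype = stype
        # Parse only when no pre-parsed data was handed in by create().
        self.data = parse_profiles(src) if _data is None else _data

    @classmethod
    def create(cls, src, stype, cache=True, cache_dir=None):
        # cache_dir is a hypothetical parameter so the sketch can be pointed
        # at a scratch directory instead of the real cache location.
        base = Path(cache_dir or Path.home() / ".radical/analytics/cache")
        key = hashlib.sha256(f"{src}:{stype}".encode()).hexdigest()
        path = base / f"{key}.pkl"

        if cache and path.exists():
            # Cache hit: skip parsing entirely.
            with path.open("rb") as fin:
                return cls(src, stype, _data=pickle.load(fin))

        # Cache miss, or cache=False: parse again and refresh the cache file.
        session = cls(src, stype)
        base.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fout:
            pickle.dump(session.data, fout)
        return session


scratch = tempfile.mkdtemp()
Session.create("rp.session.demo", "radical.pilot", cache_dir=scratch)
Session.create("rp.session.demo", "radical.pilot", cache_dir=scratch)
parses_with_cache = len(PARSE_CALLS)    # second call was served from cache
Session.create("rp.session.demo", "radical.pilot", cache=False, cache_dir=scratch)
parses_after_bypass = len(PARSE_CALLS)  # cache=False forced a re-parse
```

In this sketch a call with `cache=False` also rewrites the cache file, so a later cached call picks up the freshly parsed data. Whether the PR behaves the same way is not stated here.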

mturilli commented 5 years ago

It was a more trivial worry: after I have processed sessions worth 1m tasks, how full is my HD?! Pesky dot directories, silently created, tend to be stealthy space hoggers :/

andre-merzky commented 5 years ago

Ha, true. exp4/rp.session.login4.mturilli1.018123.0000.tar.bz2 is 21 MB - but unpacked it's 4.2 GB. The cached session is at about 320 MB.

So, yeah, not tiny. The problem with a notion of a lifetime in that sense: who is enforcing that lifetime? If you set a lifetime of a week - RA might not be running in a week to delete the stale cache...

mturilli commented 5 years ago

Yup, a bit too large I would say. I would probably issue a warning on exit that reports the size and location of the cache. Then I guess an option to clean old sessions on start, possibly those that are not in the list of sessions passed to the current call?
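A rough sketch of what such reporting and cleaning could look like, using only the standard library. The helper names (`cache_usage`, `prune_cache`, `report`), the `.cache` file suffix and the one-file-per-session layout are assumptions for illustration and do not describe how RA actually lays out its cache.

```python
import tempfile
from pathlib import Path


def cache_usage(cache_dir):
    """Return (number of files, total bytes) for everything under cache_dir."""
    files = [p for p in Path(cache_dir).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def prune_cache(cache_dir, keep):
    """Delete cached files whose stem is not in `keep`; return removed names."""
    removed = []
    for path in sorted(Path(cache_dir).iterdir()):
        if path.is_file() and path.stem not in keep:
            path.unlink()
            removed.append(path.name)
    return removed


def report(cache_dir):
    """Build the message one might emit as a warning on exit."""
    count, size = cache_usage(cache_dir)
    return f"cache at {cache_dir}: {count} file(s), {size / 1024**2:.1f} MiB"


# Demo on a scratch directory with made-up session ids (1 KiB each).
scratch = Path(tempfile.mkdtemp())
for sid in ("session.a", "session.b", "session.c"):
    (scratch / f"{sid}.cache").write_bytes(b"x" * 1024)

before = cache_usage(scratch)
removed = prune_cache(scratch, keep={"session.a"})  # keep only the current session
after = cache_usage(scratch)
message = report(scratch)
```

The message from `report()` could be emitted at interpreter shutdown by registering a callback with `atexit.register`, and `prune_cache()` could run at startup with the session ids passed to the current call as `keep`.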

andre-merzky commented 5 years ago

I added bz2 compression to the cache files, which reduces the problem significantly. I would propose to add a separate tool to manage cache cleaning.
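For reference, compressing a cache file with bz2 can be as small as the sketch below. It assumes the cache is a pickled Python object, which this thread does not confirm; `dump_compressed`, `load_compressed` and the synthetic event list are hypothetical names and data used only to show the round trip.

```python
import bz2
import os
import pickle
import tempfile


def dump_compressed(obj, path):
    # bz2.open in binary mode returns a file object that pickle can write to.
    with bz2.open(path, "wb") as fout:
        pickle.dump(obj, fout)


def load_compressed(path):
    with bz2.open(path, "rb") as fin:
        return pickle.load(fin)


# Synthetic, highly repetitive data standing in for parsed profile events.
events = [{"uid": f"task.{i:06d}", "state": "DONE", "time": float(i)}
          for i in range(10000)]

raw_size = len(pickle.dumps(events))            # uncompressed pickle size
path = os.path.join(tempfile.mkdtemp(), "session.pkl.bz2")
dump_compressed(events, path)
packed_size = os.path.getsize(path)             # size on disk after bz2
restored = load_compressed(path)
```

Profile-like data tends to be repetitive, so it usually compresses well, at the cost of some extra CPU time on every load. The actual ratio depends on the data.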