Caching strategy - Githubissues

ioquatix commented 9 years ago

I like having caching and while it is kind of out of scope for this, I think it adds value (since this does take time to process) and would be nice to be reusable, since I may not be the only user who wants it to cache. This way developers, like ourselves, would not be required to implement their own caching layer.

ioquatix commented 9 years ago

@Glavin001 Please feel free to propose a PR.

ioquatix commented 9 years ago

This is now implemented, there is a 1 second cache period by default which I think is reasonable. This can be customised by creating your own instance of ShellEnvironment and setting the cachePeriod option to something higher or false. This is supported in 0.2.0 release.

Glavin001 commented 9 years ago

This is now implemented,

@ioquatix be sure to use Smart Commit Messages and reference these applicable issues. I'd like to be able to see that you have implemented something from the commit message and the commit showing up automatically under the issue, instead of requiring that you manually make a comment saying that you have implemented it. We should maximize our usage of GitHub's features.

there is a 1 second cache period by default which I think is reasonable

As long as this time period is customizable, I think 1 second is reasonable. For my Atom Beautify unit tests I set it to 10 seconds or so, because I want it to remain for the duration of all of the tests, as it does not change. It doubles the time the tests take when it fetches the environment on each test.

This can be customised by creating your own instance of ShellEnvironment and setting the cachePeriod option to something higher or false.

Could customization be an option, such that when you call the function to fetch environment variables that you can provide optional options and one of the supported options is the cachePeriod. Thus we could have a configuration option in a Package's settings to change the cachePeriod and we would not need to create a new instance for every cachePeriod change and instead could always forward the value from the Package's settings.

Great work so far! I'm really excited about ShellEnvironment!

ioquatix commented 9 years ago

Adding a function to access the global singleton would be possible. However, a single global object is probably something that should be fairly standard. Otherwise, in other scripts, you might get unexpected behaviour if one script changes the defaults. On the other hand, if the user is doing it via some kind of preference, that's their choice, so supporting that might be useful. It would be pretty easy to do so by having a class method which returns the shared instance.

Adding options isn't that easy because it would have to be specified after the callback argument in order to be optional. In the end I decided that adding that to the actual method call wasn't that useful.

Glavin001 commented 9 years ago

Here's what I envision:

Expected API Usage

ShellEnvironment = require 'shell-environment'
cachePeriod = 1000 # 1 second, in milliseconds
# Option first argument could be "options" object
ShellEnvironment.loginEnvironment({cachePeriod}, (error, environment) =>
    if environment
        child = ChildProcess.spawn(args[0], args.slice(1), env: environment)
    else
        console.log(error)
)

Implementation Details

Store a cached value of the previous environment fetched
Store date for when that was fetched
The options argument is optional
If cachePeriod option is set, then use it
- Check if the difference between the current time and the cached time is greater than cachePeriod
- if so, then re-fetch and set the cache value and cache time to current fetch

ioquatix commented 9 years ago

The options argument is optional

If you can see an obvious and easy way to do this let me know.

If cachePeriod option is set, then use it

Sure, but does this affect everyone who uses the API?

Check if the difference between the current time and the cached time is greater than cachePeriod if so, then re-fetch and set the cache value and cache time to current fetch

Yes this is already done.

Glavin001 commented 9 years ago

If you can see an obvious and easy way to do this let me know.

Something like

fn = (options, callback) ->
  if typeof options is "function"
    callback = options
    options = {}

And in JavaScript (thanks to http://coffeescript.org/ ):

var fn = function(options, callback) {
  if (typeof options === "function") {
    callback = options;
    return options = {};
  }
};

ioquatix commented 9 years ago

If you think that's a good idea, then feel free to submit a PR and we can merge it. However, I personally think that the global API should not change it's behaviour*. It seems like it would be easier for you to simply write:

shellEnvironment = new ShellEnvironment(cachePeriod: 1000)

...

shellEnvironment.getEnvironment (error, environment) =>
    if environment
        child = ChildProcess.spawn(args[0], args.slice(1), env: environment)
    else
        console.log(error)

By changing behaviour, if you call the global API from multiple places with different cache periods, what should happen?

madleech commented 9 years ago

What about making the cache period a class variable?

ShellEnvironment = require 'shell-environment'
ShellEnvironment.cachePeriod = 1000
ShellEnvironment.loginEnvironment (error, environment) =>
    if environment
        child = ChildProcess.spawn args[0], args.slice(1), env: environment
    else
        console.log error

That way old API can remain, but it's easy to adjust the cache period. As Samuel says though there is the caveat that calling it from multiple places with different cache periods would probably lead to undefined behaviour.

Glavin001 commented 9 years ago

shellEnvironment = new ShellEnvironment(cachePeriod: 1000)

I have no problems with that.

My suggestion is because I am considering Atom package developers: if they wish to make the cachePeriod a package config option, then they would need to listen for that option to change and then create a new instance with new ShellEnvironment(cachePeriod: 1000).

With a more functional approach (pass all options as arguments when calling fetch), it supports both usages and is easy for both cases. I see no disadvantage with making it more functional and providing an optional options argument.

Furthermore, and likely more important, I think this could make ShellEnvironment more extensible in the future, if we have other options to consider (maybe pass in other options to use when calling spawn?).

However, I personally think that the global API should not change it's behaviour.

It wouldn't be a breaking change. It would support both:

shellEnvironment.getEnvironment (error, environment) =>

and

shellEnvironment.getEnvironment {cachePeriod:cachePeriod} (error, environment) =>

usage. I definitely see the first being the most common and should be priority, however I just want to make this functional and easier for Atom package devs and similar use cases described above.

If you think that's a good idea, then feel free to submit a PR and we can merge it.

I want to have the discussion about pros and cons and agree on the best solution. I am not interested in simply making a PR and even merging it, if you do not agree and are not happy with it. We should all be happy with it's implementation and usage.

Glavin001 commented 9 years ago

That way old API can remain, but it's easy to adjust the cache period.

The old API will not change. I am not recommending the API change / break and would not advocate to do so if that were the case. Instead we add an option argument. See https://github.com/ioquatix/shell-environment/issues/3#issuecomment-101064369

As Samuel says though there is the caveat that calling it from multiple places with different cache periods would probably lead to undefined behaviour.

I do not think that it would lead to undefined behaviour. I see the cachePeriod as being the maximum allowed time that a previously fetched environment would be acceptable. So if I call getEnvironment without the cachePeriod it can fallback to a default cachePeriod. However, if cachePeriod is set the it would actually make the call to spawn and then cache that for later and store a new cache time and value. Regardless, calling getEnvironment will always return an environment that was true up until the time frame indicated by cachePeriod. Whether the value would be fresh (fetched, not cached: value in cache was outside of cachePeriod) or from cache.

madleech commented 9 years ago

Having an optional options argument seems fine to me, it's how you'd do it in most other languages.

ioquatix commented 9 years ago

The maximum allowed time that a previously fetched environment would be acceptable. So if I call getEnvironment without the cachePeriod it can fallback to a default cachePeriod. However, if cachePeriod is set the it would actually make the call to spawn and then cache that for later and store a new cache time and value. Regardless, calling getEnvironment will always return an environment that was true up until the time frame indicated by cachePeriod. Whether the value would be fresh (fetched, not cached: value in cache was outside of cachePeriod) or from cache.

This is a good definition for how the cachePeriod could works. However, whether or not it makes sense to change this on the fly, I'm not sure about that.

Right now, cachePeriod is a class member, so the only way to implement what you want is by modifying this value, and since there is a single static instance of ShellEnvironment for the global API, you'd need to modify global state every time you called it.

Because of this, if you set it and then someone else set it to something else, your original behaviour would not be retained. This might be okay for something simple like cachePeriod but isn't great for things like command and potentially other options in the future.

I'd prefer to have sane defaults that cover typical use rather than providing lots of configuration options. I don't think providing and changing options on the fly is a good idea. But, that said, feel free to submit a PR.

If you want to have an option to getEnvironment which controls cachePeriod, that code path would become complex without, IMHO, much real benefit. I'd prefer to keep things simple.

The best I think we can do in this case is to expose the shared global state, and I've done this in https://github.com/ioquatix/shell-environment/commit/0246bbb2c221eec08de66c9e73131b960817dedf

However, keep in mind, that by changing the cachePeriod, you'd do it for all clients who use this global API. If one API modifies cachePeriod, and then another (e.g. script-runner and then atom-beautify), they'd be conflicting about their expectations of how the global API should work. The only situation where it makes sense to change this is by the user. If you want to customise the behaviour of the cachePeriod, e.g. for testing, just use an instance of the class directly. It's way more predictable and isolated, and you can set the cache period to 1000. for actual use, the cachePeriod should be very small (<= 10s), the performance impact minimal, so that it reflects the users' shell environment accurately.

I wil close this issue as this has been implemented, it does work and it might not be ideal. The next step, if we want to change it, is to propose a PR and we can continue discussion there.

ioquatix commented 9 years ago

Furthermore, and likely more important, I think this could make ShellEnvironment more extensible in the future, if we have other options to consider (maybe pass in other options to use when calling spawn?).

Yes, makes total sense. On this point, I'd rather see

class MyAtomPlugin:
  construct: ->
    @shellEnvironment = new ShellEnvironment(options)

  doSomething: ->
    @shellEnvironment.getEnvironment ...

If the user wanted to have global defaults, maybe we could implement these, like:

ShellEnvironment.defaultCachePeriod = 1000

and these are used globally as the default if no option is specified.

ioquatix / shell-environment

Caching strategy #3

Expected API Usage

Implementation Details