xonsh / lazyasd

Lazy & self-destructive tools for speeding up module imports
http://xon.sh
BSD 3-Clause "New" or "Revised" License

Lazy proxy dict #1

Open jankatins opened 8 years ago

jankatins commented 8 years ago

[Continuing the conversation from https://twitter.com/scopatz/status/756544618964004868]

I've built a LazyProxyDict which fetches values on first access. It's only lazy but not self-destructive. Does that still fit?

Isn't this basically LazyDict?

Tried that first, but a) the parent place isn't fixed (assigned to a trait) and b) values is an os.environ, so unknown keys...

For (a) I think if we had the context passed in as None, we could disable it, (b) I don't understand, where is the usage site?

Usage is in the environment kernel manager: it activates a conda env to get the environment variables (mostly for the altered PATH entry, but I guess there could be others as well) and passes them on to the notebook server to use as the environment for the started kernel. Before I used the lazy proxy, starting up a notebook server took many seconds (and the notebook page showed nothing) because every env I had installed was activated. Now it's as fast as usual and the activation only happens when the kernel is used for the first time.

scopatz commented 8 years ago

Ok, so I think if we made LazyObject, not LazyDict, optionally not self-destructive, this would meet your needs. I think it is a four-line change.

jankatins commented 8 years ago

Does that also fake its own identity? Like isinstance(LazyObject(func_returns_dict), dict)? In this case the object is put into a traitlet which wants a dict (not even a Mapping is enough :-()

scopatz commented 8 years ago

Yes, I believe that it does. The following script:

t.py

from lazyasd import LazyObject

x = LazyObject(dict, globals(), 'x')

print(isinstance(x, dict))
print(type(x))
print(x)

yields:

scopatz@localhost ~/temp $ python t.py 
True
<class 'dict'>
{}
jankatins commented 8 years ago

Ok, the main problem here is that in the above example the object is resolved by the isinstance() call, which happens when the surrounding object (the KernelSpec) is constructed. That means that the speed gains are gone again.

In [6]: def dicter():
   ...:     print("Resolving")
   ...:     return {}
   ...:

In [7]: x = LazyObject(dicter, globals(), 'x')

In [8]: print(isinstance(x, dict))
Resolving
True

Basically I need an object which looks like a dict and only does the "work" of becoming one when a real value is used.

d = LazyProxyDict(loader)
isinstance(d, dict)  # True, but loader isn't called yet
try:
    # Doesn't compare but raises; used internally by the traitlet on assignment
    # to figure out whether the value differs from the earlier (= empty) one
    d == {...}
except Exception:
    print("not yet loaded...")
d["something"]  # triggers loading
d == {...}  # now returns True or False

I could think of a ProxyObject(loader, (InterfaceClass,)) which would make isinstance(proxyobj, InterfaceClass) return True. No idea if that's easily done, though... Probably by generating the type on the fly?

The equality behaviour is basically a specific problem of traitlets (the normal == would trigger the load, so again no time would be saved :-(). On the other hand, I could probably work around it with a special KernelSpec subclass...
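
For illustration, the behaviour described in this comment could be sketched roughly as below. This is only a guess at the shape of such a class, not the actual LazyProxyDict from the environment kernel manager: the `loader` argument, the `_load` helper and the `RuntimeError` are assumptions, and only a handful of dict methods are covered.

```python
class LazyProxyDict(dict):
    """A real dict subclass that fills itself on first key access (sketch)."""

    def __init__(self, loader):
        super().__init__()
        self._loader = loader  # assumed: a callable returning a mapping
        self._loaded = False

    def _load(self):
        # Run the loader once and copy its result into this dict.
        if not self._loaded:
            self.update(self._loader())
            self._loaded = True

    def __getitem__(self, key):
        self._load()
        return super().__getitem__(key)

    def __contains__(self, key):
        self._load()
        return super().__contains__(key)

    def __iter__(self):
        self._load()
        return super().__iter__()

    def __len__(self):
        self._load()
        return super().__len__()

    def __eq__(self, other):
        # Refuse to compare before loading, so that an equality check
        # cannot trigger the expensive loader.
        if not self._loaded:
            raise RuntimeError("not yet loaded")
        return super().__eq__(other)


def slow_loader():
    # Stand-in for activating a conda env and reading its variables.
    print("Resolving")
    return {"PATH": "/opt/env/bin"}


d = LazyProxyDict(slow_loader)
print(isinstance(d, dict))  # the loader has not run at this point
print(d["PATH"])            # the first key access runs the loader
```

Because the class really inherits from dict, isinstance() succeeds without any trickery; the cost is that every accessor that should trigger loading has to be overridden by hand.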

scopatz commented 8 years ago

Ahh yeah I see. Yeah, we probably want a ProxyObject class then. I think in the constructor you need to save the target type in _lasdo; then in __getattribute__, whenever __class__ is looked up, it returns the value from _lasdo without loading.

This would be a great addition to have. Alternatively we could also do this in the LazyObject with a kwarg called cls or type. The behaviour would be as described here if it is not None. I kind of like this the most since it leads to the least duplicated code.

Or we could generalize this and make it so that LazyObject takes **kwargs that are stored for direct access. That way you could do LazyObject(loader, __class__=dict)
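
A minimal sketch of that kwargs idea, to make it concrete. It is not lazyasd code: `FacadeProxy` and its `_state` dict are hypothetical names, and the sketch relies on the fact that isinstance() falls back to looking up `__class__` on the instance when the real type does not match.

```python
class FacadeProxy:
    """Hypothetical proxy: kwargs are answered directly until the load happens."""

    def __init__(self, load, **facade):
        # Bypass our own attribute machinery when storing internal state.
        object.__setattr__(
            self, "_state",
            {"loaded": False, "load": load, "obj": None, "facade": facade},
        )

    def __getattribute__(self, name):
        state = object.__getattribute__(self, "_state")
        # Before loading, names present in the facade are served as-is.
        if not state["loaded"] and name in state["facade"]:
            return state["facade"][name]
        # Anything else forces the real object into existence.
        if not state["loaded"]:
            state["obj"] = state["load"]()
            state["loaded"] = True
        return getattr(state["obj"], name)


def dicter():
    print("Resolving")
    return {"PATH": "/opt/env/bin"}


x = FacadeProxy(dicter, __class__=dict)
print(isinstance(x, dict))  # answered from the facade, dicter() has not run
print(list(x.keys()))       # the first real attribute access runs dicter()
```

Note that type(x) is still FacadeProxy here, so code that checks type() rather than isinstance() would not be fooled, and implicit special-method calls such as len(x) or x["PATH"] would still need explicit methods on the class.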

jankatins commented 8 years ago

The latter sounds great. I think I could even get around subclassing if this would also take functions... So basically this would mean that the basic algo in __getattribute__ gets changed to

if not self._lasdo["loaded"]:
    if name in self._lasdo["facade"]:  # facade takes all kwargs
        return self._lasdo["facade"][name]
# if we are here, load the real object and get the attribute from there

What I don't quite understand is why the rest of the functions are fleshed out (e.g. __bool__, __iter__ and so on), because that would mean I have to put the same logic into each of these methods?

scopatz commented 8 years ago

All those functions are fleshed out because Python doesn't necessarily go through __getattribute__ for special methods. It would take actually writing this and testing it to see if it would work for __class__ etc.
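
A small demonstration of that point, using hypothetical `Proxy` and `ProxyWithLen` classes that are not part of lazyasd. Implicit invocations such as len(p) look the special method up on the type and skip the instance's `__getattribute__`, so a pure forwarding proxy only works for explicit attribute access.

```python
class Proxy:
    """Forwards every attribute lookup to a wrapped target (illustration only)."""

    def __init__(self, target):
        object.__setattr__(self, "_target", target)

    def __getattribute__(self, name):
        target = object.__getattribute__(self, "_target")
        return getattr(target, name)


class ProxyWithLen(Proxy):
    # Defining the special method on the class is what makes len() work.
    def __len__(self):
        return len(object.__getattribute__(self, "_target"))


p = Proxy([1, 2, 3])

# Explicit lookup goes through __getattribute__ and is forwarded.
explicit_len = p.__len__()

# Implicit lookup checks type(p) only, finds no __len__, and fails.
try:
    len(p)
    implicit_failed = False
except TypeError:
    implicit_failed = True

print(explicit_len, implicit_failed, len(ProxyWithLen([1, 2])))
```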

scopatz commented 8 years ago

My guess is that it would, though. I don't think all of those functions need to be changed. And if they do, there is probably a way to abstract that in _lazy_obj()

jankatins commented 8 years ago

Another question: is it ok to switch this project to py2+ instead of py3 only?

scopatz commented 8 years ago

Yep, this can be py2k friendly as long as it also works well with amalgamate.

scopatz commented 8 years ago

One of the reasons for splitting this and amalgamate out from xonsh was to make them more broadly available, including to other versions of Python.