python-effect / effect

effect isolation in Python, to facilitate more purely functional code
372 stars 16 forks source link

add an ERef object, like Haskell's IORef. #41

Closed radix closed 9 years ago

radix commented 9 years ago

It only exposes read and modify operations for now, and doesn't implement the cool STM-style in-place updates, but the interface allows for it.

glyph commented 9 years ago

I can see why you would want something like ModifyERef - which I would call SetAttribute. My understanding of the use-case goes something like this:

You have some code which transforms some structure into an Effect, or returns an Effect that expresses some intent around making an external service request. But sometimes you want to talk to an external service, and sometimes you want to manipulate some internal state. So it follows that while Python has a reasonably concise and efficient language around variable mutation, you'd still want a structure that could represent "variable mutation" as an Effect.

What I don't understand is what the point of ERef itself is. It's an individual cell object which carries a reasonably high runtime cost (especially in CPython). It's also a container for a single value, so it doesn't bridge nicely into code which has existing mutable structures in it.

If there were going to be a thing like ERef, it seems like it would be best to construct it something like effect = MutableObject(object_with_attributes).attribute("attribute_name").set(new_value); that would make it easy to bridge an existing mutable-object interface. Similarly, this conceptually bridges to effect = MutableList(list_object).append(value), effect = MutableDict(dict_object).set(key, value).

Perhaps my lack of understanding of ERef's design hinges upon the "cool STM-style in-place updates" thing, which makes no sense to me. STM seems like a thing the runtime has to do to get any benefit out of it (my understanding is that PyPy's STM would be happiest with a small number of naively defined un-shared structures).

radix commented 9 years ago

@glyph

To clarify the "cool STM-style in-place updates": the simplest way to implement strong consistency in updates is by adding an atomic check-and-set on a version number associated with the ERef (in Python, we'd likely implement this with a lock, but maybe PyPy could expose some lower-level primitives).

And this leads to "why ModifyERef and not SetAttribute" -- It's ModifyERef and not SetAttribute because it takes a callable instead of a new value. Implementing it this way means that you can implement those atomic updates by repeatedly calling the transformation function if the version number mismatches, until you are able to write to the expected version and it succeeds.

So, ERef in that scenario would also carry a version number along with it.

Your MutableObject idea may have some merit, though I think it encourages a proliferation of mutable variables instead of isolating them to just what you need. And my current use case, in Otter, matches up with ERef.

Also: I don't care about performance on CPython. :-)

Ideally, what you put in an ERef is something that can be updated efficiently, like a persistent data structure that is implemented as a (H)AMT (e.g. something from the pyrsistent module). So you don't need to worry about actually mutating sub-components of your mutable object -- you just replace the whole object entirely, and do it efficiently, with something like a pyrsistent.PSet (or PRecord or PVec or whatever).

glyph commented 9 years ago

To clarify the "cool STM-style in-place updates": the simplest way to implement strong consistency in updates is by adding an atomic check-and-set on a version number associated with the ERef (in Python, we'd likely implement this with a lock,

Do you mean "In CPython"?

but maybe PyPy could expose some lower-level primitives).

PyPy's primitives for STM are "add to transaction" and "run transaction"; you don't need to implement revertable cells or write barriers yourself. The whole point of its STM layer is that most application code doesn't need to know that parallelism is happening.

And this leads to "why ModifyERef and not SetAttribute" -- It's ModifyERef and not SetAttribute because it takes a callable instead of a new value. Implementing it this way means that you can implement those atomic updates by repeatedly calling the transformation function if the version number mismatches, until you are able to write to the expected version and it succeeds.

ModifyAttribute, then; I suppose I shouldn't have been talking about a separate "set" in the context of an atomic-ish test-and-set.

So, ERef in that scenario would also carry a version number along with it.

In what scenario? The "with a lock" scenario or the "special new pypy primitives" scenario? I browsed some docs on IORef and I didn't see anything related to version numbers or retries so I don't see how this would work.

It seems like this design is optimized for some parallel-universe future where ERef is somehow part of the PyPy runtime…

Your MutableObject idea may have some merit, though I think it encourages a proliferation of mutable variables instead of isolating them to just what you need.

It's Python. You already have plenty of mutable variables; they're everywhere. The stdlib is full of them, the runtime is full of them, they're not going away. My contention here is that to facilitate adoption of Effect where it makes sense, ease of integration with those attributes is more important than an API which makes it more tedious to do so because rewriting all of that code in a different style would be more pure.

And my current use case, in Otter, matches up with ERef.

It seems to line up just as well with a MutableAttribute though; just store the PSet directly on the instance where you're putting the ERef and then wrap a MutableReference around it.

Also: I don't care about performance on CPython. :-)

Fair enough. Otter is deployed on PyPy, I guess?

Ideally, what you put in an ERef is something that can be updated efficiently, like a persistent data structure that is implemented as a (H)AMT (e.g. something from the pyrsistent module). So you don't need to worry about actually mutating sub-components of your mutable object -- you just replace the whole object entirely, and do it efficiently, with something like a pyrsistent.PSet (or PRecord or PVec or whatever).

If you're going to say "I don't care about performance" you can't turn around and say "efficiently" ;-). pyrsistent is definitely not as efficient as just using a mutable built-in, especially in pypy, where mutable dicts are magical unicorns that have CPU-cache-optimized algorithms :). Even according to its own documentation, its performance ranges from "nearly on par" to "slower" than built-ins.

If you have a side-effect-ful cell though, using a plain set vs. using a PSet is an irrelevant implementation detail to an outside observer of your object; they still get an Effect which manipulates some internal state they shouldn't need to see.

radix commented 9 years ago

I had an in-person meeting with @glyph and @lvh (and also consulted @cyli and @manishtomar). I'll summarize some of the discussion points:

The most important point is that things that need to deal with "lots" of mutable variables should be wrapped in higher-level, application-specific intents. I've implemented a number of intents for Otter that are high level, like "Authenticate", "InvalidateAuthenticationToken", "ModifyGroupState", and so on. Another option is to use Func for ad-hoc effect-izing of legacy code, but usually defining a custom intent is the best option. I think this is a better option than having something like MutableObject/MutableDict/MutableList etc, since I think those would lead to too many effects, and perhaps even encourage more use of effect than necessary (not to mention that it could lead to an explosion of Mutable* classes).

We also discussed the STM stuff. The main point was: we could eventually implement this with low-level compare-and-set operation, in the same way that Java's AtomicReference works. This would even interoperate seamlessly with PyPy's STM. Doing a compare-and-set operation in a STM transaction would not be a conflict-generating operation, since by construction honors changes made by other threads. But this is super forward-looking and not really relevant now. Having an atomic compare-and-set is only necessary if you're modifying the same reference from multiple physical threads at the same time anyway. The current interface just gives you a simple way to derive new state from old state with a pure function. And we can add an modify_atomic that at least implements atomicity with a lock around the read/write until we get a proper compare-and-swap.

So basically I'm just going to rename ERef to Reference and improve the docstrings a bit.

cyli commented 9 years ago

As for the stated goal of changing the name and improving docstrings, LGTM! Not sure if you want to get more input.

radix commented 9 years ago

@cyli - I think we've sorted out everything so I'm gonna merge. Thanks a lot for the help!