Python bindings and mutation patterns

cmyr commented 3 years ago

This is a quick sketch of what would be involved in adapting norad so that it could be used from languages like Python, as a drop-in replacement for existing libraries. The particular challenge is giving norad objects the same semantics as objects in those languages ('reference semantics'), and making that work with Rust's ownership model.

This is a follow-up to the discussion in https://github.com/fonttools/fonttools/issues/1095.

I am going to make a bunch of assumptions about how some of the Python tools work, so please correct me wherever I'm wrong!

My understanding of what we would want in a Python API is basically: everything is an 'object', with reference semantics. If I get a glyph from a layer and change its outline, and then I get that glyph from the layer again, the two glyphs will be identical, because they point to the same underlying data.
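
To make the distinction concrete, here is a minimal Rust sketch contrasting value semantics (what a plain Rust struct gives you when you clone it) with the reference semantics described above. The Glyph type is hypothetical and reduced to a single width field; it is not norad's Glyph.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical, simplified glyph used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Glyph {
    pub width: f64,
}

/// Value semantics: cloning copies the data, so edits to the copy stay local.
pub fn value_semantics() -> f64 {
    let original = Glyph { width: 500.0 };
    let mut copy = original.clone();
    copy.width = 600.0;
    original.width // still 500.0
}

/// Reference semantics (what Python code expects): cloning the `Rc` copies
/// the pointer, so both handles observe the same edit.
pub fn reference_semantics() -> f64 {
    let original = Rc::new(RefCell::new(Glyph { width: 500.0 }));
    let alias = Rc::clone(&original);
    alias.borrow_mut().width = 600.0;
    let width = original.borrow().width; // now 600.0
    width
}
```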

This is not how things currently work in norad. In norad, mutation works through one of two mechanisms, which I'll call "borrowing" and "check-out/check-in":

borrowing: This is how layers currently work in norad. To mutate a layer, you call a method that returns a mutable reference to the layer, and you can mutate the layer through that reference; however, you cannot hold on to it, or really pass it anywhere else.
check-out/check-in: This is how glyphs currently work: norad doesn't let you get a mutable reference to a glyph at all. If you want to mutate a glyph, you get the glyph from a layer, modify it, and then add it back to that layer. You can think of this as a check-out/check-in mechanism.
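
A rough sketch of the two mechanisms, using simplified hypothetical Font, Layer, and Glyph types rather than norad's actual API; the method names and signatures below are assumptions chosen for illustration.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Glyph {
    pub name: String,
    pub width: f64,
}

#[derive(Debug, Default)]
pub struct Layer {
    glyphs: BTreeMap<String, Glyph>,
}

#[derive(Debug, Default)]
pub struct Font {
    layers: BTreeMap<String, Layer>,
}

impl Font {
    /// "Borrowing": the `&mut Layer` is only valid while `self` stays borrowed,
    /// so the caller can edit the layer but cannot keep or share the reference.
    pub fn layer_mut(&mut self, name: &str) -> Option<&mut Layer> {
        self.layers.get_mut(name)
    }
}

impl Layer {
    /// "Check-out": returns an independent copy; editing it does not touch the layer.
    pub fn get_glyph(&self, name: &str) -> Option<Glyph> {
        self.glyphs.get(name).cloned()
    }

    /// "Check-in": replaces the stored glyph with the edited copy.
    pub fn insert_glyph(&mut self, glyph: Glyph) {
        self.glyphs.insert(glyph.name.clone(), glyph);
    }
}

/// Returns the stored width before and after checking the edited glyph back in.
pub fn mutation_patterns_demo() -> (f64, f64) {
    let mut font = Font::default();
    font.layers.insert("public.default".into(), Layer::default());

    // Borrowing: mutate the layer in place through a short-lived &mut.
    let layer = font.layer_mut("public.default").unwrap();
    layer.insert_glyph(Glyph { name: "a".into(), width: 500.0 });

    // Check-out: edit a copy...
    let mut glyph = layer.get_glyph("a").unwrap();
    glyph.width = 600.0;
    let before_check_in = layer.get_glyph("a").unwrap().width; // still 500.0

    // ...then check it back in so the layer sees the change.
    layer.insert_glyph(glyph);
    let after_check_in = layer.get_glyph("a").unwrap().width; // now 600.0
    (before_check_in, after_check_in)
}
```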

A design to support bindings

I think that trying to make norad as it is currently written fit into the Python model will be tricky, but I think there's a reasonably straightforward answer: we have a separate set of types and interfaces explicitly designed to work with Python. This should let us continue to share all of the base types and the parsing/validation/serialization logic, while building two separate APIs that serve the two distinct use cases.

So basically, we add a Python-specific wrapper, in Rust, for each type, like:

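```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::rc::Rc;

use norad::Glyph;

pub struct PyUfo {
    meta: PyMetaInfo,
    font_info: PyFontInfo,
    // ...etc
}

pub struct PyLayer {
    // Any map type would do here; BTreeMap keeps glyph names ordered.
    glyphs: Rc<RefCell<BTreeMap<String, PyGlyph>>>,
    // ...etc
}

pub struct PyGlyph {
    inner: Rc<RefCell<Glyph>>,
}
```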

etcetera.
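
As a sketch of how these wrappers would behave, assuming a stand-in Glyph with only a width field in place of norad's, and hypothetical get, insert, and snapshot methods: cloning a wrapper clones the Rc, so every handle sees every mutation. The From conversion and snapshot method show one possible way for the wrappers to keep reusing the plain types (and therefore the shared parsing and serialization code).

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::rc::Rc;

/// Stand-in for norad's plain `Glyph` type (assumption: only a width field).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Glyph {
    pub width: f64,
}

/// Cloning a `PyGlyph` clones the `Rc`: it yields another handle, not a copy.
#[derive(Clone, Default)]
pub struct PyGlyph {
    inner: Rc<RefCell<Glyph>>,
}

impl From<Glyph> for PyGlyph {
    /// Wrap a glyph produced by the shared parsing code.
    fn from(glyph: Glyph) -> Self {
        PyGlyph { inner: Rc::new(RefCell::new(glyph)) }
    }
}

impl PyGlyph {
    /// Copy the current state back out, e.g. to hand to the shared serializer.
    pub fn snapshot(&self) -> Glyph {
        self.inner.borrow().clone()
    }
}

/// Cloning a `PyLayer` likewise shares the same glyph map.
#[derive(Clone, Default)]
pub struct PyLayer {
    glyphs: Rc<RefCell<BTreeMap<String, PyGlyph>>>,
}

impl PyLayer {
    pub fn get(&self, name: &str) -> Option<PyGlyph> {
        self.glyphs.borrow().get(name).cloned()
    }

    pub fn insert(&self, name: &str, glyph: PyGlyph) {
        self.glyphs.borrow_mut().insert(name.to_string(), glyph);
    }
}

pub fn shared_layer_demo() -> Glyph {
    let layer = PyLayer::default();
    let layer_alias = layer.clone();

    // Insert through one layer handle...
    layer.insert("a", Glyph { width: 500.0 }.into());
    // ...mutate through a glyph handle obtained from the other layer handle...
    let glyph = layer_alias.get("a").unwrap();
    glyph.inner.borrow_mut().width = 600.0;
    // ...and the original layer handle sees the change.
    let snapshot = layer.get("a").unwrap().snapshot();
    snapshot
}
```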

Note: This assumes that the glyph is a 'leaf' type; that is, it is the finest-granularity object that you're allowed to mutate and expect those mutations to show up elsewhere. This might not be the case: for instance, you might expect to be able to get a Contour out of a glyph, change its properties, and have those changes reflected everywhere. In that case we would also need a PyContour type, and PyGlyph would look more like the Glyph that's already in norad.
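
If contours did need to be shared, a sketch might look like the following. Contour, PyContour, and the contour(index) accessor are hypothetical names, and points are reduced to (x, y) tuples.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical contour: just a list of (x, y) points.
#[derive(Debug, Default)]
pub struct Contour {
    pub points: Vec<(f64, f64)>,
}

/// Shared handle to a contour, so a contour taken out of a glyph stays live.
#[derive(Clone, Default)]
pub struct PyContour {
    inner: Rc<RefCell<Contour>>,
}

/// With contours as the finer-grained shared type, the glyph holds handles
/// to its contours rather than owning the contour data directly.
#[derive(Clone, Default)]
pub struct PyGlyph {
    contours: Rc<RefCell<Vec<PyContour>>>,
}

impl PyGlyph {
    /// Returns another handle to the contour at `index`, not a copy.
    pub fn contour(&self, index: usize) -> Option<PyContour> {
        self.contours.borrow().get(index).cloned()
    }
}

pub fn contour_demo() -> usize {
    let glyph = PyGlyph::default();
    glyph.contours.borrow_mut().push(PyContour::default());

    // Take a contour out of the glyph and edit it on its own...
    let contour = glyph.contour(0).unwrap();
    contour.inner.borrow_mut().points.push((10.0, 20.0));

    // ...and the edit is visible when going back through the glyph.
    let count = glyph.contour(0).unwrap().inner.borrow().points.len();
    count
}
```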

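You can mostly ignore the Rc<RefCell<_>> part. Rc is a reference-counted pointer: cloning it creates another handle to the same data, and the data is freed when the last handle is dropped. RefCell means that the borrowing rules for the inner data are checked at runtime instead of being enforced by Rust at compile time, so we can mutate through a shared handle.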
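
A small self-contained illustration of both halves, using only the standard library: Rc::strong_count shows the reference count changing, and RefCell::try_borrow_mut shows the runtime borrow check refusing a mutable borrow while a shared borrow is alive.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Returns (handle count while an alias exists, handle count after dropping it,
/// whether a mutable borrow succeeded while a shared borrow was held).
pub fn rc_refcell_demo() -> (usize, usize, bool) {
    let data = Rc::new(RefCell::new(vec![1, 2, 3]));

    // Rc: cloning bumps the reference count; the Vec is freed when it hits zero.
    let alias = Rc::clone(&data);
    let while_aliased = Rc::strong_count(&data);
    drop(alias);
    let after_drop = Rc::strong_count(&data);

    // RefCell: the borrow rules are checked at runtime instead of compile time.
    let reader = data.borrow();
    let write_allowed = data.try_borrow_mut().is_ok();
    drop(reader);

    (while_aliased, after_drop, write_allowed)
}
```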

(Rc + RefCell assumes that this object will not be shared between OS threads, which seems like a reasonable assumption for Python; if we do want that behaviour, we would instead use Arc + Mutex, which makes both the reference counts and data access thread-safe.)
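
For comparison, a sketch of the thread-safe variant, with a hypothetical SharedGlyph wrapper around Arc<Mutex<Glyph>> (Glyph again simplified to a width field). Each thread has to lock the mutex before touching the data.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Debug, Default)]
pub struct Glyph {
    pub width: f64,
}

/// Thread-safe counterpart of `Rc<RefCell<Glyph>>`.
#[derive(Clone, Default)]
pub struct SharedGlyph {
    inner: Arc<Mutex<Glyph>>,
}

/// Four threads each add 25.0 to the same glyph's width.
pub fn arc_mutex_demo() -> f64 {
    let glyph = SharedGlyph::default();
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let g = glyph.clone(); // bumps the atomic reference count
            thread::spawn(move || {
                // Lock, mutate, and release when the guard drops.
                g.inner.lock().unwrap().width += 25.0;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let width = glyph.inner.lock().unwrap().width;
    width
}
```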

Borrowing problems

One possible concern with this approach involves borrowing expectations. The rule with RefCell is that when you want to mutate the data, you acquire a kind of 'lock' (a mutable borrow), and you can't get that lock while the object is already borrowed elsewhere. In practice I think we can avoid this completely by ensuring that all of the acquiring and releasing happens on the Rust side, though I'd have to look into this a bit more to be sure. It might mean we have to write our own generator for the Python bindings, to ensure that things like getters and setters do that borrowing under the covers.

If we do have to expose this somehow, we would just raise a Python exception if something was already borrowed. I was initially thinking this would be a larger part of the design, as a sort of safety valve: when folks migrated existing Python code to this library they might hit some new exceptions. But I actually think we can probably avoid this altogether?
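
One way to keep all of the acquire/release on the Rust side is for every accessor to borrow, copy data out, and release before returning, so that no Ref or RefMut ever crosses the binding boundary. A sketch under that assumption, with hypothetical width, set_width, and unicodes accessors on a simplified Glyph:

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug, Default)]
pub struct Glyph {
    pub width: f64,
    pub unicodes: Vec<char>,
}

#[derive(Clone, Default)]
pub struct PyGlyph {
    inner: Rc<RefCell<Glyph>>,
}

impl PyGlyph {
    /// Getter: the shared borrow ends before the method returns.
    pub fn width(&self) -> f64 {
        self.inner.borrow().width
    }

    /// Setter: the mutable borrow is held only for the single assignment.
    pub fn set_width(&self, width: f64) {
        self.inner.borrow_mut().width = width;
    }

    /// Returns owned data rather than a `Ref`, so no borrow outlives the call.
    pub fn unicodes(&self) -> Vec<char> {
        self.inner.borrow().unicodes.clone()
    }
}

pub fn interleaved_access_demo() -> f64 {
    let glyph = PyGlyph::default();
    let alias = glyph.clone();
    alias.set_width(600.0);
    // Reading through one handle and writing through another never overlaps,
    // because each accessor releases its borrow before returning.
    glyph.set_width(alias.width() + 10.0);
    alias.width()
}
```

The cost of this design choice is that reading a collection such as unicodes copies it on every call.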
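
If the borrow state does leak out, the non-panicking try_borrow_mut makes the exception path straightforward: convert the BorrowMutError into an error value that the binding layer would then raise as a Python exception. AlreadyBorrowed and try_set_width are hypothetical names, and how the error maps to a specific Python exception type would depend on whichever binding crate is used.

```rust
use std::cell::{BorrowMutError, RefCell};
use std::fmt;
use std::rc::Rc;

/// Hypothetical error type; a binding layer would turn this into a Python exception.
#[derive(Debug, PartialEq)]
pub struct AlreadyBorrowed(pub &'static str);

impl fmt::Display for AlreadyBorrowed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} is already borrowed", self.0)
    }
}

#[derive(Debug, Default)]
pub struct Glyph {
    pub width: f64,
}

/// Returns an error instead of panicking when the glyph is already borrowed.
pub fn try_set_width(glyph: &Rc<RefCell<Glyph>>, width: f64) -> Result<(), AlreadyBorrowed> {
    let mut g = glyph
        .try_borrow_mut()
        .map_err(|_: BorrowMutError| AlreadyBorrowed("glyph"))?;
    g.width = width;
    Ok(())
}
```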

Other thoughts

An alternative design based on proxy objects: I think if we want a drop-in replacement for an existing tool written in Python, something like what I describe here will be the best route. There are other options, such as 'proxy objects' that just hold a reference to the font or layer, along with a way of mapping mutations on themselves to mutations on the shared object. This honestly has a certain nerdy appeal, especially since we could do cool stuff like having a delete() method on a Glyph object that removes it from the layer and updates the layer_contents. But I think it's probably a bit more complicated, and it's less clear to me how well it would work, although I'm more curious as I end this paragraph than I was when I began?
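
A minimal sketch of the proxy idea, assuming a layer stored as a shared map of plain glyphs and a hypothetical GlyphProxy that holds only the layer handle and a glyph name. Reads, writes, and delete are all forwarded to the shared layer.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::rc::Rc;

#[derive(Debug, Clone, Default)]
pub struct Glyph {
    pub width: f64,
}

/// Shared layer storage; glyphs are stored by value, not behind their own Rc.
pub type LayerData = Rc<RefCell<BTreeMap<String, Glyph>>>;

/// Proxy: holds the shared layer plus a key, and forwards every operation.
pub struct GlyphProxy {
    layer: LayerData,
    name: String,
}

impl GlyphProxy {
    pub fn new(layer: &LayerData, name: &str) -> Self {
        GlyphProxy { layer: Rc::clone(layer), name: name.to_string() }
    }

    /// Reads go to the shared layer; `None` if the glyph has since been deleted.
    pub fn width(&self) -> Option<f64> {
        self.layer.borrow().get(&self.name).map(|g| g.width)
    }

    /// Writes become a mutation of the layer's entry; `false` if it is gone.
    pub fn set_width(&self, width: f64) -> bool {
        match self.layer.borrow_mut().get_mut(&self.name) {
            Some(g) => {
                g.width = width;
                true
            }
            None => false,
        }
    }

    /// The `delete` idea: remove this glyph from its layer.
    pub fn delete(&self) -> bool {
        self.layer.borrow_mut().remove(&self.name).is_some()
    }
}

pub fn proxy_demo() -> (Option<f64>, bool, Option<f64>) {
    let layer: LayerData = Rc::default();
    layer.borrow_mut().insert("a".into(), Glyph { width: 500.0 });
    let a = GlyphProxy::new(&layer, "a");
    let also_a = GlyphProxy::new(&layer, "a");

    a.set_width(600.0);
    let seen = also_a.width(); // Some(600.0): both proxies address the same entry
    let deleted = a.delete();
    (seen, deleted, also_a.width()) // the second proxy now finds nothing
}
```

One consequence of this design is that a proxy can outlive its glyph, so every accessor has to handle the "glyph no longer exists" case.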

Next steps

This is intended as a sketch, and an actual design will require a bit more thought and research. I'm going to hold off on doing that work until I have a better sense of how much of a priority this is, and whether it's my priority or someone else's. If @simoncozens is interested in doing the work, then I'm happy to offer whatever advice and guidance I can. Otherwise, if @davelab6 thinks that this is worth a week or two of my time, then I'm confident we can get something working pretty quickly; the only part I'm unsure of is how to generate the Python bindings in a way that would play nicely with this interior-mutability pattern.

simoncozens commented 3 years ago

Another suggestion that Raph had was that the only object you expose to Python is the font, and you pass around a path key to access or mutate deeper structures. So changing the X position of a point is actually done by the moral equivalent of font.set_value("public.glyphs/a/2/1/x", -5).

That may help to put all the locking in the same place.
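
A sketch of the path-key approach, assuming the path is read as <layer>/<glyph>/<contour index>/<point index>/<x|y> (that reading of the example path is a guess), with a hypothetical Font reduced to nested maps and vectors of points. All of the borrowing happens inside set_value, which is the point of the suggestion.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

#[derive(Debug, Default)]
pub struct Font {
    /// layer name -> glyph name -> contours -> points
    pub layers: BTreeMap<String, BTreeMap<String, Vec<Vec<Point>>>>,
}

impl Font {
    /// Hypothetical path-key setter: "<layer>/<glyph>/<contour>/<point>/<x|y>".
    pub fn set_value(&mut self, path: &str, value: f64) -> Result<(), String> {
        let parts: Vec<&str> = path.split('/').collect();
        let &[layer, glyph, contour, point, field] = parts.as_slice() else {
            return Err(format!("expected 5 path segments, got {}", parts.len()));
        };
        let contour: usize = contour
            .parse()
            .map_err(|_| format!("bad contour index {contour:?}"))?;
        let point: usize = point
            .parse()
            .map_err(|_| format!("bad point index {point:?}"))?;

        // Walk down the structure; any missing level becomes an error.
        let pt = self
            .layers
            .get_mut(layer)
            .and_then(|l| l.get_mut(glyph))
            .and_then(|g| g.get_mut(contour))
            .and_then(|c| c.get_mut(point))
            .ok_or_else(|| format!("no such point: {path}"))?;

        match field {
            "x" => pt.x = value,
            "y" => pt.y = value,
            other => return Err(format!("unknown field {other:?}")),
        }
        Ok(())
    }
}

pub fn path_key_demo() -> Result<f64, String> {
    let mut font = Font::default();
    let contours = vec![vec![Point::default(); 2]; 3];
    font.layers
        .entry("public.glyphs".into())
        .or_default()
        .insert("a".into(), contours);
    font.set_value("public.glyphs/a/2/1/x", -5.0)?;
    Ok(font.layers["public.glyphs"]["a"][2][1].x)
}
```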

cmyr commented 3 years ago

@simoncozens that sounds like approximately what I was thinking about with 'proxy objects', as an alternative design, although I probably could have expressed it more clearly. :)

simoncozens commented 3 years ago

Incidentally we've since discovered that UFO loading is not the bottleneck we thought it was (yay profiling!) so I would not suggest this was a very high priority... What I have with iondrive (creating ufoLib2 objects in Rust) is fast enough for my needs.

cmyr commented 3 years ago

okay, sounds good!
