neo4j-contrib / neomodel

An Object Graph Mapper (OGM) for the Neo4j graph database.
https://neomodel.readthedocs.io
MIT License
939 stars 231 forks source link

Upgrade `NeomodelPoint` to also work with Shapely 2.0 #718

Closed aanastasiou closed 1 year ago

aanastasiou commented 1 year ago

Hello

I have just finished addressing #717 which was an extremely interesting roller coaster ride. It has resulted in two implementations and before we proceed with one of them, I wanted to provide some background:

The key problem

The folks over at Shapely have changed the way their objects are instantiated across the Python / C divide in a way that affects the way inheritance works when the object is instantiated in Python.

For more information, please see here, here and here.

Very briefly, certain objects of geometrical models (point, line, polygon and others) must be immutable. The way these are constructed right now, return a fixed and immutable object in Python which cannot be further subclassed.

Neomodel's NeomodelPoint relied heavily on inheritance to offer a NeomodelPoint which would be used for data marshalling to and from Neo4j but also be interoperable with Shapely. That is, once you have a NeomodelPoint result set you could use it as an iterable of Points directly with any other function Shapely offers. No need for the intermediate step of conversion. We already have all the functionality from Shapely and now simply specialise it to allow the creation of specific types of points with the addition of the coordinate reference system (CRS).

The solution(s)

There are two solutions:

Insist on inheritance

Unfortunately, insisting on inheritance leads to a huge number of hacks.

For comparison:

The current implementation VS NeomodelPoint trying to inherit from Shapely 2.0's Point

Very briefly:

  1. NeomodelPoint is becoming a "renamed" Shapely Point whose internal structure (i.e. attributes) cannot change (see __slots__ assignment in __new__ for example).
  2. Its initialisation is now split across __new__ and __init__
  3. To overcome the fact that objects must be immutable, the _crs property becomes a dict that maps id(self) to the actual value of the _crs a given object requires.
    • If you do not catch this early, it leads to all sorts of "wonderful" ways by which things can go wrong because this "detail" affects the way __copy__ and __deepcopy__ work which are used repeatedly everywhere in Neomodel (e.g. in queries).

The implementation has passed all tests but I would not consider it reliable because you find yourself trying to enforce the way inheritanc should work.

Have NeomodelPoint act as an adaptor of Shapely's Point

Compared to insisting on inheritance, this version feels more straightforward, but it results in NeomodelPoint not being a Shapely Point any more.

This might introduce problems when strict typing is enforced or affect tests where the lineage of the object is taken into account.

This "Adapter" implementation can be found here

Very briefly:

  1. The class remains largely untouched, except for getting another attribute to store the underlying Shapely Point which provides the actual storage and functions.
  2. Two additional functions were added: __getattr__ to route messages that NeomodelPoint does not know how to respond to, to the underlying Shapely Point and __eq__ to handle comparisons (which before we were getting "for free").
  3. The error condition of trying to access "longitude" on a cartesian Point now raises TypeError instead of AttributeError. This change was essential for the properties to operate as expected (if a property raises AttributeError, this signals to Python that the property does not exist on this object which then triggers the call to __getattr__.

This implementation also passes all tests right now.

Given these two pathways, my personal preference would be to have NeomodelPoint act as an adapter of Shapely's Point.

If however in the future, the folks at Shapely decide to change their mind about the way Shapely objects are instantiated, I would happily change the implementation back to NeomodelPoint being a child of Shapely's Point.

aanastasiou commented 1 year ago

@mariusconjeaud We might have to add another build conf for Shapely >= 2.0

mariusconjeaud commented 1 year ago

@mariusconjeaud We might have to add another build conf for Shapely >= 2.0

Not sure what you mean by this ?

FYI, the tests pipeline failed here, but only because it didn't manage to connect to Aura in the Aura-specific tests, which sometimes happens since I use a test Aura instance which looks like it's not always on.

aanastasiou commented 1 year ago

@mariusconjeaud

I understand. The way the package stands right now, specifies Shapely < 2.0.

This PR supports both. It detects the version and establishes NeomodelPoint accordingly.

I did this because I did not want to outright deprecate Shapely < 2.0. Users now have both options. If the transition to Shapely 2.0 creates any problems, they can report them and we can fix them but they also get the option to work with earlier Shapely versions.