neo4j-contrib / neomodel

An Object Graph Mapper (OGM) for the Neo4j graph database.
https://neomodel.readthedocs.io
MIT License
963 stars 232 forks source link

Upgrading to version 3.3.0 causes a KeyError on initialising connection to neo4j DB #378

Closed jonadaly closed 5 years ago

jonadaly commented 5 years ago

We have a neo4j instance, populated with the help of neomodel v3.2.9. The v3.3.0 of neomodel breaks when attempting to initialise the connection to the database:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/neomodel/util.py", line 150, in _object_resolution
    resolved_object = self._NODE_CLASS_REGISTRY[frozenset(a_result_attribute[1].labels)].inflate(
KeyError: frozenset({'Person'})

This appears to be as a result of the self._NODE_CLASS_REGISTRY changes included in v3.3.0, which appear to be breaking. Is this a known issue? Given that this is a minor version bump, we would not expect breaking changes to be included.

phoebebright commented 5 years ago

I'm not sure - I don't understand how django/wsgi/neo4j is working at that level unfortunately!

aanastasiou commented 5 years ago

@robertlagrant Is that as a standalone or through Django? Can you talk a little bit more about the problem?

aanastasiou commented 5 years ago

@robertlagrant Thank you very much for this note, can I please ask you to create a new "issue" for that if any additional work is required? Once you have done this, please erase this post to try and keep things organised.

As mentioned previously, 3.3.0 is not only about the generalisation fix.

robertlagrant commented 5 years ago

@aanastasiou sure. I'm not sure about extra work, but just mentioning that anyone using neomodel against a clustered neo4j might currently be having a very bad time, as they potentially can't use 3.2.9 or 3.3.0 :-)

I'm happy to have a look at the neomodel code, if that'd help speed things up. Could you expand on the reasons you mentioned above for why the model registry and connection info should be thread-local?

aanastasiou commented 5 years ago

@robertlagrant Then it sounds that a new issue should be created. Can you please do this and remove the previous comment to keep this thread focused on 378? (EDIT: It would be interesting to see a description of the "cluster problem" because I have used neomodel in a clustered environment without problems but that was around the time neo4j 3.4.7 or 3.4.8 was around and I was only doing parallel reads. So, as you can see, the devil is in the details which I would be very interested in finding out more about.)

In brief, the situation (about issue 378) is described further below but also, if you scroll back in this discussion you will find additional details about specific issues that I might omit in the following general description:

  1. All the calls from anywhere within an application that uses neomodel are routed via one Database object that is created during module instantiation.

  2. Part of the responsibilities of this object is transaction handling.

    1. A CRUD operation occurs within a transaction context. Transactions can be created / joined automatically or manually.
    2. Prior to executing a given CRUD query, a handle to a transaction is obtained. If a transaction is ongoing it is re-used, otherwise a new one is created.
    3. As per Neo4J's modes of operation, for the DBMS to be able to enforce constraints to a bunch of operations, they have to occur within the same "transaction".
    4. To enforce this mode of operation, the Database is "tied" to the process ID.
  3. Neomodel 3.3.0 introduced state to the Database object via a node-class registry that is used to resolve a neo4j Node object to a local datamodel object.

  4. Within the context of a single-process (and potentially even multi-threaded) application, state is preserved and everything works as expected.

  5. But, with WSGI firing a new process to serve requests, the neomodel.Database object that is instantiated during application creation is not the same object that is attempted to be accessed during the serving of a request.

    • Potential solutions:
      1. Structure the code in such a way that for every request, neomodel has to reload (i.e. re-import) all required data model classes for a particular request.
      2. Save and restore the Database attribute that is responsible for the object resolution
        • This has to be done in a web-framework specific way.
      3. In the long-term we are trying to provide two things:
        1. A way to ensure that neomodel has imported all the required classes BEFORE attempting a CRUD operation. This is already happening in an automatic way, all we would be doing would be to provide a way to specify which classes must be loaded along with module instantiation.
        2. Web framework specific ways of providing database access via neomodel. For example, in the case of Flask, it is possible to provide application wide access to a specific object via an extension.
      4. But in order to come up with one definitive solution for the next release, we need to be clear on how these two issues interract. For example, it might be possible to simply provide a Flask extension that takes care of internals without major changes to neomodel.

This is where we are at the moment. If you would like to help, I would appreciate it if you provided web-framework specific views on how to deal with preserving state, especially for Django. At the moment I am trying to see how to do this with Flask and @mjmare has looked into Pyramid (but I expect that it has a similar mechanism).

There is a bit more work that has taken place in the background, you can see gists and repos further up but overall this is where we are at the moment.

All the best

aanastasiou commented 5 years ago

@jonadaly @manoadamro @theterg @mzgajner @mostafa @mjmare @MardanovTimur @phoebebright @robertlagrant @robinedwards

Here is a minimal Gist to show:

  1. The way neomodel.db state must be preserved between application creation and serving of the request

  2. The basic skeleton of what will later become something like a flask-neomodel-manager extension for flask. I expect that something similar would have to be created separately for Django, Pyramid and other frameworks if in the meantime we have not come up with a better alternative.

All the best

jberends commented 5 years ago

@aanastasiou AFAIK in Django you have a DatabaseWrapper object that is thread specific that manages the connection to the database. Each new thread (in development each request is handled in a new thread) or process spawned by some threading/process manager will recreate that DatabaseWrapper with a ConnectionManager inside.

The schema definition is the Django Model definition itself (as created by the user and ensured consistency with the schema in the database by schema migrations once during server startup) and a ModelManager will use the DatabaseWrapper to (eventually lazy just-in-time) create a (new) connection to the database and fire queries using a Cursor. I.o.w. Schema and Connection are disconnected from each other and not dependent. Once the main process is instantiated is assumes a consistent database against the modelschema.

One may choose to re-use the connection, but that is most of the time not the case (never in development, optional in production) to reduce the number of simultaneous connections 'open' to the database. Better to open, do transaction, close. Is neomodel assuming a single open connection?

I am curious to see how neomodel manages this as we experience the same problems with 3.3.0 as anyone else. Will make some effort to do testing.

aanastasiou commented 5 years ago

@jberends

AFAIK in Django you have a DatabaseWrapper object that is thread specific that manages the connection to the database....

My question was specifically about how Django's middleware uses neomodel. However, from what I see here, this is not very far off from the way neomodel is set up as well. The only problem is maintaining the state of the neomodel.db object (more about this further below).

Is neomodel assuming a single open connection?

I am assuming you mean one connection that needs to remain connected to the server. If that is the case, then no.

I am curious to see how neomodel manages this as we experience the same problems with 3.3.0 as anyone else. Will make some effort to do testing.

Please see this gist which is also linked in my last post. This is a very crude outline of how the problem is dealt with in Flask and it shows how to "save" the state between calls.

I have not looked any further details into this but I am fairly confident that applications can be isolated from each other so that it is possible to run different applications on the same server, all using neomodel but over different connections and schemata.

As mentioned above, I think that this is going to have to be solved on a "per-framework" basis. Django has its own middleware infrastructure so does Pyramid and other frameworks.

I think that it would help if there was a repository that demonstrated how Django models integrate with neomodel (which provides its own models. Do you inherit from both? Do you build Django wrappers around neomodel classes to "force" it to talk to the database via neomodel?). We could then use that one to discuss ways forward. If you have experience with Django middleware and would like to create one for neomodel, even better.

All the best

robertlagrant commented 5 years ago

@aanastasiou two stupid questions: 1) we use waitress, which doesn't make new processes, so why would we be affected? 2) with WSGI servers that do create a new process, don't they either a) have to reinitialize everything from scratch, including the imports, or b) already have the memory initialised? (or c) you don't understand, Rob, and here's what you're missing)

aanastasiou commented 5 years ago

@robertlagrant Thanks for your comment. There are no stupid questions. I do not have a problem with accepting any misunderstanding on my part and I would be glad to get any deeper insight by anyone who would like to offer it. (Rob?)

  1. I have not used waitress. Do you think you can provide a waitress based counterexample of this situation where the neomodel.db object's state is preserved but this still leads to problems?

  2. a) Yes they do, you might want to see this and this on how this fails. I appreciate that this thread is getting too long by now but it does contain quite a few details on the problem.

jberends commented 5 years ago

@aanastasiou I have created a scaffold project for django and neomodel which is able to re-create the conditions between 3.2.9 and 3.3.0.

The repo can be found here: https://github.com/jberends/django_neo_3.3.0 Guide to install can be found here: https://github.com/jberends/django_neo_3.3.0/blob/master/README.md

Indeed it seems that the _NODE_CLASS_REGISTRY is not populated inside the request stream. I will add some notes in the readme to indicate where the neomodel specific configuration is located.

I use Pycharm and I included a 'run' command that may be used to run/debug the server once loaded.

What I found out is:

update 1445 - Found a possible solution

I am sure it is the design of the 'singleton' of the Database object. The object definition of the Database class is subclassed from threading.local(py2, py3). Which states:

class threading.local¶

A class that represents thread-local data. Thread-local data are data whose values are thread specific. To manage thread-local data, just create an instance of local (or a subclass) and store attributes on it: [...] The instance’s values will be different for separate threads.

Once I subclass from object, it works. And while the Database object only contains 'glue' information and does not hold data, it could be a solution in a multi threaded environment (also multi process as each process itself re-inits the django world on its own and have their own process-local Database object with correct parameters.)

A patch would look like:

--- neomodel/util.py
+++ neomodel/util.py
@@ -42,7 +42,7 @@
     db.cypher_query("MATCH (a) DETACH DELETE a")

-class Database(local):
+class Database(object):
     def __init__(self):
         self._active_transaction = None
         self.url = None
aanastasiou commented 5 years ago

@jberends Thank you very much for this, the respository is very useful anyway. Regarding the rest of it, it is as you describe it. The Database derives from local and the problem emerges once there is instantiation from different processes.

The bit that is going to be complicated then is this one.

In other words, multi-threading application transactions are pushed under the same "unit of work" but across processes a new connection is created.

This change would mean that even multi-thread calls would create a new connection. Does anyone see any problems with that (?)

Edit: @jberends Just to be clear about something: When people have been talking about integration with Django so far, I assumed that it was about getting Django models to work together with neomodel. That is, having a Django User stored via neomodel, The repository you created seems to be using neomodel separately, to simply access a Graph database.

jberends commented 5 years ago

@jberends Thank you very much for this, the respository is very useful anyway. Regarding the rest of it, it is as you describe it. The Database derives from local and the problem emerges once there is instantiation from different processes.

Even 'threads' re-instantiate the Database object, not only processes.

The bit that is going to be complicated then is this one. In other words, multi-threading application transactions are pushed under the same "unit of work" but across processes a new connection is created. This change would mean that even multi-thread calls would create a new connection. Does anyone see any problems with that (?)

This might be tricky indeed. If I understand correctly:

If this is the case, one would think to separate the concept of storing the 'Model-metadata from the database' and a Connection to the database (which normally can even open and is closed each transaction (eg for postgres) to limit amount of connections to the database). If I see correctly the driver.session is the connection object that is used to execute cypher.

So maybe the Database object can be subclassed from threading.local hower the _NODE_CLASS_REGISTRY should live in a proper-proper singleton.

Conclusion: Factor out the NODE_CLASS_REGISTRY (in short: NCR) into its own separate singleton object, and do not change the rest of the Database object. The Database object can then access the NCR itself in 'read' mode. Other functions that populate the NCR should then populate that singleton, through its interface.

Edit: @jberends Just to be clear about something: When people have been talking about integration with Django so far, I assumed that it was about getting Django models to work together with neomodel. That is, having a Django User stored via neomodel, The repository you created seems to be using neomodel separately, to simply access a Graph database.

Sure, I get that. However neomodel does already provide a nice way to define the model and deal with NodeSet results. A more integrated solution with more 'django-feel' would help traction of adoption of neo4j in django-land IMHO, but that is more for django-neomodel to pick up on.

Update 17:00

I made a small POC implementation on https://github.com/KE-works/neomodel/tree/feature_class_registry_singleton and that works, except 2 tests that alter the registry directly. I did not have time to fix those. This is an ugly POC using globals to provide a dict. A class implementation may be wiser.

Also credits to my collegue @oikonomou with which I had an extensive discussion on the topic. :handshake:

aanastasiou commented 5 years ago

@jberends

If this is the case, one would think to separate the concept of storing the Model-metadata from the database' and a Connection to the database (which normally can even open and is closed each transaction (eg for postgres) to limit amount of connections to the database). If I see correctly the driver.session is the connection object that is used to execute cypher.

Well, a bit more work has taken place around this and a "batched" operations feature (but if possible via multiple units of work and batches of units of work). I have left it as a long term TODO because I was trying to solve it via Database(). The current form is convenient because all Node/Rel calls have to go through the Database independently of where they "live". When you create a StructuredNode, you don't have to specify that its calls should go via a specific Database. If you start creating multiple Database objects, you are going to have to have a way of specifying that you want a particular StructuredNode to route its calls through a particular Database object. This is one thing. The other thing is exposing information of one application from another if there is one massive registry devoted to all instances. It needs a little bit more thinking and this is why, personally, I have left it as a longer term TODO. So, thank you for this note, as my thinking was not going through meta-classes at all.

By the way, if in the meantime someone comes up with a meta-class based solution faster, please submit a pull request.

Factor out the NODE_CLASS_REGISTRY...

Alright, I will see what I can do, but at the moment, I cannot say that I will have this ready by XYZ deadline. If anyone else would like to go towards that direction, please go ahead.

Sure, I get that. However neomodel...

Thanks, I appreciate getting into the trouble of creating the repository and because of this I thought I'd clarify that the way that case fails is clear (to me) and it would not need further demonstration.

mostafa commented 5 years ago

I've tested the @oikonomou changes, but to no avail! The issue persists!

aanastasiou commented 5 years ago

@mostafa Thank you for testing this out. Less exclamation marks please. We are all pitching in to this as much as we can.

Can you please confirm that your test-case is not any different than the one described here?

aanastasiou commented 5 years ago

@robertlagrant A possible way forward if waitress simply does multiple threads (?). Please see the pull request (work not completed as far as I can tell) and discussion there.

jberends commented 5 years ago

I've tested the @oikonomou changes, but to no avail! The issue persists!

What is your setup? Do you use django? Or something else? Which python version are you using?

We've been testing it in Python 3.6 with latest django 2.1 and it works, also on more evolved cases. Furthermore other testcases (in py27) passes.

aanastasiou commented 5 years ago

@jonadaly @manoadamro @theterg @mzgajner @mostafa @mjmare @MardanovTimur @phoebebright @robertlagrant @robinedwards @jberends @oikonomou

Can I please ask you guys to give it a go and test this branch?

It incorporates a number of pull requests the most important of which for this issue is the one by @oikonomou that so far seems to have solved this issue:

@mjmare, @robertlagrant talked about Pylons / Pyramid before so it would be nice to hear from you if it works there too, although, I do not see why it should not.

Looking forward to hearing from you

jberends commented 5 years ago

@aanastasiou I can confirm that the branch you provided works without flaws.

aanastasiou commented 5 years ago

@jberends Thanks for taking the time to perform any additional tests, just bear with me as I am going through a number of other contributions at the moment prior to making the next release please.

jonadaly commented 5 years ago

@aanastasiou The changes appears to work:

Subsequent switching between the versions has the behaviour you would expect (3.2.9 and the above branch work; 3.3.0 results in the node registry problems).

aanastasiou commented 5 years ago

@jonadaly Thank you for taking the time to do this.

robertlagrant commented 5 years ago

We've had it running in dev for a couple of weeks and it seems stable to us.

jonadaly commented 5 years ago

@aanastasiou Given that this seems to be working, when can we expect to see a release with this fix included?

(Appreciate the hard work from all parties investigating and fixing - it's just this has been an issue for a while now so would be good to get it resolved ASAP!)

aanastasiou commented 5 years ago

Hello everyone, the release was made last Friday. We are now on 3.3.1. I am sorry I did not make a special announcement.

I would also like to thank everyone who took the time to look into any aspect of this issue and give us feedback or suggestions.

All the best

jberends commented 5 years ago

@aanastasiou

Did you write any changelog or other releasenotes? Next time I can help with that if you like.

aanastasiou commented 5 years ago

@jberends Yes, I did:

Version 3.3.1 2019-02-08
 * Fixed a number of warnings due to deprecations both within neomodel and pytest and overall improvements in
   code style (#381) - Abhishek Modi
 * Added support for spatial data through neomodel.contrib.spatial_datatypes (#384) - Athanasios Anastasiou
 * Added the ability to filter "left-hand statements" (in NodeSet expressions) too (#395) - Grzegorz Grzywacz
 * Refactor the Node Class Registry to make util.Database a true Singleton (#403) - Giorgos Oikonomou
     * Many thanks to Giorgos Oikonomou, Jon Daly, Adam Romano, Andrew Tergis, Mato Žgajner, Mostafa Moradian, mjmare,
       Phoebe Bright, Robert Grant, jberends and anyone else who helped flag, track and correct this bug. For more
       information, please see: https://github.com/neo4j-contrib/neomodel/issues/378
     * Added the ClassAlreadyDefined exception to prevent against accidental redefinitions of data model classes (#409)
       - Athanasios Anastasiou
 * Added the ability to lazily `.nodes.get()` and `.nodes.all()` (#406) - Matan Hart
 * Fixed a bug in the assumed direction of relationships in _build_merge_query (#408) - MrAnde7son

Will make sure that these changes are reflected in the repository too.