simphony / simphony-osp

A framework that aims to achieve interoperability between software such as simulation engines, databases and data repositories using a knowledge graph as the common language.
https://simphony.readthedocs.io
Other
16 stars 12 forks source link

638 Strange behaviour of CUDS remove #659

Closed kysrpex closed 3 years ago

kysrpex commented 3 years ago

Not putting the wrapper in the added buffer solves the problem.

kysrpex commented 3 years ago

I had a closer look at the buffer mechanism, and I would say that indeed as I believe it makes no sense. Take a look at this file. It contains the only example in the OSP-core code of what being in each buffer actually means. I am looking at the version before I introduced the buffer performance updates to make sure that the bug is there as well, as this bug is 20-25 days old.

The get_triples method is also relevant.

    def get_triples(self, include_neighbor_types=False):
        """Get the triples of the cuds object."""
        o_set = set()
        for s, p, o in self._graph.triples((self.iri, None, None)):
            yield s, p, o
            o_set.add(o)
        if include_neighbor_types:
            for o in o_set:
                yield from self._graph.triples((o, RDF.type, None))

Added buffer (link)

    def _apply_added(self, root_obj, buffer):
        # Perform the SQL-Statements to add the elements
        # in the buffers to the DB.

        for added in buffer.values():
            triples = self._substitute_root_iri(added.get_triples())
            self._add(*triples)

In plain English, for every CUDS object in the added buffer, get all the triples (from memory, the CUDS only exists in memory) that have the CUDS IRI as subject. Put them in the backend.

In even simpler English: copy the CUDS from the memory to the backend.

Updated buffer (link)

    def _apply_updated(self, root_obj, buffer):
        # Perform the SQL-Statements to update the elements
        # in the buffers in the DB.
        for updated in buffer.values():
            pattern = (updated.iri, None, None)
            self._remove(next(self._substitute_root_iri([pattern])))
            triples = self._substitute_root_iri(updated.get_triples())
            self._add(*triples)

In plain English, for every CUDS object in the added buffer, remove all triples that have the CUDS IRI as subject from the backend. Then get all the triples (from memory, the CUDS only exists in memory) that have the CUDS iri as subject. Put them in the backend.

In even simpler English: remove the CUDS from the backend. Copy the CUDS from the memory to the backend.

Deleted buffer (link)

    def _apply_deleted(self, root_obj, buffer):
        # Perform the SQL-Statements to delete the elements
        # in the buffers in the DB.
        for deleted in buffer.values():
            pattern = (deleted.iri, None, None)
            self._remove(next(self._substitute_root_iri([pattern])))

In plain English, for every CUDS object in the added buffer, remove all triples that have the CUDS IRI as subject from the backend.

In even simpler English: remove the CUDS from the backend.


So, guess what happens if you put the wrapper in the added buffer? All its triples get added to the backend, while the old ones are NOT removed. Ideally, one would "load the wrapper from backend" and return the loaded wrapper instead of a new one, but there are several checks in place to prevent this and the solution is more complex.

Simply never putting wrappers in the added buffer seems to be good enough, since they are always supposed to be the root of the session, so their existence only makes sense when they are attached to something. And when they are attached to something, they go to the updated buffer, where what happens actually makes sense according to the code above.

paulzierep commented 3 years ago

I can confirm that the problem persists.

with DataspaceSession(conn_text, connect_kwargs=connect_kwargs, dataspace_name=PRIVATE_DATASPACE_DB_NAME) as session:
    wrapper = cuba.Wrapper(session=session)
    wrapper.add(cuba.Entity(), rel=cuba.activeRelationship)
    session.commit()

with DataspaceSession(conn_text, connect_kwargs=connect_kwargs, dataspace_name=PRIVATE_DATASPACE_DB_NAME) as session:
        wrapper = cuba.Wrapper(session=session)

        for cuds in wrapper.iter():
            wrapper.remove(cuds)

        pretty_print(wrapper)
        session.commit()

>>>>>
- Cuds object:
  uid: 2ab207e6-e0a3-4fb3-abfa-0c73694fa0b7
  type: cuba.Wrapper
  superclasses: cuba.Entity, cuba.Wrapper
  description: 
    The root of all wrappers. These are the bridge to simulation engines and databases.
<<<<<

with DataspaceSession(conn_text, connect_kwargs=connect_kwargs, dataspace_name=PRIVATE_DATASPACE_DB_NAME) as session:
    wrapper = cuba.Wrapper(session=session)

    for cuds in wrapper.iter():
        wrapper.remove(cuds)

    pretty_print(wrapper)

>>>>>

- Cuds object:
  uid: 9c6dc4bc-85c4-409c-94f0-890ae00de568
  type: cuba.Wrapper
  superclasses: cuba.Entity, cuba.Wrapper
  description: 
    The root of all wrappers. These are the bridge to simulation engines and databases.

   |_Relationship cuba.activeRelationship:
     -  cuba.Entity cuds object:
        uid: a53536c1-8a03-4afa-83b2-438e8f654718
<<<<<
kysrpex commented 3 years ago

Can you try with a different PRIVATE_DATASPACE_DB_NAME that has been never used?

kysrpex commented 3 years ago

As you said that it also fails locally, I am leaving this here for tomorrow.


In [10]: from osp.wrappers.sqlite import SqliteSession as SqliteWrapperSession
    ...: from osp.core.namespaces import cuba
    ...: import os
    ...: 
    ...: try:
    ...:     os.remove('A.db')
    ...: except FileNotFoundError:
    ...:     pass
    ...: 

    ...: except FileNotFoundError:
    ...:     pass
    ...: 
    ...: with SqliteWrapperSession("A.db") as session_A:
    ...:     wrapper = cuba.Wrapper(session=session_A)
    ...: 
    ...:     wrapper.add(cuba.Entity(), rel=cuba.activeRelationship)
    ...: 
    ...:     session_A.commit()
    ...: 
    ...: with SqliteWrapperSession("A.db") as session_A:
    ...:     wrapper = cuba.Wrapper(session=session_A)
    ...: 
    ...:     for cuds in wrapper.iter():
    ...:         wrapper.remove(cuds)
    ...:         #wrapper.session.delete_cuds_object(cuds)
    ...: 
    ...:     session_A.commit()
    ...: 
    ...: with SqliteWrapperSession("A.db") as session_A:
    ...:     wrapper = cuba.Wrapper(session=session_A)
    ...: 
    ...:     pretty_print(wrapper)
    ...: 
WARNING 2021-06-17 18:42:10,086 [osp.core.session.db.db_wrapper_session]: Some CUDS objects are unreachable from the wrapper object: cuba.Entity: fd31a7e0-aa65-46f8-8d1a-53ce64d6660c.
If you want to be able to retrieve those CUDS objects later, either add them to the wrapper object or to any other CUDS that is reachable from it.
- Cuds object:
  uid: 307c119e-ea68-424c-bab8-6d7f766a209d
  type: cuba.Wrapper
  superclasses: cuba.Entity, cuba.Wrapper
  description: 
    The root of all wrappers. These are the bridge to simulation engines and databases.

In [12]: from osp.wrappers.sqlite import SqliteSession as SqliteWrapperSession
    ...: from osp.core.namespaces import cuba
    ...: from osp.core.utils import pretty_print
    ...: import os
    ...: 
    ...: try:
    ...:     os.remove('A.db')
    ...: except FileNotFoundError:
    ...:     pass
    ...: 
    ...: with SqliteWrapperSession("A.db") as session_A:
    ...:     wrapper = cuba.Wrapper(session=session_A)
    ...: 
    ...:     wrapper.add(cuba.Entity(), rel=cuba.activeRelationship)
    ...: 
    ...:     session_A.commit()
    ...: 
    ...: with SqliteWrapperSession("A.db") as session_A:
    ...:     wrapper = cuba.Wrapper(session=session_A)
    ...: 
    ...:     for cuds in wrapper.iter():
    ...:         #wrapper.remove(cuds)
    ...:         wrapper.session.delete_cuds_object(cuds)
    ...: 
    ...:     session_A.commit()
    ...: 
    ...: with SqliteWrapperSession("A.db") as session_A:
    ...:     wrapper = cuba.Wrapper(session=session_A)
    ...: 
    ...:     pretty_print(wrapper)
    ...: 
- Cuds object:
  uid: 78f001e8-96dd-4c91-9094-bd73482a6277
  type: cuba.Wrapper
  superclasses: cuba.Entity, cuba.Wrapper
  description: 
    The root of all wrappers. These are the bridge to simulation engines and databases.

In [13]: from osp.wrappers.sqlite import SqliteSession as SqliteWrapperSession
    ...: from osp.core.namespaces import cuba
    ...: from osp.core.utils import pretty_print
    ...: import os
    ...: 
    ...: try:
    ...:     os.remove('A.db')
    ...: except FileNotFoundError:
    ...:     pass
    ...: 
    ...: with SqliteWrapperSession("A.db") as session_A:
    ...:     wrapper = cuba.Wrapper(session=session_A)
    ...: 
    ...:     wrapper.add(cuba.Entity(), rel=cuba.activeRelationship)
    ...: 
    ...:     session_A.commit()
    ...: 
    ...: with SqliteWrapperSession("A.db") as session_A:
    ...:     wrapper = cuba.Wrapper(session=session_A)
    ...: 
    ...:     for cuds in wrapper.iter():
    ...:         wrapper.remove(cuds)
    ...:         wrapper.session.delete_cuds_object(cuds)
    ...: 
    ...:     session_A.commit()
    ...: 
    ...: with SqliteWrapperSession("A.db") as session_A:
    ...:     wrapper = cuba.Wrapper(session=session_A)
    ...: 
    ...:     pretty_print(wrapper)
    ...: 
- Cuds object:
  uid: 99eb3c78-a06e-4f48-a069-20e209fcc0e2
  type: cuba.Wrapper
  superclasses: cuba.Entity, cuba.Wrapper
  description: 
    The root of all wrappers. These are the bridge to simulation engines and databases.

Source for copy and paste:

from osp.wrappers.sqlite import SqliteSession as SqliteWrapperSession
from osp.core.namespaces import cuba
from osp.core.utils import pretty_print
import os

try:
    os.remove('A.db')
except FileNotFoundError:
    pass

with SqliteWrapperSession("A.db") as session_A:
    wrapper = cuba.Wrapper(session=session_A)

    wrapper.add(cuba.Entity(), rel=cuba.activeRelationship)

    session_A.commit()

with SqliteWrapperSession("A.db") as session_A:
    wrapper = cuba.Wrapper(session=session_A)

    for cuds in wrapper.iter():
        wrapper.remove(cuds)
        wrapper.session.delete_cuds_object(cuds)

    session_A.commit()

with SqliteWrapperSession("A.db") as session_A:
    wrapper = cuba.Wrapper(session=session_A)

    pretty_print(wrapper)
paulzierep commented 3 years ago

I can confirm, that it works for the SqliteWrapperSession. I think I did not load the branch correctly yesterday. I will test it with the DataSpaceSession asap.

paulzierep commented 3 years ago

Unfortunately, it does break something when using the DataspaceSession:

with DataspaceSession(conn_text, connect_kwargs=connect_kwargs, dataspace_name=PRIVATE_DATASPACE_DB_NAME) as session_A:
    wrapper = cuba.Wrapper(session=session_A)
    wrapper.add(cuba.Entity(), rel=cuba.activeRelationship)
    session_A.commit()

with DataspaceSession(conn_text, connect_kwargs=connect_kwargs, dataspace_name=PRIVATE_DATASPACE_DB_NAME) as session_A:
    wrapper = cuba.Wrapper(session=session_A)
    for cuds in wrapper.iter():
        wrapper.remove(cuds)
        wrapper.session.delete_cuds_object(cuds)

    session_A.commit()

with DataspaceSession(conn_text, connect_kwargs=connect_kwargs, dataspace_name=PRIVATE_DATASPACE_DB_NAME) as session_A:
    wrapper = cuba.Wrapper(session=session_A)
    pretty_print(wrapper)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-154a42e075a2> in <module>()
     19 with DataspaceSession(conn_text, connect_kwargs=connect_kwargs, dataspace_name=PRIVATE_DATASPACE_DB_NAME) as session_A:
     20     wrapper = cuba.Wrapper(session=session_A)
---> 21     pretty_print(wrapper)

/home/paul/anaconda3/envs/simphony/lib/python3.6/site-packages/osp/core/utils/pretty_print.py in pretty_print(cuds_object, file)
     25         pp += "\n  values: " + _pp_values(cuds_object)
     26     pp += "\n  description: \n    %s\n" % cuds_object.oclass.description
---> 27     pp += _pp_subelements(cuds_object)
     28 
     29     print(pp, file=file)

/home/paul/anaconda3/envs/simphony/lib/python3.6/site-packages/osp/core/utils/pretty_print.py in _pp_subelements(cuds_object, level_indentation, visited)
     72         sorted_elements = sorted(
     73             cuds_object.iter(rel=relationship, return_rel=True),
---> 74             key=lambda x: (str(x[0].oclass), str(x[1]),
     75                            x[0].name if hasattr(x[0], "name") else False)
     76         )

/home/paul/anaconda3/envs/simphony/lib/python3.6/site-packages/osp/core/cuds.py in iter(self, rel, oclass, return_rel, *uids)
    404                 yield r
    405             else:
--> 406                 yield from ((r, m) for m in mapping[r.uid])
    407 
    408     def _recursive_store(self, new_cuds_object, old_cuds_object=None):

AttributeError: 'NoneType' object has no attribute 'uid'
kysrpex commented 3 years ago

Unfortunately, it does break something when using the DataspaceSession:

with DataspaceSession(conn_text, connect_kwargs=connect_kwargs, dataspace_name=PRIVATE_DATASPACE_DB_NAME) as session_A:
    wrapper = cuba.Wrapper(session=session_A)
    wrapper.add(cuba.Entity(), rel=cuba.activeRelationship)
    session_A.commit()

with DataspaceSession(conn_text, connect_kwargs=connect_kwargs, dataspace_name=PRIVATE_DATASPACE_DB_NAME) as session_A:
    wrapper = cuba.Wrapper(session=session_A)
    for cuds in wrapper.iter():
        wrapper.remove(cuds)
        wrapper.session.delete_cuds_object(cuds)

    session_A.commit()

with DataspaceSession(conn_text, connect_kwargs=connect_kwargs, dataspace_name=PRIVATE_DATASPACE_DB_NAME) as session_A:
    wrapper = cuba.Wrapper(session=session_A)
    pretty_print(wrapper)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-154a42e075a2> in <module>()
     19 with DataspaceSession(conn_text, connect_kwargs=connect_kwargs, dataspace_name=PRIVATE_DATASPACE_DB_NAME) as session_A:
     20     wrapper = cuba.Wrapper(session=session_A)
---> 21     pretty_print(wrapper)

/home/paul/anaconda3/envs/simphony/lib/python3.6/site-packages/osp/core/utils/pretty_print.py in pretty_print(cuds_object, file)
     25         pp += "\n  values: " + _pp_values(cuds_object)
     26     pp += "\n  description: \n    %s\n" % cuds_object.oclass.description
---> 27     pp += _pp_subelements(cuds_object)
     28 
     29     print(pp, file=file)

/home/paul/anaconda3/envs/simphony/lib/python3.6/site-packages/osp/core/utils/pretty_print.py in _pp_subelements(cuds_object, level_indentation, visited)
     72         sorted_elements = sorted(
     73             cuds_object.iter(rel=relationship, return_rel=True),
---> 74             key=lambda x: (str(x[0].oclass), str(x[1]),
     75                            x[0].name if hasattr(x[0], "name") else False)
     76         )

/home/paul/anaconda3/envs/simphony/lib/python3.6/site-packages/osp/core/cuds.py in iter(self, rel, oclass, return_rel, *uids)
    404                 yield r
    405             else:
--> 406                 yield from ((r, m) for m in mapping[r.uid])
    407 
    408     def _recursive_store(self, new_cuds_object, old_cuds_object=None):

AttributeError: 'NoneType' object has no attribute 'uid'

Just a comment for @yoavnash 's review: this does not break any "additional" thing. It just does not fix the problem, this error seems to be the same that appeared on the issue.