mspass-team / mspass

Massive Parallel Analysis System for Seismologists
https://mspass.org
BSD 3-Clause "New" or "Revised" License
30 stars 12 forks source link

isinstance failure referencing mspass Database class #451

Closed pavlis closed 1 year ago

pavlis commented 1 year ago

This is an issue that I am preserving as an issue for the record but is one already solved.

While working on revisions to our Database class I encountered a problem where a module using the Database (initially Undertaker.py but most recently normalize.py) threw mysterious TypeError exceptions. A problem seems to be limited to cases where the class holds a copy of Database as a "self" attribute. The Undertaker example, had this simple block of code in it's constructor:

from mspasspy.db.database import Database

...

    def __init__(self, dbin,
                 regular_data_collection="cemetery",
                 aborted_data_collection="abortions",
                 data_tag=None,
                 ):

...

        if isinstance(dbin,Database):
            self.db = dbin
            self.dbh = self.db.elog
        else:
            message += "arg0 must be a MsPASS Database class (mspasspy.db.Database)\n"
            message += "Type of content passed to constructor={}".format(str(type(dbin)))
            raise TypeError(message)

Mysteriously, that block of code with throw a type error and say the type was mspasspy.db.Database. That is strange since the logic used would normally make the shorter Database symbol an alias for mspasspy.db.Database.

The solution was to change the isinstance test to the following small variant:

        if isinstance(dbin,pymongo.database.Database):
            self.db = dbin
            self.dbh = self.db.elog
        else:
            message += "arg0 must be a MsPASS Database class (mspasspy.db.Database)\n"
            message += "Type of content passed to constructor={}".format(str(type(dbin)))
            raise TypeError(message)

Noting this works because of how python handles inheritance in an isinstance test. A base class test always resolves true even if the referenced symbol is a subclass. The mspass Database class is a subclass of pymongo.database.Database.

This and a related circular reference problem I am nearly certain are caused by a symbol ambiguity created by defining the mspass Database class to have the same name as the pymongo version. That is, there is this ambiguity:

from pymongo.database import Database

versus

from mspasspy.db.database import Database

While the definition of mspass Database class is this:

class Database(pymongo.database.Database):

It was probably a design error to reuse the name Database for our extension of pymongo's class of the same name. Changing the name now, however, is not realistic as Database is used in a large fraction of our python code base.

Unless proven otherwise I'm pretty certain I have the solution for this and @wangyinz can immediately close this issue. The solution is this:

  1. In normal usage where one simply fetches the database handle with the dbclient.get_database call this is not even an issue since there is no import ambiguity.
  2. Applications needing to use a conditional like if isinstance(db,Database) will not face an issue in a standard script that would simply have only the line from mspasspy.db.database import Database at the top of the script or module.
  3. The only place I've seen this crop up to date is in the case noted above. Both the Undertaker and the class hierarchy in the normalize.py module define a self.db variable that is an instance of a mspasspy.db.database.Database. Internally their constructors need to reference the base class of pymongo or they generate TypeError exceptions in tests like that above. Note if I hadn't been trying to make the constructors robust to check for user input error this wouldn't even happen.

In short, this is a very rare and subtle python issue. I'm pretty sure it was created by the ambiguous Database symbol name alias. I think this ambiguous symbol may also cause circular dependence issues in other contexts I've already seen. I don't have a good way to characterize those other than to warn all users to be careful when using Database as an attribute of any python class.

wangyinz commented 1 year ago

As discussed during our meeting, this is actually not a problem but really a distinction between C and Python. Here in the normalize module, there is actually no real dependency of Database in the Python code. The module does use objects that are presumably Database class, but because of the duck typing feature of Python, it is not necessarily the Database class. As long as the objects has the same methods and attributes defined, the code will work. That is why explicitly checking for typing with isinstance is usually not recommended in Python, and especially in this case it is not a requirement for dbin to be a instance of mspasspy.db.database.Database, instead, we should probably check for the existence of dbin.elog in the code snippet above.