hmarr / mongoengine

[Moved to mongoengine/mongoengine]
https://github.com/MongoEngine/mongoengine
MIT License
794 stars 20 forks source link

allow_inheritance default off #437

Open rozza opened 12 years ago

rozza commented 12 years ago

creates extra indices for all schemas - which will bite people with large datasets.

hmarr commented 12 years ago

Controversial! But I approve :)

mitar commented 12 years ago

Could this be some global variable/setting?

It is really a wrong approach in my opinion. Because it is not the person who defines the class the one who decides how big datasets the user will use. Or if she will user inheritance or not. Maybe I download the library which defined a document with allow_inheritance to on. But I will not use inheritance, but will have huge number of documents. A problem. Or other way. allow_inheritance is to off. But I will not have much documents, but want a nice document design, full of inheritance.

rozza commented 12 years ago

A global setting makes less sense, as most document definitions will never be inherited from, so a finer grained control is required. The problem with the default as on is real people are being hurt by it in production because it is essentially behind the scenes.

Turning it off by default means you will raise an exception if you try to inherit and by far the most common place for this to ever happen is pre release.

mitar commented 12 years ago

Yes. But how to turn it on, when you need inheritance?

What about doing some behind magic in document metaclass. If any document inherits from some other document, then it switches the flag on on this other document (it adds those necessary fields for inheritance to the document). You could have then have in fact semantically different meaning for the flag:

In this way library writers will probably use True. For your document definitions, when you want to be enforce some policy, you could use None, to be informed if something like this happens, or False, if you know that this will kill the database and any developer of your team should not do something like this.

mitar commented 12 years ago

For example, see my #476. Now that I think of. What should be the right value for mongoengine? Because in my scenario, I inherit from it and need to be allowed for that. But I could imagine that some user of mongoengine will have billions of users and those additional fields could bite her.

Somehow unrelated: it could help to have abstract classes? For example, in my scenario above, I do not really need mongoengine's user document, I just want to extend it with some additional fields and use it as my own, using only my version, probably never really inherit from that my further.

rozza commented 12 years ago

@mitar - good points, I can see why most other mongodb orm / odm libraries dont do inheritance at the db level and just store the cls name.

We could allow mixins - based off simple objects, but this becomes quite verbose:

UserExtras(object):
    image = ImageField()

User(OriginalUser, UserExtras):
    pass

But thats an ugly interface

User(OriginalUser):
     image = ImageField()

    meta = {'extended': True}  # this could be inferred?

That is nicer from a developers point of view.

Types is only used for converting to type so these would OriginalUser.objects() over User.objects() and perhaps we only add support for that explicitly - eg something like adding special TypesField in the model definition.

Principally, we could clean up alot of code that deals with _types and abstract it out, but that would be part of a larger refactor.

mitar commented 12 years ago

I think programming interface could be very similar to how it is now. We could just use metaclass for documents which would check if they are inheriting directly from the Document or if there is something in-between. If it is, it goes over the parents and flip the extended bit over in their class definition, checking for value of allow_inheritance, as I described above. When extended is True, that document adds _types and other stuff.

The only downside of this, that I can see, is what if you later on add some document which is extending older document. Then those type fields are missing in all stored instances until then. For Django, this could be solved with some manage.py script. (Similar to how you have to use syncdb for relational databases.)

I am still curious. What exactly are problems on big databases with this inheritance? So what is the initial problem we are trying to solve?

Because this inheritance was one of my main goodies why I have chosen MongoEngine over other solutions. I find much cleaner design this way.

mitar commented 12 years ago

I am not sure if we can get rid of _types if we want polymorphic nature of querysets. Which is really a very good thing to have.

rozza commented 12 years ago

People dont realise _types is added to every document when they need it under 10% of the time, so default as On is the wrong way round. It makes all indexes multikey indexes, impacts performance, cant be used with covered indexes and generally isn't needed.

By the time it impacts performance / dropping indexes and removing the fields is expensive and requires downtime.

Adding _types is fine but it should be a planned addition not implicit and that doesn't mean you wont have it if you want / need it.

Simply inheriting isn't a good model for adding _types a class as people want to extend the document but not add _types.

ghost commented 12 years ago

In my humble opinion, Inheritance in Mongoengine is quiet awkward; Mongoengine does something (such as _types) not supported by Mongodb in nature, which of couse, does not look good especially when I query database in Mongo shell.

I really welcome rozza's decision. I believe Mongoengine will be more succinct and efficient. and beginners to Mongoengine will not be surprised by their first trying Mongoengine when they look at schemas made by Mongoengine after running some inserts without being aware of the existance of INHERITANCE.

mitar commented 12 years ago

For me personally, inheritance was the main reason why I chose MongoEngine. I think by trying to get MongoEngine to be more like other mappers, it loses its specialty. It is hard to try to satisfy everybody. But it is really required? Or it is better to try to stick to key features which distinct you from other mappers?

ghost commented 12 years ago

mitar, Mongoengine's inheritance is awesome thing that other mongo mappers don't have. rozza seems not to take into consideration getting rid of the whole functionality of inheritance and I do; why do we have to remove that awesome thing?. But in reality, most people do not use inheritance. I do not know how Mongoengine implements inheritance though, it's obvious that using inheritance takes up some computer resources; why do we have to write db_field for extreme optimization? if we use inheritance, it definitely obviates the efforts we have gone through for optimization.

and personally, I don't like that the schema made up by using inheritance looks dirtier than my room. Mongodb..(not Mongoengine) by its nature already uses too much resources.

rozza commented 12 years ago

Well guys not to worry, I will ping you when there is some sample code. The main point is - you can still have inheritance should you need it, its not going away.

@ChangMin - the schema should only have two extra fields _cls and _types for mongoengine so its not that dirty ;)

Having a nicer solution to _types would be good as the multikey index can cause costs, a convention of only using the collection for inheritance would store less and be more efficient for querying.