ponyorm / pony

Pony Object Relational Mapper

Reusing the same data model for different databases #330

Open ml31415 opened 6 years ago

ml31415 commented 6 years ago

Right now, code like this seems to be commonplace, both in Pony's own test cases and in most Pony projects:

db = Database('sqlite', ':memory:')

class Person(db.Entity):
    name = Required(unicode)

db.generate_mapping(create_tables=True)

What happens is that the declarative base of the database, the Entity class, is defined together with the definition of the database connection. It is possible to delay the definition of a provider by calling db.bind() later on, but right now it is not really possible to use the same data model for different database instances.

This makes it hard to reuse any data model definition and, e.g., makes it necessary in the test cases to define a new data model for nearly every test file. It's not possible to define a data model once and then use it for connections to different databases or newly created instances. Especially in the test cases, where you want to guarantee a fresh setup for every non-sequential test, this is quite a pain in the ass.

Is there a better way to reuse a data model in Pony than uglifying the code with lots of duplication? I suppose it might be possible to mess around with database.provider, but then again, that seems really error-prone.

SQLAlchemy separates the database setup into a declarative base, an engine (defining the SQL dialect) and a session. I'm aware that this would require major refactoring and might also make current syntax like Entity.select() impossible, as the data model object wouldn't be able to access a session and it would be necessary to specify which session is meant. In the end, though, it should result in much cleaner logic and library code than right now. Roughly what I have in mind (with a SQLAlchemy sketch for comparison after the proposal):

dm = DataModel()

class Person(dm.Entity):
    name = Required(unicode)

# Person.select() should throw an error here, as no session is defined yet
session = Session(dm, 'sqlite', ':memory:') # creates a provider and calls dm.generate_mapping()
dm.default_session(session) # Might make preserving current syntax possible
Person.select(lambda p: p.name.startswith('a')) # would work now

session2 = Session(dm, 'sqlite', ':memory:') # No problem anymore
session3 = Session(dm, 'sqlite', ':memory:') # Another separate db using the same data model

session3.select(p for p in Person if p.name != "asdf")
# Person should have no idea in which session it might be used
# but session could verify that Person is part of dm
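
For comparison, here is a minimal sketch of how SQLAlchemy keeps these concerns apart today, using its standard declarative_base / create_engine / sessionmaker APIs (just an illustration of the separation, not a proposal for Pony's API):

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()  # the data model, independent of any database

class Person(Base):
    __tablename__ = 'person'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

# two independent databases sharing the same declarative base
engine1 = create_engine('sqlite://')           # in-memory database
engine2 = create_engine('sqlite:///other.db')  # file-based database
Base.metadata.create_all(engine1)
Base.metadata.create_all(engine2)

Session1 = sessionmaker(bind=engine1)
Session2 = sessionmaker(bind=engine2)

session = Session1()
session.add(Person(name='Alice'))
session.commit()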

Is this relevant? Yes, I guess so. The current situation makes writing compact and reliable test cases rather annoying. When doing backups or mirroring, you also regularly run into two separate databases sharing the same data model. Fusing the data model and the connection logic into a single class is just an illogical combination.

If I have missed something that would already help to separate the model and the connection, please let me know!

luckydonald commented 5 years ago

Another use case:

I'm trying to use Pony as a storage backend in a project. The user can choose between a basic in-memory dict, MongoDB, and hopefully SQL databases via Pony ORM.

I basically need to store (and retrieve by chat_id, user_id) the following data:

state = {
    'chat_id': 1234,
    'user_id': 1234,
    'state': 'SOMETHING_CAPSLOCK',
    'data': {'any': 'valid', 'json': True},
}

The problem is that I can't create a database table, because I don't have access to the db.Entity class I would need to subclass; the user only creates the Database at a later time.
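
For concreteness, what I'd want to declare once a Database exists would look roughly like this (a hypothetical sketch: define_state_entity is a made-up factory name, and it assumes Pony's Json attribute type for the data field):

from pony.orm import Required, Json, PrimaryKey

def define_state_entity(db):
    class State(db.Entity):
        chat_id = Required(int)
        user_id = Required(int)
        state = Required(str)
        data = Required(Json)
        PrimaryKey(chat_id, user_id)  # look up records by (chat_id, user_id)
    return State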

luckydonald commented 5 years ago

To include my use case sent back in Nov. 2018 in the Telegram group:

Hey @metaprogrammer, did you have the time to look into registering an Entity to a Database at a later time?

I just hit another use case: I have a Users table that I want to reuse across different projects.

from somewhere import User

class Post(later.Entity):
     poster = Required(User)
     text = Required(str)
# end class

# at a later time
db = Database('postgres://...')
User.register_db(db)
db.register(Post)

That way I could init just that table if I don't already have it:

  • without creating all the unneeded tables from the other project
  • without having to duplicate the code and keeping it up to date through several (7+) projects

I think it would require moving the registration logic from MetaEntity into a central Database.register() function, and calling that from MetaEntity.

So basically like Flask's init_app() and Blueprints.
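
For reference, the Flask idiom I mean looks roughly like this (a generic sketch of the extension pattern; PonyStorage is a made-up name, not an existing extension):

class PonyStorage:
    """Hypothetical extension that defers database binding until init_app()."""

    def __init__(self, app=None):
        self.db = None
        if app is not None:
            self.init_app(app)

    def init_app(self, app):
        # the concrete app (and its database configuration) is only known
        # here, not at import time when the models were declared
        app.extensions['pony_storage'] = self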

luckydonald commented 5 years ago

Also https://t.me/ponyorm/7360 and the following messages.

amalashkevich commented 5 years ago

Fixed the typo in the URL. The correct one is https://t.me/ponyorm/7360

On 19 Feb 2019, at 00:24, Luckydonald notifications@github.com wrote:

Also https://ponyorm/7360 and the following messages.


kozlovsky commented 5 years ago

Hi Michael and Luckydonald!

Sorry for the long delay in answering. I'm currently overloaded at my day job, and the very limited time I can spend on Pony is dedicated to migrations. I'll return to the other issues and fix them later; right now we simply don't have the time for it.

Regarding the question of how to use the same data model with different databases, we already have a pattern for that. It is possible to define entities inside a function that accepts a database object as an argument, and then call this function with different databases.

I think it already works well enough and can be used for writing tests. The same approach can be used for developing reusable plugins as well. Here is an example:

# plugin1.py
from pony.orm import *

def define_entities(db):
    class Foo(db.Entity):
        x = Required(int)
        y = Optional(str)

    return Foo

def select_some_foos(db, a, b):
    query = select(obj for obj in db.Foo if obj.x > a and obj.x < b)
    return query[:]
# plugin2.py
from pony.orm import *

def define_entities(db):
    class Bar(db.Entity):
        p = Required(int)
        q = Optional(str)
        baz = Optional("Baz")

    class Baz(db.Entity):
        m = Required(str)
        n = Optional(int)
        bars = Set("Bar")

    return Bar, Baz

def add_new_bar(db, p, q):
    bar = db.Bar(p=p, q=q)
    return bar

def delete_baz(db, id):
    db.Baz[id].delete()
# main.py
from pony.orm import *
import plugin1
import plugin2

db = Database()

Foo = plugin1.define_entities(db)
Bar, Baz = plugin2.define_entities(db)

db_params = {'provider': 'sqlite', 'filename': ':memory:'}
db.bind(**db_params)
db.generate_mapping(create_tables=True)

with db_session:
    # it is not necessary to use global Foo class,
    # you can access it as db.Foo property instead,
    # but then IDE will not provide an auto-completion for foo.x
    select(foo for foo in db.Foo if foo.x > 10).show()

    # by using global Bar you will get
    # autocompletion of Bar.p from IDE
    # (at least in PyCharm and IDEA)
    select(bar for bar in Bar if bar.p < 20).show()

db2 = Database()
Foo2 = plugin1.define_entities(db2)
# only some of the entities are defined in db2; plugin2 was not applied
db2.bind('sqlite', 'db2.sqlite', create_db=True)
db2.generate_mapping(create_tables=True)

with db_session:
    # it is not necessary to use Foo2...
    select(foo for foo in db2.Foo if foo.x < 30).show()
    # ...but using Foo2 enables autocomplete in PyCharm
    select(foo for foo in Foo2 if foo.x < 30).show()

I don't want to do any major refactoring of Pony until migrations are ready. It is possible that the internal migration engine will also be useful for implementing Flask-like plugins. Right now, the current solution with factory functions looks good enough to me. It already allows binding the same entities to multiple databases and implementing plugins that add entities to a Database, and by returning entity classes from the factory functions you get autocomplete support in the IDE too.

ml31415 commented 5 years ago

Thanks for your answer. I really do understand your time constraints. One last question here, as I have opted for SQLAlchemy now anyway:

The main and unique selling point of Pony over other ORMs is the neat generator syntax for queries, p for p in Person and so on. Why not offer this query syntax as an add-on to the SQLAlchemy ORM layer instead of building yet another ORM? As stated in my original issue, I think SQLAlchemy got quite a few things right in the very foundation of the ORM. It might simply not be worth it to turn Pony upside down now, especially given the limited developer time for the project, the API breakage, and so on.

luckydonald commented 5 years ago

@kozlovsky How do I use it in a plugin's own file?

# plugin1/foo.py
from pony import orm
from plugin1 import Foo  # <- how to do that?

@orm.db_session
def store_something(y):
  foo = Foo.get(x=2)
  if not foo:
    foo = Foo(x=2, y=0)
  # end if
  foo.y += 2
# end def
# main.py
from plugin1.foo import store_something

store_something(12)
kozlovsky commented 5 years ago

@luckydonald I updated my previous example and added some example functions to the plugins for working with the objects they define.

So you don't import Foo in the plugin itself; instead, you receive db as a function parameter and access db.Foo. A user of your plugin can then import the plugin, call define_entities(db) to add your entities, and call your other functions passing db as the first argument.
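
Applied to the plugin1/foo.py snippet above, that might look roughly like this (a sketch; the signature change is mine, and it keeps Foo's attributes from plugin1, where x is an int and y is a str):

# plugin1/foo.py
from pony import orm

@orm.db_session
def store_something(db, x, y):
    foo = db.Foo.get(x=x)
    if foo is None:
        foo = db.Foo(x=x, y='')
    foo.y = y

# main.py
# db = ...  (bound and mapped as in the example above)
# store_something(db, 2, 'hello')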

fluffy-critter commented 4 years ago

Hi, I'm trying to wrap my head around how to do something similar to what @ml31415 is asking, not for the purpose of multiple models in a single app, but so that I can declare my model in a module without the actual bound database being a module global. In particular, I want the database object to be "owned" by my Flask app object, which helps with some debugging and deployment scenarios and also makes it possible to have things like multiple instances of a blueprint that share a database schema but not an actual database.

Wrapping the model schema declarations up in a closure makes sense, but then it gets really unwieldy to deal with things like mypy type hinting. I tried doing a mixin pattern like:

# model.py
from pony import orm

class Foo:
    some_column = orm.Required(str, unique=True)

def generate(db):
    class Model:
        class Foo(db.Entity, Foo):
            pass
    return Model()

but then the query generator couldn't actually find the attributes stored on the realized model's classes; for example:

from pony import orm
import model
db = orm.Database()
my_model = model.generate(db)
my_model.Foo.get(some_column="hello")

would raise TypeError: unknown attribute 'some_column'

Of course, I can just define the model once-and-only-once in the generate() function, but then I can't figure out how to use type annotations in a way that preserves my ability to also get typing for the columns themselves.

I feel like I could possibly do something incredibly hacky with __dict__ or something (to copy the properties rather than inheriting them), but is there already an established pattern for doing this that I'm overlooking?