Discussion: tortoise.init() & TortoiseManager

tortoise / tortoise-orm

Familiar asyncio ORM for python, built with relations in mind

https://tortoise.github.io

Apache License 2.0

4.64k stars 386 forks source link

Discussion: tortoise.init() & TortoiseManager #14

Open grigi opened 6 years ago

grigi commented 6 years ago

I'm proposing start with an init handler like so:

def init(...) -> TortoiseManager
    ...

manager = tortoise.init(
    modules=['app', 'app.sub'],
    databases={
        'default': <some db handler>,
        'notes': <some db handler>
    },

)

# Models are discoverable (and introspectable)
manager.models →
[<app.Projects>, <app.sub.Notes>, <app.sub.Tasks>]

# Actions are discoverable
manager.actions →
{
    'tortoise': {
        'syncdb': { describe itself and parameters here },
        'makemigrations': ...,
        'migrate': ....,
        'info': ...
    },
   'app.sub': {
        'import_tasks': ...
    }
}

# Actions are executed
await manager.action('syncdb', param1=value1, ...)

# And then functionality useful for testing:
await manager.snapshot(...)
await manager.rollback(...)
await manager.teardown()

The TortoiseManager object would be a class that contains all the discovered models, and access to some management commands. (a bit like Django, but simpler at first).

Management action functions get the TortoiseManager object as an explicit parameter. The idea is to be able to do all introspection through a standard interface, to try and minimise incentive to access private functions.

We could do a tortoise shell command, which would be a wrapper around tortoise.init and TortoiseManager. We do it like that to allow any system it is embedded in to easily help manage it in their own tools.

For testing I would need the ability to tear-down, the tortoise environment, and handle both application-wide models, and test-local models. We can do something like:

from tortoise.testing import TortoiseTest

class TestThis(TortoiseTest):
    <some config here>
    ...

Where it would automatically build a test db if models are test-local (and tear it down after the tests complete), or use the globally configured test database. For our own testing we will probably use test-local models, and real application would probably use global models. So the use case should be optimised (ito convenience) for the global model use case, and test-local models can be a bit rough.

Automatic configuration will be quite important, so I propose a set of "sane" defaults, such as ability to use environment variables/config file to do init automatically. For that to work we also need to have the ability to initialise databases off a single string. eg. using URI schema like: https://github.com/kennethreitz/dj-database-url

Any comments/suggestions?

abondar commented 6 years ago

Overall I think this is a good direction and this is where we should head. But I think it should incapsulate even more from user, leaving him just with Tortoise.init(), or may be even without it some day. I don't like quite like idea of working with env variables, because it's always not very clear what's going on, and it's a little bit harder in administration. But the idea of config file is something I really was thinking about myself. In my opinion it should look something like this:

_tortoiseconfig.py

APPS = {
    'connections': {
        'default': {
            'engine': 'tortoise.backends.asyncpg',
            'credentials': {
                'user': 'tortoise',
                'db': 'tortoise',
                'password': 'qwerty123',
                'port': 5432,
                'host': 'localhost',
            }
        },
        'users': {
            'engine': 'tortoise.backends.sqltile',
            'credentials': {
                'path': '/tmp/temp_base.db',
            }
        }
    },
    'apps': {
        'events': {
            'models': ['app.models', 'app.subapp.models'],
            'connection': 'default', 

        },
        'users': {
            'models': ['users.models'],
            'connection': 'users',
        }
    }
}

May be we can add some more options to this config if we will think that this is necessary. And then user in code will call:

Tortoise.init(settings='tortoise_config') (we can have tortoise_settings as default to make it even easier)

And it will just do all the magic instancing models with appropriate connections.

grigi commented 6 years ago

I really like that config dict of yours. I think we should make what we send into tortoise.init() to build off a config dict like that. Then you can layer something in-front of it to do some auto-magic, or be used as part of another framework.

The envvar idea came from if we have a tortoise shell management command. it should be able to fetch the config from a file. And the lookup for that file could be in the order of: command-line-param > env-var > default_name But the only thing that uses the env var would be the shell management command.

The reason I want tortoise.init() to return an object, is two fold: 1) Discovery of ACTUAL configuration, so as a plugin writer I don't need to do fragile introspection, it is given. 2) Performing actions that can be registered by apps

If tortoise.init() alters global state, we can always auto-fetch the management object (like how django does it):

from tortoise.management import settings

But for testing we would love to have a way to have many such instances available in a non-global way. e.g. for our own tests we want to be able to have a specific tortoise environment for each test module. Since this is for our own tests, I'm happy to make them a bit fiddly, as we want to optimise for ease of use.

That leads me to: What is your perceived priorities? E.g. we want this to work for 90% of use cases, and do so simply: Correctness > Simplicity > Performance > Features > Maintenance or be more feature rich: Correctness > Features > Simplicity > Performance > Maintenance or performance more important? Correctness > Performance > Simplicity > Features > Maintenance etc?

That will guide a lot of decisions.

grigi commented 6 years ago

Hmm, I feel that we need to get some kind of management infrastructure in place to make testing bearable. I'll start on this. It is going to take a while to get this done. And you need to be very strict/opinionated about what you want to happen there.

abondar commented 6 years ago

What is your perceived priorities?

I think for the first release we should keep it simple because now is time when core design is decided and making it hard now will lead to more complex use cases in future.

But after first real release I think we could concentrate more on fancy features to bring people who need more of SQL features, while keeping core simple for most of users

mijamo commented 6 years ago

I like this new ORM and I have a few things I would like to say about this particular discussion

1) I think the user shouldn't NEED to use the result of tortoise.init. The main reason is that if we need to perform some action somewhere else, we would then need to import the "manager", but that manager was probably created in the main file so we end up with a weird dependency circle which is prone to errors. On the other hand if init mutates a global state, there could be a tortoise.get_manager that returns what is needed to perform management operations and could be imported elsewhere.

2) I think users might have different systems in place for settings. This is not a whole framework so it is not really possible to make assumptions about the projects in which it runs. I think in that configuration the easiest bet is to just take a settings argument in the init function, which would be an object. Users are then free to use what they want to get that object (JSON, YAML, flask config, separate file or just in the init file itself). I still see a benefit of passing a single object instead of separating the input arguments of input because some options might not be stricly be config per se but still useful (ex: test/debug options). It makes it also easier to have other alternatives, for instance using a config_file instead of config, and doesn't have any negative point really (apart maybe from removing autocompletion or arguments?).

3) Just a remark but I noticed above the use of 'connection' inside the apps settings. I would much much rather prefer a Routing option like the one from Django, which allows much more custom behavior and is much more practival (ex: DB deplication). I know it's outside of this issue but still good to have in mind I think.

grigi commented 6 years ago

I think the result of the discussion was generally in agreement with you on points 1 & 2. For Point 3, there is some rudimentary routing available already, just look at examples/two_databases.py

andreymal commented 4 years ago

Why is this issue closed? Personally, I’m not very comfortable that Tortoise has the shared global state. I would like to have an interface similar to that described in this issue.

@mijamo

that manager was probably created in the main file

You don't need to create it in the main file. You can use a separate file (something like myapp/db.py) and implement a get_manager function in this file, or maybe a class with a manager field.

In fact, the reason I bumped this issue up is that I'm trying to make an application as an isolated class without any global state, but I can’t do it because many ORMs (including Tortoise) have a global state :(

grigi commented 4 years ago

Hmm, I think the reason it got closed is because we implemented the part that hurt. And possibly priorities changing and forgetfulness.

I'm reopening this, but we need to come up with a better design on how this should be.

@andreymal could you please propose how you would want to do this? Note that we will have to end up with a "legacy interface" that doesn't break people code (for a few months at least, where it emits a warning with porting instructions).

andreymal commented 4 years ago

@grigi I'm not an experienced Tortoise user (yet), so don't take my opinion too seriously :) I didn't investigate the Tortoise internals deep enough, so I could miss some important problems and compatibility issues. Everything written below is a quick look that may miss important details.

In general, I assume that compatibility can be implemented using some global default manager, and the Tortoise class can just call methods of the default manager, something like this (pseudocode):

class Tortoise:
    @classmethod
    async def init(cls, ..., set_default: bool = True) -> TortoiseManager:
        if set_default and get_default_manager() is not None:
            await get_default_manager().close()
        manager = create_manager(...)
        if set_default:
            set_default_manager(manager)
        return manager

    @classmethod
    def get_connection(cls, connection_name: str) -> BaseDBAsyncClient:
        return get_default_manager().get_connection(connection_name)

    @classmethod
    async def generate_schemas(cls, ...) -> None:
        await get_default_manager().generate_schemas(...)

    # etc.

When set_default is False, the default manager is not set and the global state will not exist.

A context manager will help get rid of the global state in models (inspired by DEP 0009: Async-capable Django):

async with manager.new_connections():
    # something like tortoise.get_current_connection() will work here
    tournament = Tournament(name='New Tournament')
    await tournament.save()  # uses get_current_connection() internally
print('saved')
# something like tortoise.get_current_connection() will NOT work here (if the default manager is not present)

set_default_manager(manager)  # enables backwards compatibility
# something like tortoise.get_current_connection() will work here again

These two things will cover my needs. About actions and tests I have nothing to say at the moment, but lgtm

grigi commented 4 years ago

:thinking: There is some good ideas there.

So something along the line of

manager1 = tortoise.Manager().init(...)
manager2 = tortoise.Manager().init(...)

All managers are registered in a contextvar, so by changing the context (or setting a default context for the global behaviour) This would require Py3.7 minimum to work right.

So, backward compat could be achieved by something like this:

# The default manager
default_manager = Manager()
# The manager context has to be at the same scope as the event loop, so must be defined global
# Separate threads _should_  just work as the event loop is a thread-context.
managers_context = ContextVar('managers_context', default=default_manager)

class Tortoise:
    @classmethod
    def init(...):
        default_manager.init(...)

Then if you want the non-default manager one can do a

async with manager1.context():
    <stuff>

This would require that some parts of a model, such as the default connection to live in the Manager, so that it can fetch it from the context e.g. managers_context.get() and then use that. Also, the transaction context will have to lift from a global contextvar to one in the Manager.

This would have a limitation, all instances of a Model must be the same. As they are still declarative. By that I mean that if in one manager you init module a, and in another you init module a & b, then as long as module a is not referenced by module b, the model classes generated code will be the same, and now we can share our code. (Code in Python is pretty much global)

Something like your proposal to do a set_default_manager() won't work as it would literally set the scope in that function, and release it when it returns... So it won't propagate. scope in asyncio is more like a tree, versus a thread-local scope variable is more like a list: Any async scope can spawn many async scopes concurrently, whereas threads can only be spawned by the process.

At a quick look, this would require changes in db clients, db executors, Models, Transaction manager, and the Queryset. Once all this is adequately isolated, it should work as described.

Well, that's about half the codebase :man_shrugging: