proofit404 / dependencies

Constructor injection designed with OOP in mind.
https://proofit404.github.io/dependencies/
BSD 2-Clause "Simplified" License
361 stars 18 forks source link

Deny to use async def and generators with @value decorator. #114

Closed proofit404 closed 5 years ago

proofit404 commented 5 years ago

Suggest @operation decorator instead with await within the story method.

We won't make await during DI process. It'll be too hard to implement and probably will cause hard to find bugs.

thedrow commented 5 years ago

This limitation may be a problem for users. There are ways to avoid it.

Imagine that we want to inject a new connection from the connection pool to an object that uses it.

class Foo:
  def __init__(connection):
    self.connection = connection

  async def bar(self):
    await self.connection.do_io()

class Container(Injector):
  connection_pool = ConnectionPool
  foo = Foo

  @value
  async def connection(connection_pool):
    return await connection_pool.connect()

Since Foo needs the connection we have to initialize it using async def. We cannot connect during __init__ since it cannot be a coroutine.

The real problem is that cleanup won't occur and the connection won't be returned to the pool. We can make Foo an async context manager that will consume a connection from the pool and release it after Foo goes out of scope. This is a correct design but it doesn't require to use @operation any more. We could have a special decorator @context_value (or a more suitable name) which requires you to do the following:

class Foo:
  def __init__(self, connection):
    self.connection = connection

  async def bar(self):
    await self.connection.do_io()

class Container(Injector):
  connection_pool = ConnectionPool
  foo = Foo

  @context_value
  async def connection(connection_pool):
    connection = await connection_pool.connect()
    yield connection
    connection.close()

with Container as c:
 c.foo.bar()
 # other operations

If @context_value is present and is going to be used to resolve a dependency we must use a with statement on the container itself so that the connection would be released.

The other legitimate use-case for using async @value decorators is to perform some I/O operation before initializing an object. A common initialization pattern for an object is to load configuration from a file or even a database. These kind of initialization steps are self contained and can be easily used without further extension of the library.

class Foo:
  def __init__(self, configuration):
    self.configuration = configuration

  def bar(self):
    if self.configuraiton['baz']:
      do_something()
   else:
     do_something_else()

class Container(Injector):
  configuration_reader = SomeConfigReader
  foo = Foo

  @value
  async def configuration(configuration_reader):
    return await configuration_reader.read()

Forcing the user to use an @operation means that the object will have another step of initialization that does not use this library. There can be more than one such dependency, perhaps many. If we chose to implement the suggestion in this issue I feel like we're missing the point.

In this example, the Foo object will need to have a configure method or we will need to use an @operation to get a fully functional Foo and thus accessing Container.foo directly should be forbidden. Both solutions would require the user to invert dependencies on his own.

A solution to that problem could be to guide the user to create an internal container inside the @operation to inject the resolved dependencies. In this case, the configuration that was read from a database or a file.

I hope I helped.

proofit404 commented 5 years ago

Hi Omar,

Thanks for such a verbose response!

Coroutines are indeed a hard task to design properly.

For me, I see two variants to implement it:

  1. Return non-awaited coroutine:
@dataclass 
class Foo:
    bar: Any

    async def baz(self, argument):
        await (await self.bar).do_io(argument)
  1. Await coroutine during the dependency injection process:
class Container(Injector):
    foo = Foo

    @value
    async def bar():
        result = await quiz()
        return result

await (await Container.foo).baz(1) 

As you can see both cases looks awfully bad. I prefer to use @operation decorator this way:

@dataclass 
class Foo:
    bar: Any

    async def baz(self, argument):
        bar = await self.bar()
        await bar.do_io(argument)

I agree that with and async with usage together with Injector looks like a really great idea! Finalizers are indeed missed from our libraries. I need some time to think about it.

But I don't understand yet how it could solve the problem I described above.

What do you think?

Regards, Artem.

thedrow commented 5 years ago

My name is Omer. Not Omar. It's the same name, just in Hebrew instead of Arabic. It's a common mistake so don't sweat about it :smile:.

Your example is not clear enough. Take the examples I gave and modify them as needed. But in any case, there is no need for the double await.

The container itself should await the for the value. The result is then injected to the __init__ method of Foo and is set to the configuration attribute in example number two.

It's a matter of checking which value is a coroutine and invoking it using await.

proofit404 commented 5 years ago

Shame on me! Sorry about the name...

One await is necessary to execute value coroutine and another await is necessary for your method of the Foo class.

thedrow commented 5 years ago

But that's the entire point. The Injector should await the value before injecting it into Foo, thus avoiding to require the user to await for it itself.

We should await here I think if the attribute is a coroutine if I'm not mistaken.

proofit404 commented 5 years ago

Yes, if we decide to await during dependency injection process we will have double await in the usage call.

  await (await Container.foo).baz(1)
#       --------------------- this one is necessary to awat value required by foo
# ----- this one is necessary to await bar bound method  

This API is really confusing for me.

Do you know how to workaround this two sequential await calls?

thedrow commented 5 years ago

I see what you are saying. I think it's best to await the entire container.

class Foo:
  def __init__(self, configuration):
    self.configuration = configuration

  def bar(self):
    if self.configuraiton['baz']:
      do_something()
   else:
     do_something_else()

class Container(Injector):
  configuration_reader = SomeConfigReader
  foo = Foo

  @value
  async def configuration(configuration_reader):
    return await configuration_reader.read()

await Container

Container.foo.bar()

That will resolve all async dependencies at once. Alternatively, we could use the event loop to schedule the execution of configuration.

When using asyncio, you can create a task and schedule it for execution without using an async method. See asyncio.Task for details. The container must have a loop attribute or can default to asyncio.get_event_loop(). On trio, it's currently not possible as scheduling tasks require an async with statement to open a nursery. I suggest we start by supporting asyncio and I'll work with @njsmith to provide such an API. In the future we can use AnyIO for scheduling these tasks. I'll work with @agronholm to make that happen.

We could provide both variants for now.

proofit404 commented 5 years ago

Awaiting the whole container is an interesting idea. It's even better it could be improved to async context manager in the future without breaking API.

async with Container as ns:
    return await ns.foo()

We also could self protect users against not awaited coroutines in the value objects with our check framework relatively easy.

I want to keep containers immutable possibly. Maybe we could return new Injector subclass with dependencies overwritten.

If the resolution process takes too long, we could introduce @persistent decorator meaning it will be resolved only once. The same way we implement Package proxy.

Regards, Artem.

thedrow commented 5 years ago

Actually when I think about it, when using trio you can do the following:

async with trio.open_nursery() as nursery:
  class Container(Injector):
    nursery = nursery
    configuration_reader = SomeConfigReader
    foo = Foo

    @value
    async def configuration(configuration_reader):
      return await configuration_reader.read()

Since nursery.start_soon() does not require an await statement to schedule a task.

The only problem is that trio does not provide a comfortable API for returning results synchronously while waiting. We could however use open_memory_channel and use trio.MemoryReceiveChannel.receive_nowait which is synchronous. It raises a WouldBlock error if there's no value yet. So we'd do something like:

send_channel, receive_channel = trio.open_memory_channel()
nursery.start_soon(trio_async_resolver, some_value_to_resolve,  send_channel, *args)
while True:
  try:
    resolved_value = receive_channel.receive_nowait()
  except WouldBlock:
    pass
  else:
    break

There's a possible defect in the implementation here since we cannot trio.sleep (since it requires an async statement as well, obviously) to allow switching to a different coroutine which may or may not hog the event loop. I hope there's a way to resolve this.

thedrow commented 5 years ago

Yes, that's one way to do it. The other way is to get rid of the requirement for async entirely which is doable in asyncio and almost possible in trio.

Tasks use Python's concurrent.futures.Future as a base class. They wait for a threading.Condition to be met as written here.

It's possible and more convenient to get rid of the async statement. We just have to figure out the right way to do it.

Would you mind video chatting about this?

proofit404 commented 5 years ago

Yes, I think it'll be convenient to make a dedicated call for this topic.

thedrow commented 5 years ago

I just checked and it's doable in trio using a concurrent.futures.Future.

import trio
from concurrent.futures import Future

async def foo(f):
    await trio.sleep(2)
    f.set_result("hello world")

async def main():
    f = Future()

    async with trio.open_nursery() as n:
        n.start_soon(foo, f)

    print(f.result())

trio.run(main)

Prints "hello world" to the screen and finishes without requiring us to use async at all.

Where do you want to talk? Hangouts?

thedrow commented 5 years ago

I just sent you a message on hangouts on your public email.

njsmith commented 5 years ago

I think the root reason that this is tricky, is because of the choice to build the API around doing complex work inside attribute access. It's a bit unusual in Python... it's more common to use functions to represent complex work. So my first thought would have been to design the API like:

# Option 1
class Container(Injector):
    robot = Robot
    servo = Servo
    amplifier = Amplifier

# Instantiate everything:
container = Container()
# Now use the instantiated object:
container.robot.work(...)

Or even

# Option 2
container = build(
    robot=Robot,
    servo=Servo,
    amplifier=Amplifier,
)
container.robot.work(...)

And in this style of API, then it's pretty obvious how to add async support – we'd make the build operation async:

# Async version of option 1
# (__init__ can't be async, so a common pattern in async code is to have async classmethod constructors)
container = await Container.create()
await container.robot.work(...)

# Async version of option 2
container = await build(...)
await container.robot.work(...)

In this case, you probably want to support both sync and async, and you probably don't want to totally redo the sync API and break all your users. So one option would be to keep the sync API the same as it is now, but also add an async API that has an explicit instantiation step:

class Container(Injector):
    robot = Robot
    servo = Servo
    amplifier = Amplifier

# Sync version -- raises an exception if any of the objects need async initialization:
Container.robot.work(...)

# Async version -- always works, but requires an async caller
container = await Container.create_async()
container.robot.work(...)

Or if you prefer to keep the attributes on the class object, I guess you could have

# Immediately initializes all attributes
await Container.setup_async()
Container.robot.work(...)

The idea is: if setting up the object graph require async operations, you can't do it implicitly on attribute access. But you could detect this on attribute access, and make attribute access raise an error until the user explicitly calls await Container.setup_async(), and then from then on you can use regular attribute access.

proofit404 commented 5 years ago

Thanks @njsmith for such verbose properly written response!

thedrow commented 5 years ago

@njsmith That's one way to do it. The other way is without making us use async on the container. It's possible in asyncio using Tasks and possible in trio using a concurrent.futures.Future to return the value when the coroutine is done.

What do you think about that?

njsmith commented 5 years ago

@thedrow I don't think it is possible. In asyncio you can create a task to fill in the container, but that just lets you arrange for the container to be filled in at some unknown point in the future. Writing container.robot can start some tasks, but it can't actually return the Robot object.

And in any async system, including Trio, if you call result() on a concurrent.futures.Future that's waiting on a background task to complete, then you'll deadlock your whole program. In your example above, it seems to work because you close the nursery before calling result(), so you're waiting for the child task to complete – basically you're just writing await child_task() but in a more verbose way.

thedrow commented 5 years ago

It's more verbose but it saves the user from being aware that some of the dependencies are async. Why should a user care in this case?

I haven't seen a case in asyncio were calling task.result() blocks the execution of the event loop. Why is that the case with trio?

njsmith commented 5 years ago

It's more verbose but it saves the user from being aware that some of the dependencies are async. Why should a user care in this case?

When the Python core devs were designing async/await, they made the decision that you shouldn't be able to hide when background tasks are running. There are various arguments for and against, but it kinda doesn't matter in this particular context, the fact is that you can't do it :-)

I haven't seen a case in asyncio were calling task.result() blocks the execution of the event loop. Why is that the case with trio?

In asyncio, task.result() doesn't wait for the task to finish – if it's already finished it returns the result, and otherwise it raises InvalidStateError. If you want to wait for the task to finish, you have to use await.

In general, in any async library in Python, every operation that doesn't have an await in front of it blocks the event loop until it's finished. For simple operations like, I dunno, looking up a key in a dict or something, then the operation only takes a tiny fraction of a second, and blocking the event loop for a tiny fraction of a second is fine. But if you want to wait for a background task to finish, that has to use an await.

thedrow commented 5 years ago

That's not true. If you use a asyncio.Future it doesn't wait. If you use an asyncio.Task it does. See here.

I don't know which design is better though.

thedrow commented 5 years ago

@proofit404 I think that at this point we can close this issue and replace it with another one.

njsmith commented 5 years ago

That's not true. If you use a asyncio.Future it doesn't wait. If you use an asyncio.Task it does. See here.

I don't understand what you mean here at all. The code you linked to isn't even in asyncio, so what does it have to do with the difference between asyncio.Future and asyncio.Task?

thedrow commented 5 years ago

Because asyncio.Task inherits from concurrent.futures.Future. asyncio.Future doesn't.

proofit404 commented 5 years ago

I want to apologize for @thedrow @njsmith and @supadrupa I lost the call. It was completely un proficient from me.

I thought we discussed everything here.

Will await Injector and async with Injector work for everyone?

Regards, Artem.

njsmith commented 5 years ago

Because asyncio.Task inherits from concurrent.futures.Future. asyncio.Future doesn't.

I think you mixed it up:

In [8]: issubclass(asyncio.Task, concurrent.futures.Future)                     
Out[8]: False

In [9]: issubclass(asyncio.Task, asyncio.Future)                                
Out[9]: True
njsmith commented 5 years ago

Will await Injector and async with Injector work for everyone?

I guess await Injector is the same as my setup_async idea, except hiding the function call inside an __await__ method? Stylistically I prefer to use functions for stuff like this (and in Trio we always use await with function calls to make it simpler to teach), but I think it's just a question of what style you prefer.

thedrow commented 5 years ago

Well that's strange since https://github.com/python/cpython/blob/master/Lib/asyncio/tasks.py#L98 says otherwise.

agronholm commented 5 years ago

That class inherits from .futures.PyFuture which is just an alias for asyncio.Future.

proofit404 commented 5 years ago

Superseded by #132 and #133.