oracle / python-oracledb

Python driver for Oracle Database conforming to the Python DB API 2.0 specification. This is the renamed, new major release of cx_Oracle
https://oracle.github.io/python-oracledb
Other
339 stars 67 forks source link

Support for asyncio #6

Closed anthony-tuininga closed 11 months ago

anthony-tuininga commented 2 years ago

This is a continuation of the original request made on cx_Oracle: https://github.com/oracle/python-cx_Oracle/issues/178.

The current status is that with the addition of the thin driver, adding suport for asyncio will be considerably simpler (and will only work in thin mode). If anyone has suggestions or recommendations on API, please share them!

kleysonr commented 2 years ago

+1

danizen commented 2 years ago

It isn't async per se but the key benefits of escaping from the GIL and better managing RDBMS connections that we need. As both @sharkguto and @P403n1x87 mentioned in oracle/python-cx_Oracle#178, it can be done by wrapping queries in coroutines.

That means that @cjbj and @anthony-tuininga don't need to write a new driver (note that async drivers don't follow PEP 249). I think a thin package that wraps oracledb in an opinionated way (e.g. using connection pools all the time, and get a connection just to do the request) could be used.

I guess there is a problem there for cursors and fetching additional results, but I'll leave it to @anthony-tuininga to figure this out.

anthony-tuininga commented 2 years ago

@danizen, I did make an attempt at simply wrapping cx_Oracle in asyncio coroutines -- but the performance of the result was poor (and that's being generous). As such I don't see any benefit to wrapping the thick driver at all -- other than the fact that you can use asyncio, I guess! My first attempt to generate an asyncio (thin) driver showed excellent performance in comparison. So that is the route that is being considered at this point! I'd like to see this implemented soon -- this year if at all possible! Unless I hear otherwise I'll probably follow the pattern used by asyncpg.

danizen commented 2 years ago

Thanks, @anthony-tuininga, I am looking forward to trying it out.

jiaulislam commented 2 years ago

I guess it's still not supported officially to get async oracle. 💔

anthony-tuininga commented 2 years ago

Not yet, no. Its definitely on the list, though! And I'd still like to see it done this year, yet, if at all possible.

anthony-tuininga commented 1 year ago

Just to give a bit of an update: I have started looking into this and ran into a bit of a roadblock interfacing asyncio with the Oracle database protocol -- but a solution has been found, thankfully! Thanks to that roadblock, getting it done this year is not going to happen any longer...but I am actively working on it (among other things), so hopefully I'll have something for you to look at in January.

One question for those of you following along: as mentioned earlier a simple wrap of the synchronous routines in a future (that executes in a thread pool) works but is about twice as slow as the synchronous version. Would it be helpful to include that as a fallback if the solution I mentioned earlier doesn't work for all database versions? Or would it be preferable to simply state that support isn't available in that case? Comments welcome!

srtucker commented 1 year ago

Thanks for the update. For your second option would it be a hard stop or would the async API still function, it just would affectively be synchronous?

anthony-tuininga commented 1 year ago

The options, I think, are

P403n1x87 commented 1 year ago
* fallback to the "works but is considerably slower" approach (putting synchronous calls into a thread pool for execution)

Slow queries could perhaps benefit from the process, rather than the thread, pool. But opting in, in this case, should require an active user request IMO. Sending a fast query to a process pool might actually be slower than using the thread pool, although I don't have numbers to back this up at the moment.

old-syniex commented 1 year ago

The options, I think, are

  • raise an exception and state that asyncio doesn't work (well) with this database version
  • fallback to the "works but is considerably slower" approach (putting synchronous calls into a thread pool for execution)

I would suggest to wrap the problematic versions in threadpool for consistency.

The community might will be able to suggest solutions

danizen commented 1 year ago

My personal preference would be to raise an exception, but it is not either or. You could raise a warning and then fallback to wrap in a threadpool. Users who want the threadpool behavior could ignore the warning, and users who want the wrapped behavior could catch the warning and raise some sort of system error.

ptekelly commented 1 year ago

any news on this?

jiaulislam commented 1 year ago

They are working on it. But I guess it will work only for thin driver not the thick mode. I think asyncio is also required for thick mode.

anthony-tuininga commented 1 year ago

Yes, we are working on it but also getting distracted by other projects! I can (and have as a proof of concept) implemented asyncio with thick mode -- but that was about 2-3 times slower than without asyncio, which sort of defeats the purpose, I think! Do you still want asyncio even if it is slower than regular synchronous mode? My experimentations with thin are much more promising.

ptekelly commented 1 year ago

for me thick doesn't matter - just using thin (at the moment at least)

jiaulislam commented 1 year ago

Yes, we are working on it but also getting distracted by other projects! I can (and have as a proof of concept) implemented asyncio with thick mode -- but that was about 2-3 times slower than without asyncio, which sort of defeats the purpose, I think! Do you still want asyncio even if it is slower than regular synchronous mode? My experimentations with thin are much more promising.

If I have a database on a remote can I use Thin mode ? I tried earlier but it was giving an exception saying require thick mode. I didn't dig up that issue much as I was in hurry with that project but I guess I have to look it up again if thin mode works in remote database.

cjbj commented 1 year ago

@jiaulislam yes you can connect in Thin mode to remote databases. This is off topic for this thread, so if you have questions about it, please start a new discussion.

vbadita commented 1 year ago

What would be the database versions which don't support asyncio ?

cjbj commented 1 year ago

No further discussions have occurred. @anthony-tuininga has been busy on the Thin node-oracledb driver. What DB versions are you using?

ptekelly commented 1 year ago

No further discussions have occurred. @anthony-tuininga has been busy on the Thin node-oracledb driver. What DB versions are you using?

I'd be happy for thin node async - is there forum post about that - or better still a time frame?

cjbj commented 1 year ago

No time frame. This issue is the place to follow to get news.

ptekelly commented 1 year ago

ok thanks

danizen commented 1 year ago

Now that I am using asyncio more heavily, I think the main benefit from this would only be in managing the number of connections, and that can be addressed somewhat with DRCP configuration. One trick I am doing is to create a ThreadPoolExecutor and then using the asyncio.run_in_executor formulation to run synchronous code.

This means I can tune the ThreadPoolExecutor to have the same number of threads i expect in a connection pool, and it is quite functional.

If you couple that with DRCP, it gets yet more functional.

vbadita commented 1 year ago

Now that I am using asyncio more heavily, I think the main benefit from this would only be in managing the number of connections, and that can be addressed somewhat with DRCP configuration. One trick I am doing is to create a ThreadPoolExecutor and then using the asyncio.run_in_executor formulation to run synchronous code.

This means I can tune the ThreadPoolExecutor to have the same number of threads i expect in a connection pool, and it is quite functional.

If you couple that with DRCP, it gets yet more functional.

Thank you. I tried ThreadPoolExecutor with connection pooling and it seems it's working fine.

ptekelly commented 1 year ago

Hi - any update to this?

cjbj commented 1 year ago

@ptekelly Not yet!

Jaza commented 1 year ago

I'd also love to see asyncio get supported here. I've built a few FastAPI web apps recently, whose main job is to connect to an Oracle database, and currently they're using https://github.com/oracle/python-cx_Oracle plus https://github.com/GoodManWEN/cx_Oracle_async . I'm keen to switch them to python-oracledb (thin mode would probably be fine for them), but I'm reluctant to do so if it still lacks asyncio support, especially if it's actively being worked on and will be available soon.

cjbj commented 1 year ago

@Jaza It won't be soon-soon but, now that Oracle Database 23c Free is out, we (I mean @anthony-tuininga) will get some time to work on it. There are some interesting details to work through, see https://github.com/oracle/python-oracledb/issues/6#issuecomment-1341586315. I expect @anthony-tuininga can do magic but we know that 23c's network protocol changes will definitely be a help for making the implementation cleaner.

ptekelly commented 1 year ago

Hi @cjbj - Can I check I didn't miss anything here? I cannot find the 23c on OCI free tier. Do i need to install it manually on a linux host to get this?

cjbj commented 1 year ago

@ptekelly Oracle DB Free is an on-premise release so you will need to run the installer etc. https://cjones-oracle.medium.com/installing-the-oracle-database-23c-free-rpm-package-8cdc829ff94b

ptekelly commented 1 year ago

ah ok thank you

Jaza commented 1 year ago

@cjbj @anthony-tuininga I'm pretty sure that's all cx_Oracle_async (which I'm currently using) does, it just wraps cx_Oracle calls in a future that executes in a thread pool (and its readme notes that that probably isn't the most efficient approach!). And, incidentally, if I were to switch to python-oracledb now, and call it directly in my FastAPI apps, that's also all FastAPI does, it just wraps synchronous calls in a future that executes in a thread pool.

So, until you guys cook up some magic and release a cleaner asyncio implementation for python-oracledb, I don't have any options right now that are more performant anyway. And what I've got now works, and it's fast enough for my needs, and it allows my apps' code to be all async. So, no hurry on my end, I'll just stay tuned for updates. Let me know if there's anything I can do to help (I'm an experienced Python dev, but I'm an Oracle n00b). I appreciate the work you're putting into this library, keep it up.

nickswiss commented 1 year ago

I am getting a bit confused following along here on whether this community has found an async approach with this library. I am also struggling to understand the future of this package as far as asyncio is concerned.

As others have mentioned, the rapidly popularizing FastAPI framework is async based and without this support the web server ends up using threadpools to faciliate any route which is not async (synchronous cx_oracle/alchemy). I have been anxiously following along this thread for some time but it seems this is not getting the priority I had hoped for when this functionality was originally proposed. Certainly appreciate the work and communication here, I just know that this is a gap that is stopping a good deal of the applications I support from making the switch, and at this point is leading to considering databases which have supporting async libraries.

anthony-tuininga commented 1 year ago

@nickswiss, others have indeed come up with a "solution" that works for them (the aforementioned cx_Oracle_async) while waiting for me to implement asyncio support. Support is indeed planned and some progress has been made, but other higher priority items have interrupted that progress. There are a few small items remaining on that list but I hope to have those completed shortly (which are planned to go into 1.4) and then I can concentrate on asyncio support (which is planned for 2.0). Of course these plans are subject to change but we will inform you if that is the case.

Part of my efforts in the past few months have been on support for a thin mode driver for Node.js. That effort has shown that it is possible to implement a truly async model without the enhancemnts available in Oracle Database 23c -- so with that knowledge I hope to have asyncio support for all database versions, with the caveat that the Oracle Database 23c enhancement should improve performance. The performance without the Oracle Database 23c enhancement should still easily outperform the simple wrapper approach that cx_Oracle_async is using.

The current plan is for asyncio support to only be usable in thin mode.

WilliamStam commented 1 year ago

thank you for all your hard work! its super supper appreciated.

im also a +1 for asyncio natively to get the proper awaits in it. in the meen time. anyone have some code they could share to get it going? :P cant exactly block the gil while waiting for an oracle response. is it as easy as:

def make_oracle_call(sql, params):
    with oracle as e: # psudo code
        results = e.fetchall()

    return results

executor = ThreadPoolExecutor(max_workers=1)
a = executor.submit(functools.partial(make_oracle_call,sql,params))
Julian-Brendel commented 1 year ago

Hi All,

Just wanted to check in on this thread.

Do you have any update on the roadmap / plans for the async / 2.0 release?

cjbj commented 1 year ago

@Julian-Brendel some work is being done and management is aware we are treating this as a priority request, however other things do keep coming up so I'm not going to comment on timelines.

anthony-tuininga commented 12 months ago

See announcement #258 for details. Comments welcome here!

old-syniex commented 12 months ago

@anthony-tuininga I am suggestingto release version 2.0.0a1 for ease of testing purposes.

I propose aligning our oracledb design with the following asyncpg methods: fetch fetchrow fetchval execute executemany

Adopting this approach would likely enhance the overall user-friendliness.

anthony-tuininga commented 12 months ago

@old-syniex, I presume you are referring to having these methods on the connection object -- thereby eliminating the need to create a cursor object and perform an execute/fetch on that? I agree that these would be more convenient in cases where the additional properties available on cursors are not required. Would you like to see this as an additional set of APIs? Or as a complete replacement?

As for creating a release, that is certainly a possibility which we will consider.

old-syniex commented 12 months ago

@anthony-tuininga Yeah, having those methods on the connection object. I would like it to be additional set of API, I can't find a reason why to remove the current API.

anthony-tuininga commented 12 months ago

Sure. I can see the advantage of doing that. Something like this would make sense to me:

async def execute(
    self,
    statement: str,
    parameters: Union[list, tuple, dict] = None
) -> Any:
    """
    Executes a statement and returns an AsyncCursor instance (if executing a query) or None if
    executing a non-query. Other options include returning the number of rows updated
    (for non-queries) or an ExecuteResult object which contains the information that would be on
    an AsyncCursor instance.
    """

async def executemany(
    self,
    statement: str,
    parameters: Union[list, int],
    batcherrors: bool = False,
    arraydmlrowcounts: bool = False
) -> None:
    """
    Similar to AsyncCursor.executemany() but doesn't require creating an AsyncCursor instance first.
    No return value but could also have an ExecuteResult object returned with information that would
    normally be an AsyncCursor instance.
    """

async def fetchone(
    self,
    statement: str,
    parameters: Union[list, tuple, dict] = None,
    rowfactory: Callable = None
) -> Any:
    """
    Executes a statement and returns the first row of the result set returned
    (or None, if no rows are fetched).
    """

async def fetchmany(
    self,
    statement: str,
    parameters: Union[list, tuple, dict] = None,
    num_rows: int = oracledb.defaults.arraysize,
    rowfactory: Callable = None
) -> list:
    """
    Executes a statement and returns the first <num_rows> rows of the result set.
    """

async def fetchall(
    self,
    statement: str,
    parameters: Union[list, tuple, dict] = None,
    arraysize: int = oracledb.defaults.arraysize,
    rowfactory: Callable = None
) -> list:
    """
    Executes a statement and returns all of the rows of the result set as a list.
    """

That makes it clear that they are the same as the cursor equivalents but without requiring a cursor. Thoughts?

syniex commented 12 months ago

Yes, this looks great.

anthony-tuininga commented 12 months ago

FYI, I just pushed changes to merge with the changes introduced in main as well as ensure that Python 3.7 and higher work correctly. I had inadvertently used a method that was only available in Python 3.11!

anthony-tuininga commented 11 months ago

FYI, I just pushed more changes to merge with the changes introduced in main and also added shortcut functions on the connection object as follows:

    async def callfunc(
        self,
        name: str,
        return_type: Any,
        parameters: Union[list, tuple] = None,
        keyword_parameters: dict = None,
    ) -> Any:
        """
        Call a function with the given name.

        This is a shortcut for creating a cursor, calling the stored function
        with the cursor and then closing the cursor.
        """

    async def callproc(
        self,
        name: str,
        parameters: Union[list, tuple] = None,
        keyword_parameters: dict = None,
    ) -> list:
        """
        Call a procedure with the given name.

        This is a shortcut for creating a cursor, calling the stored procedure
        with the cursor and then closing the cursor.
        """
    async def execute(
        self, statement: str, parameters: Union[list, tuple, dict] = None
    ) -> None:
        """
        Execute a statement against the database.

        This is a shortcut for creating a cursor, executing a statement with
        the cursor and then closing the cursor.
        """
    async def executemany(
        self, statement: Union[str, None], parameters: Union[list, int]
    ) -> None:
        """
        Prepare a statement for execution against a database and then execute
        it against all parameter mappings or sequences found in the sequence
        parameters.

        This is a shortcut for creating a cursor, calling executemany() on the
        cursor and then closing the cursor.
        """
    async def fetchall(
        self,
        statement: str,
        parameters: Union[list, tuple, dict] = None,
        arraysize: int = None,
        rowfactory: Callable = None,
    ) -> list:
        """
        Executes a query and returns all of the rows. After the rows are
        fetched, the cursor is closed.
        """

    async def fetchmany(
        self,
        statement: str,
        parameters: Union[list, tuple, dict] = None,
        num_rows: int = None,
        rowfactory: Callable = None,
    ) -> list:
        """
        Executes a query and returns up to the specified number of rows. After
        the rows are fetched, the cursor is closed.
        """
    async def fetchone(
        self,
        statement: str,
        parameters: Union[list, tuple, dict] = None,
        rowfactory: Callable = None,
    ) -> Any:
        """
        Executes a query and returns the first row of the result set if one
        exists (or None if no rows exist). After the row is fetched the cursor
        is closed.
        """
WilliamStam commented 11 months ago

on behalf of pretty much everyone. thank you soooo much for this.

cjbj commented 11 months ago

@WilliamStam thank you. Please make sure you hammer on it and give us feedback!

cjbj commented 11 months ago

How is everyone going with python-oracledb asyncio testing? Are there any issues that we should know about, or suggestions that you want to make?

syniex commented 11 months ago

@cjbj is there any chance to create a prerelease?