snowflakedb / snowflake-connector-python

Snowflake Connector for Python
https://pypi.python.org/pypi/snowflake-connector-python/
Apache License 2.0
583 stars 468 forks source link

support for asyncio #38

Open jpassaro opened 6 years ago

jpassaro commented 6 years ago

The Snowflake connector for python seems to be implemented essentially as API calls over HTTP. Using aiohttp, companion subclasses to SnowflakeConnector, SnowflakeCursor, SnowflakeRestful etc, could be created that implement the key methods as asynchronous coroutines. Then asyncio tools could be used to run Snowflake connection routines alongside other I/O-centric or API-driven tasks.

Has this been considered? Is it a viable addition to the Snowflake Connector? If so I'm happy to contribute, I'd love to hear any requirements you folks might have in mind. Or on the other hand, is it more appropriate as a fork, or as a separate project altogether in the style of aiobotocore?

smtakeda commented 6 years ago

Thanks for suggestion. Yes, we considered aiohttp as well as any async feature in 3.4+. There are two concerns: 1) since the driver needs to support both 2.7+ and 3.4+, it may need to have a branch for 3.4+, 2) it is not clear how to handle OCSP check in aiohttp.

OCSP check is a requirement from our security team, though the standard Python's SSL library doesn't support it (and even most other clients don't care about the certificate revocation status!). We use a monkey patch on the top of https://github.com/requests/requests to intercept SSL handshake and add the OCSP checks.

So definitely all contribution would be pleased along with the above concerns addressed.

szelenka commented 5 years ago

Checking if there has been any progress with asyncio support for this package, the last update seems to be over 2 years ago..

smtakeda commented 5 years ago

sorry we didn't have enough bandwidth to work on this. A plan is add async support after dropping python2.

krrg commented 5 years ago

Python 2 will stop being supported by the Python Foundation on Jan 1, 2020. Will Snowflake also drop support at that time? Just curious if any timeline has been established on when Snowflake will EOL Python 2 support.

If it helps anyone interested, as a temporary workaround, I've been running Snowflake queries inside a concurrent.futures.ThreadPoolExecutor using what basically amounts to await asyncio.get_event_loop().run_in_executor(...).

This has been just fine for my purposes (internal reporting) and allows me to still use the other neat asyncio features while not blocking the main thread.

smtakeda commented 5 years ago

For Python 2 support, feel free to discuss in #107. We have not decided when we are going to drop Python 2 support yet. Good new is the tread of Python 2 download is down: https://pypistats.org/packages/snowflake-connector-python

We'll keep eyes on the usage metrics on our end to determine the timing of Python2 drop.

smtakeda commented 5 years ago

We want this for Snowflake: https://github.com/MagicStack/asyncpg https://github.com/aio-libs/aiomysql

farvour commented 4 years ago

What's the progress on aio support? Anything that uses it, such as sqlalchemy-aio cannot take advantage of this dialect+driver without it...

smtakeda commented 4 years ago

Not much progress. In the planning meeting.

mvoitko commented 4 years ago

Thanks for suggestion. Yes, we considered aiohttp as well as any async feature in 3.4+. There are two concerns: 1) since the driver needs to support both 2.7+ and 3.4+, it may need to have a branch for 3.4+, 2) it is not clear how to handle OCSP check in aiohttp.

Python 2.7 will be deprecated since Jan 1 2020. ANy progress on async SnowFlake connector?

islobodch commented 4 years ago

Commenting here to re-emphasise the issue. We are going to use asyncio in ongoing projects involving Python, and not having support for async is a bit frustrating. Are there any news regarding this? Thanks.

jimfang commented 4 years ago

It is very important for snowflake to support the async connector. Otherwise, we have to workaround. It is a big bottleneck for performance optimization. Thanks.

samstiyer commented 3 years ago

Hey Everyone, just wanted to bump this feature request. With FastAPI becoming more and more utilized in the python ecosystem, support for an Asyncio continues to become more important. Thanks!

madhukar01 commented 3 years ago

Bump!! We need asyncio support :)

whardier commented 3 years ago

Please implement pgsql wire protocol for Snowflake. Hard must.

krrg commented 3 years ago

I think what Snowflake would really benefit from is some sort of publicly documented HTTP/REST API. It would be a lot easier to build an asyncio community library on top of something that is documented than trying to reverse engineer it or come up with sub-optimal solutions.

whardier commented 3 years ago

Snowflake is the primary database for a project I am working on. It has been an absolute struggle dealing with all of the gotchas. Having some visibility would be a good idea - however I am going to assume the security team would struggle with this. Definitely sold to our CTO as very much compatible with existing workflows, tech stacks, etc... and showed a lot of promise by offering an sqlalchemy driver.

I have been mostly using usql (https://github.com/xo/usql) to fill in a lot of gaps with the web ui, the almost unusable cli offered by snowflake, and to act as a system call when I need something done async without having to deal with busted multiprocessing solutions.

dennis-weyland-by commented 3 years ago

Next bump. Building a FastAPI application which needs to perform sync and async queries would be much better if the snowflake connector would support asyncio. Yes the sqlalchemy connector is supporting this but lacking snowflake exclusive features like execute_async.

@keller00 are there any updates?

sfc-gh-mkeller commented 3 years ago

I'm sorry, but this is not planned for anytime soon. Our codebase is built upon using urllib3 and other dependencies that use it under the hood (boto3 comes to my mind immediately). We also monkey patch our own OCSP verification into urllib3 for extra security. Last time I checked urllib3 said that they will not support asyncio ever, so to support it we'd need a complete rewrite of the library, which we have tried, but the benchmarks didn't live up to our standards unfortunately.

However; we do as of recently support our own Async execution feature, see documentation here: https://docs.snowflake.com/en/user-guide/python-connector-example.html#label-python-connector-asynchronous-query-examples I hope that this could be useful for some of you!

whardier commented 3 years ago

While the async feature is neat.. it does not offer async socket support.

Perhaps the team at snowflake can look into using httpx as a base.

On Thu, Mar 18, 2021 at 10:05 PM Mark Keller @.***> wrote:

I'm sorry, but this is not planned for anytime soon. Our codebase is built upon using urllib3 and other dependencies that use it under the hood (boto3 comes to my mind immediately). We also monkey patch our own OCSP verification into urllib3 for extra security. Last time I checked urllib3 said that they will not support asyncio ever, so to support it we'd need a complete rewrite of the library, which we have tried, but the benchmarks didn't live up to our standards unfortunately.

However; we do as of recently support our own Async execution feature, see documentation here: https://docs.snowflake.com/en/user-guide/python-connector-example.html#label-python-connector-asynchronous-query-examples I hope that this could be useful for some of you!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/snowflakedb/snowflake-connector-python/issues/38#issuecomment-802581703, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACKRFPLDZ7QPXZYNR3SGXTTELSRLANCNFSM4EAS4B5A .

--

[image: --]

Shane R. Spencer [image: https://]about.me/ShaneSpencer https://about.me/ShaneSpencer?promo=email_sig

allenhumphreys commented 3 years ago

Why support python at all if you can't invest the time and effort to support modern Python?

Lexicality commented 3 years ago

Async Python is Future Python, not Modern Python. A lot of widely used libraries are not (yet) async compatible and most Python running in the real world is not async. Don't be silly.

allenhumphreys commented 3 years ago

Hi, I'm from the future! 🛸

whardier commented 3 years ago

Aww sweet! I am not the only modern time traveler.

liorchen commented 2 years ago

this is a MUST we use snowflake on top of async python to provide customer facing dashboards and it basically slows down our service dramatically there's no new python project that starts without asyncio please prioritize this!

shahargl commented 2 years ago

Any updates?

samstiyer commented 2 years ago

Bump

liorchen commented 2 years ago

hi guys! for what its worth I've hacked together a nodejs server that I use as proxy with python's aiohttp it works very well https://github.com/varos-io/snowflake-proxy/

awm33 commented 2 years ago

+1 This is pretty critical for us, our entire backend is python asyncio

Looking at writing a threading based async/await wrapper similar to https://github.com/aio-libs/aioodbc

dapollak commented 2 years ago

any updates ?

kchojn commented 2 years ago

Hey, I'm from the future 👾, we live on Mars, ETH is the main currency, we resurrected dinosaurs. But there is still no support for the async SF connector...

Any updates? 👀

awm33 commented 2 years ago

What we did in our codebase (https://aquifer.cloud)). It's not as nice as native support but aioodbc does essentially the same thing, and could get some people over this hump.

import asyncio
from functools import partial

import snowflake.connector

class SnowflakeCursor:
    def __init__(self, connection):
        self._connection = connection
        self._cur = None

    async def __aenter__(self):
        self._cur = await self._connection._execute(self._connection._cursor)
        return self

    async def __aexit__(self, exception_type, exception_value, traceback):
        await self.close()

    @property
    def description(self):
        return self._cur.description

    @property
    def rowcount(self):
        return self._cur.rowcount

    @property
    def arraysize(self):
        return self._cur.arraysize

    @arraysize.setter
    def arraysize(self, size):
        self._cur.arraysize = size

    async def close(self):
        if self._connection is None:
            return
        await self._run_operation(self._cur.close)
        self._connection = None

    async def execute(self, sql, *args, **kwargs):
        await self._run_operation(self._cur.execute, sql, *args, **kwargs)
        return self

    async def fetchall(self):
        return await self._run_operation(self._cur.fetchall)

    async def fetchmany(self, size):
        return await self._run_operation(self._cur.fetchmany, size)

    async def nextset(self):
        return await self._run_operation(self._cur.nextset)

    async def _run_operation(self, func, *args, **kwargs):
        return await self._connection._execute(func, *args, **kwargs)

class SnowflakeConnection:
    def __init__(self, pool, config, loop=None):
        self._pool = pool
        self._config = config
        self._loop = loop or asyncio.get_event_loop()
        self._connection = None

    async def __aenter__(self):
        if not self._connection:
            await self.connect()
        return self

    async def __aexit__(self, exception_type, exception_value, traceback):
        pass

    def cursor(self):
        return Cursor(self)

    async def close(self):
        if not self._connection:
            return
        connection = await self._execute(self._connection.close)
        self._connection = None
        return connection

    @property
    def closed(self):
        return self._connection.is_closed

    async def connect(self):
        self._connection = await self._execute(
            snowflake.connector.connect,
            autocommit=False,
            account=self._config['account'],
            user=self._config['username'],
            password=self._config['password'],
            database=self._config['database'],
            warehouse=self._config.get('warehouse'),
            role=self._config.get('role')
        )

    @property
    def _cursor(self):
        return self._connection.cursor

    def _execute(self, func, *args, **kwargs):
        func = partial(func, *args, **kwargs)
        return self._loop.run_in_executor(None, func)
liorchen commented 2 years ago

@awm33 unless I'm missing something, this will have no impact on performance comparing to the snowflake official version. that's because the underlaying snowflake driver is still using python native sockets which is blocking the thread it is executed on.

awm33 commented 2 years ago

@liorchen It's using the threaded executor to keep it from blocking. aioodbc and motor (asyncio mongodb client) use the same approach https://github.com/aio-libs/aioodbc/blob/master/aioodbc/connection.py#L78

liorchen commented 2 years ago

@awm33 interesting, thanks. I'd still prefer writing nodejs code before doing threads in python. Especially when using web server like FastAPI. who knows what security vulnerabilities or other hidden issues am I exposing myself into.

I miss gevent..

ashish-gupta1-by commented 2 years ago

I am assuming by python not having proper approach to achieve true async programing with python snowflake driver. I can't see any performance improvments

evanrittenhouse commented 2 years ago

@ashish-gupta1-by I think the reasoning above is running the queries concurrently, not truly asynchronously. This is also what happens in the Node.js event loop, coincedentally, but yeah, Python's GIL really forces a lot of workarounds like this. The concurrency arises from allowing asyncio to handle network I/O, meaning that the event loop's control is passed away from the awaited coroutine when there is some form of programmer-determined I/O behavior occurring.

dennis-wey commented 2 years ago

Since to me this is easily the most frustrating topic about snowflake, I thought I'll describe some use cases:

In general asyncio is a solution to achieve concurrent programming in python. This is different from parallel computing (see here: https://stackoverflow.com/questions/1897993/what-is-the-difference-between-concurrent-programming-and-parallel-programming) That makes asyncio great when waiting for "external" processes like IO, Rest API and other stuff where your cpu is basically just waiting. And that's exactly what our cpu is doing when interacting with snowflake. We send sql str, and then ask the service in intervals if its done computing. So that means that asyncio is not just some new fancy way to program but actually the perfect feature for snowflake since it's made for such use cases.

With that in mind here are some use cases I can think of when dealing with snowflake:

Working with fastapi It's one of the fastest and most popular REST frameworks in python. And it achieves that by utilizing asyncio to the core. Yes you can incorporate also synchronous workflows in your application, but you won't have the same performance/flexibility. So when building REST API with fastapi. If you have sync-only libraries like this here, they always feel cumbersome to deal with, since you have to build your way around with threading.

Airflow 2.X Deferrable Operators In the python community Airflow is maybe the most popular tool when managing your workflows. It handles task execution scheduling and comes with a nice UI. With their new version they introduced Deferrable Operators. So basically you're operator gets a new state "deferred" which is used for wait for external dependency and it's implemented using asyncio. This makes Airflow much more resilient and efficient since you don't need an executor during this state. Rather all your tasks have an associated trigger running in a single event loop waiting for your task to continue. Perfect feature to execute Snowflake queries using airflow if snowflake would support this.

Running 2 independent Snowflake queries at the same time Not kidding just try it. The code it's either way slower than it have to be or ugly to read or both. But let's go through our options:

and how could it look using asyncio? Looking at python3.11 we're getting TaskGroup:

async with asyncio.TaskGroup() as tg:
    task1 = tg.create_task(sf.execute(query1))
    task2 = tg.create_task(sf.execute(query2))

While it's only 3 lines of code it also has a much better and easier way to handle exceptions like any solution currently available.

Summary Asyncio is not a new overhyped way of programming in python. By now, you can't even call it new anymore. But it's actually the perfect feature for snowflake and databases in general, since we don't need local parallelism when the warehouses in the cloud are doing the job for us.

I saw quite some media about snowpark and how you want to appeal more to the python community. Personally, I don't think you can achieve that with your standard python client missing such a crucial feature.

ian-whitestone commented 1 year ago

Can any of the maintainers comment whether this is on the roadmap? I know you can't commit to dates, but at least getting some acknowledgement so we know we're not screaming into a void would be hugely appreciated.

sfc-gh-anugupta commented 1 year ago

Hi All , Thanks all for your inputs and suggestion . Adding support for asyncio is on our roadmap and will keep the thread posted on updates .

layandreas commented 1 year ago

Should definitely be a high priority!

DustinMoriarty commented 1 year ago

We would highly value this as well. Snowflake is highly used by data engineers who, like it or not, are most familiar with python. A lot of times I write code in python simply so that it can be supported by data engineers even though python is far from the most performant language. However, a lot of code is mostly IO waiting for long running tasks. Concurrency for IO is far more important than in memory execution speed for processes which are mostly performing orchestration in a data warehouse. Therefore, python can be nearly as fast as compiled languages for these type of tasks and we have a lot easier time hiring engineers who know python than other languages.

Asyncio in python is now mature and it is well supported by foundational libraries for tasks such as HTTP, TCP and file IO.

Yes, it is possible to run "async" queries where you get a query ID and then check back later to see if the query is done. However, the underlying IO is still blocking. Therefore, it takes a lot of fairly careful design to perform parallel queries efficiently with this pseudo-async approach. You have to run a blocking synchronous loop to get all the query ID's and then another blocking synchronous loop to go check all the queries, skip the completed ones on each pass and return the results once they are all done. I am not even going to try to explain this to most data engineers.

DustinMoriarty commented 1 year ago

@awm33 unless I'm missing something, this will have no impact on performance comparing to the snowflake official version. that's because the underlaying snowflake driver is still using python native sockets which is blocking the thread it is executed on.

I am trying to see if I can use your code. However, I am confused by what you mean about using the threaded executor. It looks to me like you are using self._loop.run_in_executor and self.loop is from asyncio. I think I am missing something. Does partial somehow help you with this? Are you passing in a different pool which uses multithreading instead of asyncio during instantiation of SnowflakeConnection? I am looking for some place where multithreading get's involved. I was going to try to make some similar wrapper with asyncio around a multithreaded sync code.

ian-whitestone commented 9 months ago

Hi All , Thanks all for your inputs and suggestion . Adding support for asyncio is on our roadmap and will keep the thread posted on updates .

Any updates? 🙏

Ousret commented 8 months ago

Hey there,

Niquests seems to be a good way to answer the concerns of both the community and the project maintainers.

awestm commented 5 months ago

@sfc-gh-anugupta @sfc-gh-dszmolka any updates?

sfc-gh-dszmolka commented 5 months ago

Thank you folks for all the feedback and interest! It is too early to give out any estimated timeline, but the team is busy with the planning and design; so there is some progress.

Speaking about which, we'll update this thread when there's any significant new information on the progress. Thank you very much for bearing with us !

sfc-gh-dszmolka commented 3 months ago

Short update to confirm this is still on the roadmap and in progress with the team. No timeline available at this moment - thank you everyone for your patience here.

copdips commented 2 months ago

as snowfkale has already provided standard rest apis, at least for basic CRUD operations, it's technically no more difficult to use it with asyncio by ourselves, instead of waiting for the official aio SDK from snowflake. But yes, a real aio SDK is always better.

sfc-gh-dszmolka commented 2 months ago

short update: internal POC in progress

sfc-gh-dszmolka commented 1 month ago

quick update: as you might have seen from the PRs :) team is actively working on the project and towards to getting out the initial alpha version of the connector which supports async.