dvarrazzo commented 1 year ago

Profiling the current state of Psycopg (3.1.5), in a tight loop, profiling looks like:

The ragged hill is the machinery to convert parameters from Python to Postgres. It is probably the best area to attack in order to improve performances, probably even more valuable, and less intrusive, than #415).

Rewriting this part in Cython would also allow to communicate in C between query and Transformer, so probably there is no need to resurface to Python in a PostgresQuery.convert() call (except to call Python dumpers of course).

https://github.com/psycopg/psycopg/blob/1edf0eb47d44b985e82a60cf1ce4fd2a0c1fdafd/psycopg/psycopg/cursor.py#L469-L470

jmallad commented 1 year ago

Hey Daniele, I'm Josh Mallad. Josh Drake from Command Prompt sent me here - he is potentially willing to sponsor me for this work. I'm trying to wrap my head around this codebase, and I have some questions. Is there a relevant psycopg development community you would prefer to direct me to (IRC channel, mailing list, etc.) or should I contact you directly?

dvarrazzo commented 1 year ago

Hello Josh, nice to meet you 🙂

Large part of the development happens on this tracker. You are welcome to ask questions on this issue, if it's relevant to the task, or to open a discussion if it's something more generic.

If you don't feel comfortable asking on a public channel, feel free to write to me directly. You can find my email in the git commits.

Good luck!

jmallad commented 1 year ago

How can I reproduce the results you are seeing with your profiling? What test code are you using? I guess the first thing I want to do is try profiling different types to see which ones are hot spots. My guess would be strings and datetime objects, but I can't know for sure yet.

dvarrazzo commented 1 year ago

A graph like the one above can be created using py-spy.

I have used the command line:

py-spy record --native -r 500 -o myout.svg -- \
    python tests/scripts/bench-411.py psycopg --ntests 50000 --no-drop --no-create \
    --dsn="host=localhost dbname=psycopg3_test sslmode=disable"

the script runs a script with a minimal tight loop (the name comes from issue #411) - just a micro-benchmark - and saves the result as a flame graph in myout.svg. In the command line above:

--native allows to profile both C and Python in the same trace
-r 500 is the sampling rate. More would probably not add more resolution - the profile would just trip over itself.
--no-create and --no-drop refer to the test data created by bench-411.py. The first time you run the script you can't use --no-create. If you use --no-drop, then you can use --no-create and --no-drop to leave the test data in the database so that creation/deletion don't affect the benchmark.
In the connection string, I have used host=localhost because I wanted to test waiting on the network, not of Unix sockets, and sslmode=disable to avoid to see the encryption time in the benchmark.

And yes, of course different data types would take more or less for conversions. All the data types loaders and dumpers have a Python implementation (in psycopg/psycopg/types), some of them have a faster equivalent in C (in psycopg_c/psycopg_c/types). The basic ones (strings, numbers, datetime) are implemented already. Others are added progressively: faster arrays loaders were added in 3.1.5 (see #359). Others are probably a good thing to add, such as uuid (see #447), others are probably not worth the hassle (network types, composites: not used so often to be relevant).

jmallad commented 1 year ago

Awesome, thanks so much. I think I've got everything I need to start hacking on this now.

jmallad commented 1 year ago

Getting back into a normal rhythm now that the holidays are over. I'm currently working on converting _queries.py to Cython - I will have something to push for review before the week is over.

jmallad commented 1 year ago

I've got the new _queries.pyx building now. You can see it here. I'm now trying to find spots in the annotated HTML where it can be further optimized. I've just seen issue #446 - is this a potentially worthwhile path to explore for this issue? I notice it works with the type annotations, many of which I had to remove to get Cython to compile.

dvarrazzo commented 1 year ago

Thank you very much, that's good to hear! I'll take a look.

Using mypyc, as per #446, might get some speedup, but, as far as I understand, mypyc is not designed to interface with C code.

The algorithms used to parse the queries should benefit of having their inner part written in c using c structures rather than python lists and dicts, so I'd keep them in cython with an eye of pushing them to pure c.

The parts of the adapter showing up in profiling after the query manipulation are the check of whether the procedure should be stored and setting the results to the cursor. They are comparatively less meaningful to optimise, and they are at close contact with python objects, so there's little that can be made into pure c I think and less to gain in writing handcrafted Cython code

jmallad commented 1 year ago

Hey, can someone let me know if I'm on the right track here?

My brain is fried from figuring out Cython, but I think I am starting to get it. _query2pg has a lot of list usage, so I thought I'd start by implementing a fast C list structure. This way, I can replace the slower Python lists with faster C structures.

This isn't so straightforward though; particularly when you get to exceptions. My C list calls malloc() a few times, allocating resources which must later be freed. But, Python could throw an exception in the middle of the C code, messing everything up.

One idea I have is containing the C list inside of a Python object, so it can have a destructor (del) method. Does this sound like it would work?

dvarrazzo commented 1 year ago

Hello! Thank you for working on it! I will review the code later today.

dvarrazzo commented 1 year ago

@jmallad I have started taking a look at your code. However, can you make a pull request, so I can share a review? Thank you!

jmallad commented 1 year ago

psycopg / psycopg

Optimize query conversion in C #456

485