pgjdbc / pgadba

Implementation of Java 10 sql2 spec for PostgreSQL
BSD 2-Clause "Simplified" License
70 stars 10 forks source link

Initializing one Connection cost ~300MB heap utilization and it reaches ~600+MB when I have 10 connections #16

Open sssinghsyr opened 6 years ago

sssinghsyr commented 6 years ago

I was comparing blocking JDBC with this async-pgsql2 API.

Design: Blocking JDBC - 1000 threads making connection with the DbServer. In this case, heap utilization was <100 MB.

Async-pgsql2 - 1000 async sql-submission using 10 different connections. Heap utilization reaches >600MB.

Any suggestion for the improvement in the design. I guess, whole pgsql2 will work efficiently when we will re-use connection, do we have ConnectionPool feature available here? Are we going to have?

How can I delete connection? Each connection initialize multiple containers which eats huge heap space.

davecramer commented 6 years ago

can you elaborate on reuse connection? I presume you are aware that connections cannot be shared ?

sssinghsyr commented 6 years ago

I have Database connector service build using this async-pgsql2 API which connects with the DbServer. 1000 Clients make connections with the connector service and it routes sql to the DbServer. Having more connections eat lot of heap space.

Connections cannot be shared mean? I can use same connection to submit different type of submission (different sql query).

alexanderkjall commented 6 years ago

I think I will need to do a bit of profiling in order to have an informed opinion on how to do this better. Do you have any code to share on how you did your test?

My uninformed opinion is that the ByteBuffer usage in the network communication layer have potential to be optimized, we don't clear and reuse them.

Regarding connection pools, we don't have that available, but it's clear that it's needed in order to use the driver efficiently, same as the normal jdbc driver. But it should maybe be it's own project? Building a connection pool have it's own challenges and I guess that it should be database agnostic

davecramer commented 6 years ago

Regarding connection pools, we don't have that available, but it's clear that it's needed in order to use the driver efficiently, same as the normal jdbc driver. But it should maybe be it's own project? Building a connection pool have it's own challenges and I guess that it should be database agnostic.

Wow, this is not a light undertaking. There are many good poolers out there and they are significant projects in their own right. I would advise strongly that this is way out of scope

alexanderkjall commented 6 years ago

I totally agree, I tend to insert maybe's when I'm still thinking stuff through, will try to communicate more clearly.

sssinghsyr commented 6 years ago

@alexanderkjall This is my project: async-db-connector. I am testing with 1000 clients sending queries in parallel. Async-db-connector uses IOmultiplexing to receive all queries with single thread and call async-pgsql2 method multipleRowOperation using newly created CONNECTION. Problem: Each client query will require new CONNECTION and it will be added into DataSource.connections LinkedQueue. There is no method to remove it from the list and delete its resources once CONNECTION is used. All the connections keep being added and there will be unnecessary iteration of all those. This restrict me to re-use each connection to limit extra overhead of their creation.

Please correct me with the understanding of CONNECTION. Also, why I cannot re-use CONNECTION? @davecramer

davecramer commented 6 years ago

@sssinghsyr you can re-use connections. What you can't do is use the same connection across threads concurrently

alexanderkjall commented 6 years ago

@sssinghsyr Thanks, I'll look into it and run it through a profiler, it sounds to me like there might be a lingering reference somewhere that doesn't get cleaned up when the connection is closed.

sssinghsyr commented 6 years ago

@alexanderkjall Could you please help me to setup this project's test suite? Cannot I run the testcases without docker? Any setup link will be preferable!

alexanderkjall commented 6 years ago

@sssinghsyr how to install docker greatly depends on what OS you are running, but maybe this can be a start: https://docs.docker.com/install/

Regarding the profiler it turns out to be a bit trickier to run visualvm with java 10 than it was with java 8, I haven't had time to set that up yet.

sagenschneider commented 6 years ago

I would like to get in #18 to see if this is still an issue. The current thread running 100% creates a ByteBuffer each loop of 1Kb. This can quickly fill the heap space.

Question: are there OutOfMemoryExceptions? if not likely just a growing heap size due to this.

sssinghsyr commented 6 years ago

@sagenschneider There was no OutOfMemoryExceptions. I was comparing application memory usage between JDBC and ADBA.

sagenschneider commented 6 years ago

@sssinghsyr Yes, likely cause. I'm near finishing the Selector work. It is running for my NioLoop tests. I'm just trying to get it running for remaining tests before merging in to complete #19 PR.

Early look is available here https://github.com/sagenschneider/pgsql2/tree/NetworkLayer/src/main/java/org/postgresql/sql2/communication/network (Note: I'm looking to read/write to ByteBuffers to avoid unnecessary copies of data and object creation).

With Selector and reading from re-used Direct ByteBuffers, it will consume a little more memory than the JDBC driver (though mostly in direct memory space and not heap). This, however, should not be in the 100's of megabytes (and potentially can be shared with HTTP handling ByteBuffers to reduce unused buffers in pools).