googleapis / go-gorm-spanner

Google Cloud Spanner implementation for Go's GORM library.
Apache License 2.0

I want a precise understanding of sessions. #10

Closed sjy-dv closed 9 months ago

sjy-dv commented 1 year ago

I fully understand that one gRPC channel can handle up to 100 sessions.

If a single Google Spanner node can perform 1000 concurrent queries, does this mean that one node is equivalent to 1000 sessions?

Spanner, unlike traditional databases, uses the concept of sessions rather than connections. However, due to similar mechanisms like session pools and connection pools, it's often confusing whether sessions and connections are the same thing. Is a session the same as a connection?

olavloite commented 9 months ago

Sessions in Cloud Spanner and connections in traditional relational databases are different, although they do have some similarities. The best way to view it is like this:

  1. A connection in a traditional relational database combines both the physical connection (e.g. TCP connection) between the client and the server, and the session state (e.g. which database am I connected to, am I currently executing a transaction, is the connection in read-only mode, etc.)
  2. Cloud Spanner uses gRPC for communication between the client and the server. For this, gRPC uses 'channels', which are roughly equivalent to TCP connections. One gRPC channel can carry multiple requests (up to 100) at the same time. gRPC channels do not have any 'session state' or anything like that. They are oblivious to what kind of requests are being sent and received.
  3. Cloud Spanner uses sessions to maintain session state (e.g. which database is being used, what role does the current user have), and to execute transactions. One session can have at most one (read/write) transaction at any time.
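To make the channel side of this concrete, here is a minimal sketch of the arithmetic implied by the "up to 100 requests per channel" limit in point 2. The function name `ChannelsNeeded` and the constant `maxRequestsPerChannel` are hypothetical names for illustration and are not part of any Spanner or gRPC API. The sketch only counts requests that are in flight at the same time and says nothing about sessions, since channels carry no session state.

```go
package main

import "fmt"

// maxRequestsPerChannel is the per-channel concurrency limit quoted above
// (an assumption taken from the discussion, not read from any library).
const maxRequestsPerChannel = 100

// ChannelsNeeded returns the smallest number of gRPC channels that can carry
// the given number of simultaneous requests, i.e. ceil(n / 100).
func ChannelsNeeded(concurrentRequests int) int {
	if concurrentRequests <= 0 {
		return 0
	}
	return (concurrentRequests + maxRequestsPerChannel - 1) / maxRequestsPerChannel
}

func main() {
	for _, n := range []int{1, 100, 101, 4000} {
		fmt.Printf("%d concurrent requests -> %d channel(s)\n", n, ChannelsNeeded(n))
	}
}
```

Note that this is a lower bound for carrying the requests at all. It is independent of how many sessions the client keeps in its pool.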

So as a rule of thumb, you should have as many sessions in your session pool as the number of transactions that your client will execute in parallel. Note that 'transaction' in this case also includes read-only transactions, as the Spanner client will use one session from the pool for each transaction that it executes.
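The rule of thumb can be illustrated with a toy session pool. This is only a sketch using the standard library: `session`, `sessionPool` and `PeakSessionsInUse` are hypothetical names and this is not how the Spanner client implements its pool. It shows that each simulated transaction checks out exactly one session, so the number of transactions that actually run in parallel can never exceed the pool size. Any extra transactions wait for a session to be returned.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// session is a hypothetical stand-in for a Spanner session.
type session struct{ id int }

// sessionPool is a toy pool backed by a buffered channel of idle sessions.
type sessionPool struct{ idle chan *session }

func newSessionPool(size int) *sessionPool {
	p := &sessionPool{idle: make(chan *session, size)}
	for i := 0; i < size; i++ {
		p.idle <- &session{id: i}
	}
	return p
}

// PeakSessionsInUse starts the given number of simulated transactions at once
// and returns the highest number of sessions that were checked out together.
func PeakSessionsInUse(poolSize, transactions int) int {
	if poolSize < 1 {
		poolSize = 1 // an empty pool would block every transaction forever
	}
	pool := newSessionPool(poolSize)

	var (
		mu          sync.Mutex
		inUse, peak int
		wg          sync.WaitGroup
	)
	for i := 0; i < transactions; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s := <-pool.idle // blocks while all sessions are busy

			mu.Lock()
			inUse++
			if inUse > peak {
				peak = inUse
			}
			mu.Unlock()

			time.Sleep(time.Millisecond) // simulated transaction work

			mu.Lock()
			inUse--
			mu.Unlock()
			pool.idle <- s // return the session to the pool
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	// 20 transactions compete for 3 sessions: the peak never exceeds 3.
	fmt.Println("peak sessions in use:", PeakSessionsInUse(3, 20))
}
```

If the pool is smaller than the number of parallel transactions, the surplus transactions queue up waiting for a session, which is why the pool should be sized to the expected parallelism.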

You write that a single Cloud Spanner node can execute 1,000 queries, but that number is incorrect. The correct number is 22,500 queries per second. See https://cloud.google.com/spanner/docs/performance#increased-throughput. Note that this maximum throughput applies to an ideal case. In the real world, with an application that executes both reads and writes at the same time, and where not all reads are simple reads selecting a single row using the primary key, the number will be lower.

Sessions are unrelated to the number of Cloud Spanner nodes. You should determine the number of sessions that you need in your client based on what your client will be doing. A couple of extreme examples:

  1. You have a client that is running a simple monitoring job that polls data from a single thread, using a query that is executed once a second: This client does not need more than 1 session. (But for simplicity, you can keep the session pool configuration unchanged, as the extra sessions won't hurt performance anywhere.)
  2. You have a client that handles incoming web requests, and this client can handle up to 2,000 web requests in parallel. Each web request will also need to execute transactions on Cloud Spanner. Each request also starts a second goroutine that writes some log data to Cloud Spanner, meaning that each request will have two goroutines writing to Cloud Spanner at the same time. The theoretical maximum number of parallel transactions that your application can have is therefore 2,000 × 2 = 4,000, so you should have 4,000 sessions in your session pool.
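The sizing in both examples comes down to one multiplication. Here is a minimal sketch of it; `RequiredSessions` is a hypothetical helper and not part of the Spanner client. In the Go client, the result would typically be supplied through the `MaxOpened` field of `spanner.SessionPoolConfig`, but check the documentation of the client version you use before relying on that.

```go
package main

import "fmt"

// RequiredSessions returns the theoretical maximum number of transactions a
// client can run in parallel: the number of parallel requests multiplied by
// the number of goroutines per request that talk to Cloud Spanner.
// Inputs below 1 are treated as 1, so the result is never less than 1.
func RequiredSessions(maxParallelRequests, goroutinesPerRequest int) int {
	if maxParallelRequests < 1 {
		maxParallelRequests = 1
	}
	if goroutinesPerRequest < 1 {
		goroutinesPerRequest = 1
	}
	return maxParallelRequests * goroutinesPerRequest
}

func main() {
	// Example 1: a single-threaded monitoring job.
	fmt.Println("monitoring job:", RequiredSessions(1, 1))
	// Example 2: 2,000 parallel web requests, two goroutines each.
	fmt.Println("web server:", RequiredSessions(2000, 2))
}
```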