Closed maldrake closed 3 years ago
Hi @maldrake
thanks for raising this issue. Session support would be a cool feature indeed. I didn't investigate much the session on protocol, but i can dig more. I think that the session is decoupled from the connection but i might be wrong. I would try to check this first before making design decision. But in case if the session is bounded to the single connection yes option 1 or 2 probably are needed.
I will try to check in other clients too :)
Coming in here to say that I would be very happy to help with the implementation of this feature, this is much needed for our project also.
@wolf4ood have you had anymore clarity looking at other GLV's? I will also do some research over this weekend.
I just quickly glanced at the python GLV and I can see that if sessions are enabled then the client is only allowing a pool size of 1.
I guess this is a pretty good indication to the fact that session must be sent over the same connection.
Hi @jdeepee not yet, I should have some time this weekend. But indeed the python one should give us some indication that the session is bounded to the connection.
Looking into this a little further, I think we're going to need to bind a session to a particular client and connection. According to the Tinkerpop documentation (https://tinkerpop.apache.org/docs/current/reference/#sessions):
If there are multiple Gremlin Server instances, communication from the client to the server must be bound to the server that the session was initialized in. Gremlin Server does not share session state as the transactional context of a Graph is bound to the thread it was initialized in.
More broadly than just Tinkerpop, other graph databases seem to offer options for clustering as well. If someone is doing something like DNS round-robin to load balance connections, then different client instances may be connected to different server instances, which don't share sessions.
Looking back over the two options above, I think my preference would be for option 2: pull the pool outside of the client, so that you have a pool of clients, with each client having a single connection. Then give the client struct three additional functions, to begin, commit, and rollback a transaction. Does that sound like a good way to proceed?
How the python/js and other users of GLV with sessions that limit the pool to one are suppose to use that client over multiple requests? Do the users need to pool manually?
@jdeepee @maldrake I've checked the JS one and actually the client has only one WS connection. So i guess if you use session you need to handle manually the pool by yourself
Agreed. I read the code the same way. This week is turning out to be busier that I expected, but I have some time off from my day job coming up in the next couple weeks. I'll try to get a draft PR together as a starting point for review and discussion.
@wolf4ood,
I'm working on something for which support for sessions would be helpful. I'd like to be able to do multiple queries / traversals within a single transaction, because the latter queries depend on the results returned by earlier queries. I'm happy to do the work to prepare the PR, but first I wanted to ask your thoughts on how I might go about it.
As far as I can tell from the reading the Apache documentation and some of their client code, it looks as though to use sessions:
processor
field set tosession
session
field set to a unique session identifier, recommended to be a UUID to minimize the risk of collisionop
ofclose
to close the session and the transactionIncluding the
processor
andsession
fields in the message looks fairly simple. I think I would add those fields to theMessage
enum
variants and make corresponding changes to the various message_with_args* functions.The part I'm less certain about is changes to
GremlinClient
andaio::GremlinClient
, because of the connection pooling. The connection pool is currently encapsulated within theexecute
andsubmit_traversal
functions, so there's no guarantee that the caller will get the same connection to the server for successive queries or traversals.I couldn't tell for certain from the Apache documentation, so I'm not completely certain that queries and traversals in the same session have to arrive over the same connection. If not, the change is much easier. One approach would allow the addition of session information (i.e. a session ID) to a variant of the
execute
andsubmit_traversal
functions. I would also add aclose_session
function to client to be able to close out the session.My best guess from the documentation is that this won't be enough, though, because the messages in a session must come from the same connection. If that's true:
One option would be to add a separate
SessionedClient
, which operates without a pool in theexecute
andsubmit_traversal
functions. Instead it would have a singleConnection
. I think from the documentation that after a session is closed, the same connection can be reused later either sessionless or with a new session, so one could conceivably have a pool of SessionedClients. I'm not certain, though. The upside is that it doesn't change existingGremlinClient
behavior, although it might be a bit complex to get traversals to work with either an existingGremlinClient
or the newSessionedClient
.Another option would be to restructure the code so that the pool is a pool of multiple
GremlinClient
structs that each have a singleConnection
, rather than a singleGremlinClient
struct with a pool of multipleConnection
structs. This way, the client only has one connection, and the user of the gremlin-client library uses a pool to get aGremlinClient
. The advantage of this approach is that there's no creation of a separate SessionedClient, which probably reduces code duplication and keep things a little simpler inside traversals wheresubmit_traversal
is called. However, it is more of a breaking change for clients that are currently relying onGremlinClient
to manage the connection pool, and it's more complex for users of the library to create and manage the pool.In general, is session support a feature you'd be interested in? If so... you know a lot more about Gremlin, Gremlin-based servers, and gremlin-rs than I do, so does the above sound mostly correct, or did I miss something important? Assuming I'm not too far off, if I work on a PR, do you have a preference about the design direction I go for handling the client pooling issue?
And above all -- many thanks again for this project!