jetty / jetty.project

Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more
https://eclipse.dev/jetty
Other
3.86k stars 1.91k forks source link

Define HttpSession behaviors during WebSocket LifeCycle #3981

Closed joakime closed 2 years ago

joakime commented 5 years ago

In light of some past discussions ...

We should think about how HttpSession behaves while in a WebSocket lifecycle.

Currently ...

janbartel commented 5 years ago

Just to add some info on this issue wrt changes for #3913: I can confirm that the HttpSession is still usable during the point in time the websocket is being established (specifically I tested accessing the HttpSession during the call to Configurator.modifyHandshake). Thus, our current recommended solution of accessing the session and retrieving all values you want to use from it is still valid post #3913 changes.

rfox12 commented 2 years ago

I have a question (which I think is relevant in this context--and hopefully for others that might be searching for answers). Joakim assembled an excellent set of questions. My perception is that the standards did not keep pace with the needs of developers in this particular instance.

So, with that in mind... is there a recommended approach for developers who need to build a Jetty server to share server state between normal "REST" type of servlets and web sockets? How are people solving this today?

My initial thought is:

  1. Send the session ID to the web socket endpoint upon creation in the protocol upgrade request
  2. Build a custom SessionHandler / SessionCache to expose current state of sessions to my web socket endpoint (using the session ID).

I haven't thought through all of the implications (e.g. session ID that changes); but I wanted to get an expert opinion. Would you go down this track? Do something else? I greatly appreciate your time and Jetty! -Robert

joakime commented 2 years ago

A common pattern that has emerged is that the application controls it's server state tracking in a way that doesn't rely on servlet HttpSession. (this works for REST, WebSocket, grpc, etc...)

HTTP uses Cookie to set an application state tracking id of it's own. Something that is not HttpSession.

The HTTP State is obtained from this cookie id. The WebSocket connection is passed this state id on initialization, and when it needs to access the state it uses the application specific techniques to access the state.

There are many libs for Java available to do this already.

rfox12 commented 2 years ago

There are many libs for Java available to do this already.

I don't know of any other libraries (besides Spring, which I'm not using)

From my perspective we've already taken on some technical debt in order to support HttpSessions (session handler, session cache, and a persistence layer, and more!) -- so logically I'm wondering why wouldn't we re-use some of this investment? The session mostly does what I need (store a bit of data).

In concrete terms for Jetty, is there any reason you would advise against simply passing a reference to my servlet environment's instance of DefaultSessionCache to a WebSocketListener implementation? As far as I can tell, DefaultSessionCache is thread safe. I would just need to handle lifecycle issues myself (which I'm fine with).

Edit: seems I would need to pass my SessionHandler (rather than the SessionCache in order to control session persistence)

joakime commented 2 years ago

The thing is, HttpSession is for HTTP.

WebSocket is not HTTP. (It's websocket)

It just so happens that one way to initiate WebSocket on HTTP/1.1 using the HTTP upgrade process. HTTP/2 and HTTP/3 have a different process (not using an HTTP request to upgrade).

The only point in time where HTTP is involved in WebSocket is during the upgrade request. Once the connection is upgraded, you are no longer HTTP.

Expecting HttpSession to exist after the upgrade request concludes and the connection is moved on from HTTP is not valid.

rfox12 commented 2 years ago

I understand. But just because it's named "HttpSession", does that disqualify it from being used in a more general sense to store small pieces of connection / user related information for websocket? These are the things I like about existing (yes "HTTP" sessions...):

And...

So why would I throw all of that away and start from scratch just because I'm in a WebSocket? That doesn't make any sense to me. Why wouldn't we generalize Session to be a more abstract concept that can be used regardless of protocol?

joakime commented 2 years ago

You know that javax.websocket and it's replacement jakarta.websocket don't actually have a dependency on the servlet spec, right? That's an intentional decision by the spec.

Why? Well, that's because at it's heart websocket is not HTTP and is not connected to the Servlet spec. Other technologies within Jakarta EE (eg: CDI) do not even acknowledge that WebSocket is built on HTTP or Servlet. There's even implementations of the jakarta.websocket spec without using the Servlet spec. And quite a few implementations of jakarta.websocket at the jakarta.servlet level, not the jakarta.servlet.http level (this level has HttpSession, the other one doesn't).

The only safe thing for you is to access what you need from the HttpSession during websocket upgrade and not hold onto the HttpSession object after websocket upgrade. That object, that you fetch from the HttpSession, is what you need to use to coordinate between your HTTP scoped handling (REST) and anything that isn't HTTP (WebSocket, protobuf, grpc, etc.). You have no other safe mechanism, now, or in the foreseeable future. The entirety of the Jakarta EE spec group has pretty much decided that there is no HttpSession lifecycle post HTTP Upgrade (be it websocket, or any other upgraded connection). You must assume that the HttpSession you see during WebSocket upgrade will be useless once the upgrade itself is complete.

Now lets talk about your points ...

  • They are part of a well known and battle hardened architecture that has been around for decades
  • Jetty's default implementation will produce a nicely random session ID for me

If you have a client that starts with a WebSocket (no prior HTTP calls), this fails. This behavior only applies to normal HTTP requests.

  • Servlet implementation can automatically create a client cookie for me

WebSocket upgrade will never create a client cookie (it cannot, and most browsers will reject Set-Cookie headers seen during any HTTP upgrade).

  • HttpSessions can be created easily in my authentication process (which is servlet based)

Depending on your choice of WebSocket technology, authentication is not applied during WebSocket upgrade. This depends on how WebSocket upgrade is handled (as first step like on javax.websocket native support, or as a filter, or as a servlet, or as a handler). It's actually quite easy to have a WebSocket server that upgrades BEFORE authentication is applied. The only safe thing here is to validate that the request is sane on upgrade yourself and reject if things are not right (eg: does it have an HttpSession, a request.getUserPrincipal, a request.getRemoteUser, a request.getAuthType, etc) There is also NO support for closing an active websocket connection that has had auth revoked/logout (this must be handled by the websocket application, not the container). Your application likely requires a prior HTTP exchange before WebSocket is initiated. The container cannot validate or enforce that rule for you, only you and your application can.

  • HttpSessions can store any POJO (setAttribute in Jetty)

Only POJOs that are properly written to allow serialization. POJO deserialization is tricky, especially on HttpSession that no longer belongs to an active HTTP connection or scoped Request/Response objects (which WebSocket doesn't have). The classloader in this case is very messed up, and the longer you wait to access the content, the more likely it will fail.

  • I already have a persistence mechanism in place to support Sessions (in my case MongoDB)
  • Caching has already been thought through
  • I've already solved for clustering and concurrency issues

For WebSocket, this would require your clustering solution to push changes into long lived HttpSession object instances being held by WebSocket endpoints. HttpSession doesn't work that way. HttpSession pulls from data store on request, and pushes back to data store on completion of response. (caching could delay that push, but there's a whole host of dirty tracking, timestamps on objects, to know what is the most "current" data to write to the data store) For WebSocket upgrade, that's a pull on request, and push on upgrade success. That HttpSession instance is essentially "done" and stale once the upgrade has succeeded.

  • Probably a lot more that I'm leaving out (e.g. Jetty HouseKeeper)

Actions of HTTP Session HouseKeeper have no impact on active WebSocket connections. That can only be handled by the application. An example would be that the application has ...

  • It handles my core need -- to store a little bit of state per remote

This layer was created for HTTP, using HTTP concepts, to help with short-lived request/response exchanges (eg: HTTP), never for long duration connections.

There's a reason why in the past 11 years (WebSocket is that old now) more and more groups have been moving away from relying on HttpSession, as the technologies around the web increase to be more and more non-HTTP. The old concepts built around HTTP no longer serve the greater needs of the modern complex multi-protocol application (your use of WebSocket makes your application now a multi-protocol application).

If you look at most modern libraries that support both Servlet and WebSocket you'll quickly see that the work is being done by the library for auth / security / state, not the container or the servlet spec anymore. (spark has it's own layers for auth/state, spring too, google cloud as well, along with many many others). You are now in the same world of complexity as these libraries are. Welcome to the modern era of internet applications.

rfox12 commented 2 years ago

You fill your posts with information / facts; but you've completely missed the heart of the discussion. Sessions are useful. In my usage of Jetty I use them today. I want to share that state with Websockets--and have Websockets contribute / modify that state. How would you do it?

You've waved your hands at "other libraries" but you've failed tell me simply: what would you do? What would you recommend Jetty users do? And it seems to me your answer is: start from scratch.

joakime commented 2 years ago

The overwhelming majority of Servlet + WebSocket (Jetty and Tomcat) users that need data from the HttpSession object have the following behaviors and restrictions. (Pointed out this in a previous message already, so I'm stating it again, but in a different way)

I know I sound like a broken record, but I've tried to point out how HttpSession fits into the world of servlets and Jakarta EE. That scope of usefulness: only Http, and in the case of WebSocket over HTTP/1.1, only during the HTTP/1.1 upgrade. The use of HttpSession in WebSocket outside of that narrow scope has been unsupported by the entire Jakarta EE community for the past 11 years. Servlet 3.1 was the first version to contemplate it. Servlet 4.0, Javax WebSocket 1.0, and Java EE 8, tried to address it, but it was awkward and complicated and chose not to address it. Servlet 5.0, Jakarta WebSocket 1.1, and Jakarta EE 9, decided that HttpSession during an active WebSocket has no solution, as its incompatible and not ever going to work reliably. Servlet 6.0, Jakarta WebSocket 2.0, and Jakarta EE 10, have decided to no pursue this any more. That's where we are right now. The best proposed solutions within Jakarta EE all don't use HttpSession, but something else, which is what every other library build on Servlet and WebSocket do too, and what is expected of your application/implementation as well.

We, Jetty, cannot invent this for just Jetty, as the minute we do we break other specs that have defined behaviors around HttpSession, and greatly complicate the entire HttpSession layer for all usages to attempt to support it. This level of change to HttpSession will never occur in Jetty 9, Jetty 10, or Jetty 11 (it's far too invasive). A non-servlet solution of state tracking that works regardless of protocol or spec is not on our radar, but if it would be, that would only exist for Jetty 12+ (in active development).

joakime commented 2 years ago

When this issue was created, we were discussing with other specs in Jakarta EE to see if a solution could be found. This issue was Jetty's effort at pointing out the scope of the problem and what needed to be addressed.

gregw commented 2 years ago

@rfox12 Sorry that the answer to you appartently simple question is so complex.

To answer your question "What would you recommend Jetty users do" is to avoid trying to use sessions as a distributed database. Their semantics is so poorly defined that it is hard to keep them coherent with updates from simultaneous requests to the same node, let alone if you add in distributed nodes and/or other protocols.

Sessions can be useful is storing immutable keys to other data, or perhaps a cache of read-only data (or snapshot) taken from a database. But they are not suitable for general purpose distributed/concurrent storage of rich mutable objects.

Websocket end-points only have a relationship with a HttpSession as they are created. Other than that, websockets cannot influence sessions in any way to: a) keep them resident in memory; b) mark them as dirty so that mutations are written out; c) touch them so that they do not timeout.

So my reecommendation is that if you have distributed data, then put it in a proper distributed database form which there are many to choose from. By all means put keys and or cached copies of data in your HttpSession, but don't try to mutate that and expect any coherent results... even on a single node. If your session data is like that, then your websocket end-points can also grab copies of those keys and/or caches from the session as they are created. But they need to interact directly with the real distributed database if you wish to do any mutations.

Having said all that, there are users that do get away with accessing session from websocket endpoints via references. It may be that their session datastore and/or session cache are configured in such a way that the access patterns work. It may be that there is enough HTTP traffic along side the websocket to keep the session alive. It may be that they mutations of session data jsut happen to be safe or fail so infrequently that it is not noticeable. So there may well be some specific circumstances under which you can do a lot better than "start from scratch". But in general we don't recommend using sessions for mutable data.

Yes it is a sorry state of affairs. Sessions suck!

rfox12 commented 2 years ago

This level of change to HttpSession will never occur in Jetty 9, Jetty 10, or Jetty 11 (it's far too invasive). A non-servlet solution of state tracking that works regardless of protocol or spec is not on our radar, but if it would be, that would only exist for Jetty 12+ (in active development).

This is informative.

So my reecommendation is that if you have distributed data, then put it in a proper distributed database form which there are many to choose from.

Indeed. I have one--namely MongoDB. Thank you both for your responses. I didn't mean to come off as belligerent; I was simply / honestly trying to see if there was some expert-recommended path. Where there is a will there is a way; and in the space of a couple of days I can hack around these issues. Here is my tentative plan (for what it's worth...)

1) I'm going to create a PassThroughSessionCache class that extends AbstractSessionCache. It will never retain sessions in memory. It will persist them immediately to my custom implementation of MongoSessionDataStore (which leverages newer drivers and MongoDB's POJO persistence). I understand the limitations of transactions and eventual consistency in MongoDB (and deal with those issues holistically today).

2) I'm going to share the session ID (which is guaranteed / required to be created in my workflow) with my WebSocketListener on upgrade.

3) I'm going to somewhat abuse the scope of Jetty's SessionHandler and inject it into my WebSocketListener class as well. This will allow me to extend / modify / destroy or otherwise manipulate the session that was associated with the initial upgrade.

I will have to read some more code; but as far as I can tell the parts that I want to use are thread safe. Small side note: I'm somewhat surprised that the httpOnly for cookies defaults to false. That seems less secure by default.

I don't expect you to condone this. Others are certainly solving for this in their own ways. I do not envy the "standards bearers" of legacy J2EE. I think I've taken enough of your time (thank you!). Last note: I encourage you to lead with regards to meeting the needs of developers. Over the 25 years I've been hacking I've noticed that solutions that adhere to standards weren't the ones that won. It was the solutions that met developer needs that won out. Maybe it's better that sessions are out of scope in Jetty? Maybe there is a smart way to incorporate a new / more abstract concept of session? Not sure. But I picked Jetty because the code base "smells good" to me. It looks to be well thought out and extensible in the right places. And reliable so far! Grateful to have it. -Robert

gregw commented 2 years ago

Thank you both for your responses. I didn't mean to come off as belligerent;

:)

For our part, we didn't mean to come off as exaspirated. But as the "standards bearers" of legacy J2EE, I kind of think sessions are in the too broken to fix camp. I don't think they can be dragged to one interpretation or another without breaking a huge number of applications, as too many have used querks of various contains or a priori knowledge of the specific session stores to make workable applications, but that are just not portable. It is probably time for a new abstraction and to just deprecate sessions... maybe servlet 7.0?

The intention of jetty is to be both a standards implementation, but also to be a flexible code base that doesn't "smell too bad" to be used (and "abused") in specific use-cases that are not covered by the standard. If you have need, then it is indeed reasonable to extend session handling APIs. So long as you know it is not standard!

cheers

janbartel commented 2 years ago

I'm going to create a PassThroughSessionCache class that extends AbstractSessionCache. It will never retain sessions in memory. It will persist them immediately to my custom implementation of MongoSessionDataStore (which leverages newer drivers and MongoDB's POJO persistence). I understand the limitations of transactions and eventual consistency in MongoDB (and deal with those issues holistically today).

You might want to just use the NullSessionCache - it never caches sessions, but will call the SessionDataStore as normal. Also, we'd be pleased to accept a PR for an update of the MongoSessionDataStore to newer versions/api :)

Not sure what you're trying to do will ultimately work, but you know your application best. Just make sure you've read all the doco on HttpSessions: https://www.eclipse.org/jetty/documentation/jetty-10/programming-guide/index.html#pg-server-session
https://www.eclipse.org/jetty/documentation/jetty-10/operations-guide/index.html#og-session-usecases

rfox12 commented 2 years ago

You might want to just use the NullSessionCache - it never caches sessions, but will call the SessionDataStore as normal.

Thanks Jan but I noticed that the eviction policy for NullSessionCache is EVICT_ON_SESSION_EXIT... which I don't think meets my needs. I still need long lived sessions that can time out with inactivity. Also a small note here... strictly speaking NullSessionCache doesn't fill the contract of AbstractSessionCache (in 11.0.7). I found myself overriding the parent delete method rather than doDelete, otherwise it seems that some session events would not be triggered upstream (in SessionHandler).

Also, we'd be pleased to accept a PR for an update of the MongoSessionDataStore to newer versions/api :)

Yep I'd be willing to contribute this code. :) It's not really a change of the existing MongoSessionDataStore. It's a simpler one I made from scratch. Let me know where you want me to drop this.

janbartel commented 2 years ago

Thanks Jan but I noticed that the eviction policy for NullSessionCache is EVICT_ON_SESSION_EXIT... which I don't think meets my needs. I still need long lived sessions that can time out with inactivity. Also a small note here... strictly speaking NullSessionCache doesn't fill the contract of AbstractSessionCache (in 11.0.7). I found myself overriding the parent delete method rather than doDelete, otherwise it seems that some session events would not be triggered upstream (in SessionHandler).

Any of the eviction settings are effectively meaningless for the NullSessionCache - as there is never a session inside the cache, it therefore cannot be evicted - it just looks better as the default than any of the others ;)

Note that you can still have long-lived sessions that aren't necessarily in the cache. They live inside the SessionDataStore until such time as the scavenger passes over them and finds those that have not been used within the expiry limit.

NullSessionCache fulfills the contract of AbstractSessionCache.doDelete() - it is only expected that the subclass will remove the session from the cache. As the session is never in the NullSessionCache to begin with, it is effectively a no-op. This does not affect any of the other behaviour of AbstractSessionCache with calling listeners etc etc.

Yep I'd be willing to contribute this code. :) It's not really a change of the existing MongoSessionDataStore. It's a simpler one I made from scratch. Let me know where you want me to drop this.

Actually we would need an update of the existing MongoSessionDataStore rather than a completely different class implementation for backwards compatibility and regression reasons.Of course, if you have been able to retain the behaviour and backward compatibility with a "simpler" implementation it might be interesting to take a look at...

rfox12 commented 2 years ago

If you look at this line of code, you can see that, if the session to remove is not found in cache, the session destroyed events will not get called.

This line of code shows you that EVICT_ON_SESSION_EXIT will cause "resident" flag to get toggled with every request (which I could ignore...)

This line of code shows how EVICT_ON_SESSION_EXIT is used in the calculation of timeout (calculateInactivityTimeout), when maxInactive > 0 ... which in my case it is. I still want them to get removed from my data store after a long period of inactivity.

janbartel commented 2 years ago

If you look at this line of code, you can see that, if the session to remove is not found in cache, the session destroyed events will not get called.

It is a little more complicated than you think ;). If you look at the code for Session, you will see that if Session.invalidate() is called, indeed all of the listeners are called correctly. Then, we tell all of the SessionHandlers in the various contexts to invalidate and remove their sessions with the same id. It is at this point that the code you refer to is called. When called for the SessionHandler that is the parent of the Session, we can just remove it, because all of the listeners will already have been called. If this is a SessionHandler in a different context, then we need to call listeners and remove from the cache. In other words, this code is taking account of cross context dispatch.

This line of code shows you that EVICT_ON_SESSION_EXIT will cause "resident" flag to get toggled with every request (which I could ignore...)

Yup, in the case of the NullSessionCache, a session that cannot not live past a finished request does have this setter called.

This line of code shows how EVICT_ON_SESSION_EXIT is used in the calculation of timeout (calculateInactivityTimeout), when maxInactive > 0 ... which in my case it is. I still want them to get removed from my data store after a long period of inactivity.

I think you're mixing concepts here. If you don't want to cache sessions, then the idle timeout is unnecessary - the session cannot live past the end of the request. Note that it will still continue to exist in persistent store, there just isn't an in-memory representation of it. In this case, the scavenger thread will find sessions in the persistent store whose expiry time has past, inflate them into memory (if possible) and call all the listeners on them.

rfox12 commented 2 years ago

Thank you @janbartel ! My last question for now: Is there a way to dynamically change the SessionCache strategy?

@joakime Please feel free to cleanup this issue if you want. Certainly got into the weeds.

janbartel commented 2 years ago

@rfox12 not sure what you mean by the strategy? If you mean the eviction setting, then no. There can be multiple threads in the cache at one time so you would get indeterminate results. I should probably add some specific code to reject calling the setters after start. On any given Session you can change the max idle time in a thread-safe way.