pallets-eco / flask-session

Server side session extension for Flask
https://flask-session.readthedocs.io
BSD 3-Clause "New" or "Revised" License
488 stars 236 forks source link

Allow static resources to bypass opening sessions #254

Open markhobson opened 1 month ago

markhobson commented 1 month ago

Requests for static resources currently cause a session to be opened, which can be costly as the session expiry date is updated in the storage backend. Consider adding a configuration key to bypass opening sessions for static resources.

A workaround is to use a custom session interface, for example:

class StaticRequestFilteringSessionInterface(SessionInterface):
    def __init__(self, app):
        self._delegate = app.session_interface
        self._exclude_path_prefix = app.static_url_path + "/"

    def open_session(self, app, request):
        if request.path.startswith(self._exclude_path_prefix):
            return self.make_null_session(app)

        return self._delegate.open_session(app, request)

    def save_session(self, app, session, response):
        return self._delegate.save_session(app, session, response)

Configured with:

from flask_session import Session

...
Session(app)
app.session_interface = StaticRequestFilteringSessionInterface(app)
Lxstr commented 1 month ago

Hi @markhobson thanks for the issue. In the latest version there should not be anything set to storage unless you are modifying the session or have SESSION_REFRESH_EACH_REQUEST set to true. Can you confirm you aren't setting the session at all like was occurring in #232 ?

Are you using a proxy or hosting service to cache your static files? That would normally preclude this issue.

However, regarding not opening the session at all, this is an interesting proposal if there is a use case. I wonder if there is a better way though than providing urls, which would add complexity for a developer, rather than simplifying. Something that would only trigger on read or write to session object.

One idea is re-working how flask works with sessions. Rather than opening a session with each request, the session is filled with an empty session object and the object itself calls the open_session method by overwriting the get/set method and checking if previously accessed or modified. This could be worth experimenting, but it's a pretty fundamental change.

markhobson commented 3 weeks ago

Hi @Lxstr, thanks for the considered reply. I'm not setting the session in each request but I was unaware of SESSION_REFRESH_EACH_REQUEST defaulting to true. That's a subtle Flask default - it'd be helpful to clarify this behaviour in the Flask-Session docs.

Turning that flag off, it does indeed no longer update the session in the storage backend. It does still open the session on every request which queries the database. The reason I'm trying to minimise hits to the database is to work around #251 - this is an issue running locally for development when using an in-memory SQLite database, hence why a proxy wouldn't be useful here.

Is what you're proposing here a lazy session? One that does nothing until it is get or set. This could be implemented as a decorating SessionInterface and Session, a bit like my StaticRequestFilteringSessionInterface example above. As you say, that could avoid the need to specify details such as request URLs to filter.

You might also be interested in a few other calls to avoid opening a session that I read:

markhobson commented 3 weeks ago

I had a go at the lazy session pattern. This seems to do the trick:

from typing import Any, Callable, Iterator

from flask import Flask, Request, Response
from flask.sessions import SessionInterface, SessionMixin

class LazySessionInterface(SessionInterface):
    def __init__(self, delegate: SessionInterface):
        self._delegate = delegate

    def open_session(self, app: Flask, request: Request) -> SessionMixin | None:
        def open_session_delegate() -> SessionMixin | None:
            return self._delegate.open_session(app, request)

        return LazySession(open_session_delegate)

    def save_session(self, app: Flask, session: SessionMixin, response: Response) -> None:
        if not isinstance(session, LazySession):
            raise TypeError(f"session must be a LazySession: {session}")

        if not session.is_open:
            return

        return self._delegate.save_session(app, session.delegate, response)

class LazySession(SessionMixin):
    def __init__(self, open_session: Callable[[], SessionMixin | None]):
        self._open_session = open_session
        self._delegate: SessionMixin | None = None

    def __setitem__(self, key: str, value: Any) -> None:
        return self.delegate.__setitem__(key, value)

    def __delitem__(self, key: str) -> None:
        return self.delegate.__delitem__(key)

    def __getitem__(self, key: str) -> Any:
        return self.delegate.__getitem__(key)

    def __len__(self) -> int:
        return self.delegate.__len__()

    def __iter__(self) -> Iterator[str]:
        return self.delegate.__iter__()

    @property
    def delegate(self) -> SessionMixin:
        if self._delegate is None:
            self._delegate = self._open_session()

        return self._delegate

    @property
    def is_open(self) -> bool:
        return self._delegate is not None

You can try it out in markhobson/flask-session-issue from #251 with:

...
def create_app():
    ...
    Session(app)
    app.session_interface = LazySessionInterface(app.session_interface)
    ...

Before these changes, putting breakpoints on SqlAlchemySessionInterface._retrieve_session_data and _upsert_session and hitting the endpoint twice results in:

retrieve_session_data
upsert_session
127.0.0.1 - - [05/Jun/2024 08:58:29] "GET / HTTP/1.1" 200 -
retrieve_session_data
retrieve_session_data
retrieve_session_data
upsert_session
upsert_session
upsert_session
127.0.0.1 - - [05/Jun/2024 08:58:29] "GET /static/main3.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Jun/2024 08:58:29] "GET /static/main2.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Jun/2024 08:58:29] "GET /static/main1.css HTTP/1.1" 304 -

retrieve_session_data
upsert_session
127.0.0.1 - - [05/Jun/2024 08:58:33] "GET / HTTP/1.1" 200 -
retrieve_session_data
retrieve_session_data
retrieve_session_data
upsert_session
upsert_session
upsert_session
127.0.0.1 - - [05/Jun/2024 08:58:33] "GET /static/main3.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Jun/2024 08:58:33] "GET /static/main1.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Jun/2024 08:58:33] "GET /static/main2.css HTTP/1.1" 304 -

Using LazySessionInterface then eliminates retrieve and upserting the session for the static resources, as they don't use the session:

retrieve_session_data
upsert_session
127.0.0.1 - - [05/Jun/2024 08:57:59] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [05/Jun/2024 08:57:59] "GET /static/main2.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Jun/2024 08:57:59] "GET /static/main1.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Jun/2024 08:57:59] "GET /static/main3.css HTTP/1.1" 304 -

retrieve_session_data
upsert_session
127.0.0.1 - - [05/Jun/2024 08:58:01] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [05/Jun/2024 08:58:01] "GET /static/main2.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Jun/2024 08:58:01] "GET /static/main3.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Jun/2024 08:58:01] "GET /static/main1.css HTTP/1.1" 304 -

Finally, combining that with SESSION_REFRESH_EACH_REQUEST=False avoids the unnecessary upsert on the second request:

retrieve_session_data
upsert_session
127.0.0.1 - - [05/Jun/2024 08:57:14] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [05/Jun/2024 08:57:14] "GET /static/main2.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Jun/2024 08:57:14] "GET /static/main1.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Jun/2024 08:57:14] "GET /static/main3.css HTTP/1.1" 304 -

retrieve_session_data
127.0.0.1 - - [05/Jun/2024 08:57:41] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [05/Jun/2024 08:57:41] "GET /static/main1.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Jun/2024 08:57:41] "GET /static/main2.css HTTP/1.1" 304 -
127.0.0.1 - - [05/Jun/2024 08:57:41] "GET /static/main3.css HTTP/1.1" 304 -
Lxstr commented 3 weeks ago

Hi @markhobson thanks for the great effort on this. I'm really liking it. I still I have a few PRs to merge but this could be good to add for 1.0. I can see that there would be use for this.

We still would have some implementation details to hash out and ask the core team about it. How would it be best to implement, with a flag or built in directly?

There's some downside to each. On one hand adding more flags means there's a lot of settings and have to change defaults to get the ideal performance. However, directly built in might be too large a change even for a major bump. A third option might be replacing SESSION_REFRESH_EACH_REQUEST with SESSION_LAZY and result in either your first or your final example.

@ThiefMaster @davidism I'd appreciate your thoughts on this one. Is this code a valid/readable approach? Could you see any unintended side effects?

davidism commented 3 weeks ago

You can find my thoughts in https://github.com/pallets/flask/issues/5491. The author didn't really address any of it. You brought up the same main point: static files should be served by the HTTP server, at which point this is no longer needed.

In production, if performance is an issue, you want to serve your static files directly through your HTTP server, not through Flask. Then this doesn't apply anyway. In other cases, you may want to serve static files with other conditions applied, at which point it's not clear sessions should always be excluded. I don't think it's worth adding the complexity of implementation and explanation to Flask itself, especially when it's already possible to write a custom session (a completely supported and intended public API) to do whatever you want for your case.