nextcloud / spreed

🗨️ Nextcloud Talk – chat, video & audio calls for Nextcloud
https://nextcloud.com/talk
GNU Affero General Public License v3.0
1.62k stars 434 forks source link

Participants dropped out of call after session.gc_maxlifetime #12449

Open pschichtel opened 4 months ago

pschichtel commented 4 months ago

How to use GitHub


Steps to reproduce

  1. Install nextcloud with redis as session_save_handler with following recommended settings:
    php_value[session.save_handler]          = redis
    php_value[session.save_path]             = tcp://redis:6379
    php_value[redis.session.locking_enabled] = 1
    php_value[redis.session.lock_retries]    = -1
    php_value[redis.session.lock_wait_time]  = 10000
  2. start a call (2+ participants, guests or logged in users does not matter)
  3. wait for your session to expire and the session garbage collector to execute (at least 24 minutes by default). To accelerate this the gc_maxlifetime can be reduced and the gc_probability can be set to 100.

Expected behaviour

The expectation is clear: The call should continue uninterrupted.

Actual behaviour

Users are redirected to the .../not-found page (for logged in users) or the login page (for guests) after their session expired (24 minutes by default), their session got garbage collected and spreed attempts to fetch something from the room.

My setup uses the nextcloud/helm chart with the dedicated nginx and healthchecks, so the requests to /status.php greatly accelerate the session garbage collection (even though its probability is the default 1%).

Either sessions should be renewed/refreshed before expiry or spreed should be able to continue with a new session. Logged in users should be able to connect to the room again and also guests should be able to reconnect to rooms that don't require passwords, both without interruption and user interaction.

Eventually a request to .../v1/chat/... results in a 412 Precondition Failed response (which btw is a abuse of the 412 code), which includes a set-cookie response header that sets the session ID cookie, but interestingly with the same value as before. After that, a final request to .../v3/signaling/... results in a 404 Not Found response and then the redirect to .../not-found happens. For guests that .../not-found redirect results in another redirect to the login page, since they apparently don't have the permission to see the not-found page.

This is similar to: https://github.com/nextcloud/spreed/issues/4670

Talk app

Talk app version: 19.0.1 (this is not a new issue though)

Custom Signaling server configured: no

Custom TURN server configured: yes

Custom STUN server configured: yes

Browser

Microphone available: yes

Camera available: no

Operating system: Arch / Windows

Browser name: Firefox/Chrome/Chromium

Browser version: latest versions as of today.

Browser log

Nothing of interest

Server configuration

Operating system: docker.io/library/nextcloud:29.0.1-fpm-alpine

Web server: Nginx

Database: PostgreSQL

PHP version: 8.2

Nextcloud Version: 29.0.1

List of activated apps:

``` If you have access to your command line run e.g.: sudo -u www-data php occ app:list from within your server installation folder ```

Nextcloud configuration:

``` If you have access to your command line run e.g.: sudo -u www-data php occ config:list system from within your Nextcloud installation folder ```

Server log (data/nextcloud.log)

``` Insert your server log here ```
pschichtel commented 4 months ago

I've just tried to configure session.gc_probability to 0, so the session GC never executes. PhpRedis properly sets and resets (on access) a expiry time on the redis keys, making PHP's session GC obsolete. Access any page of the nextcloud properly resets the key expiry time, but spreed's requests don't seem to do that.

nickvergessen commented 4 months ago

Works very fine for multiple hour calls on our server with the following configuration:

redis.session.lock_expire => 0 => 0
redis.session.lock_retries => -1 => -1
redis.session.lock_wait_time => 10000 => 10000
redis.session.locking_enabled => 1 => 1
session.gc_divisor => 1000 => 1000
session.gc_maxlifetime => 1440 => 1440
session.gc_probability => 1 => 1
session.use_cookies => 1 => 1
session.use_only_cookies => 1 => 1
session.use_strict_mode => 0 => 0
pschichtel commented 4 months ago

I'll deploy these later today together with the 29.0.2 release and get back to you

pschichtel commented 4 months ago

@nickvergessen these were pretty much exactly my values, with two exceptions:

option my value your value
session.gc_probability 0 1
session.gc_divisor 100 1000

So you have a 0.1% probability of triggering session GC and I have a 0% probability. So if anything, my sessions should never be cleaned up.

I'm running my nextcloud distirbuted (2 replicas configured in the nextcloud/helm chart). I'm wondering if it's somehow related to that.

pschichtel commented 4 months ago

@nickvergessen I've gotten in touch with the PhpRedis people to confirm the behavior of PHP's session GC: It's completely ignored. So what ever you have configured for the session.gc_* stuff (except gc_maxlifetime), it's a no-op. Source

I'm pretty much out of ideas on what to check here. I did poke around a bit in the PHP code in an attempt to get the sessions to properly refresh, but I wasn't able to get anything to work. Any ideas?

pschichtel commented 4 months ago

@nickvergessen do you by any chance have redis.session.early_refresh enabled in your PHP configuration? That looks like it could help.

pschichtel commented 4 months ago

Interestingly it helped with a logged in user, but not with a guest.

nickvergessen commented 3 months ago

No we don't have that. here is the full grab:

# php --info | grep session
igbinary session support => yes
redis.session.lock_expire => 0 => 0
redis.session.lock_retries => -1 => -1
redis.session.lock_wait_time => 10000 => 10000
redis.session.locking_enabled => 1 => 1
session
session.auto_start => Off => Off
session.cache_expire => 180 => 180
session.cache_limiter => nocache => nocache
session.cookie_domain => no value => no value
session.cookie_httponly => no value => no value
session.cookie_lifetime => 0 => 0
session.cookie_path => / => /
session.cookie_samesite => no value => no value
session.cookie_secure => 0 => 0
session.gc_divisor => 1000 => 1000
session.gc_maxlifetime => 1440 => 1440
session.gc_probability => 1 => 1
session.lazy_write => On => On
session.name => PHPSESSID => PHPSESSID
session.referer_check => no value => no value
session.save_handler => redis => redis
session.save_path => tcp://… => tcp://…
session.serialize_handler => php => php
session.sid_bits_per_character => 4 => 4
session.sid_length => 32 => 32
session.upload_progress.cleanup => On => On
session.upload_progress.enabled => On => On
session.upload_progress.freq => 1% => 1%
session.upload_progress.min_freq => 1 => 1
session.upload_progress.name => PHP_SESSION_UPLOAD_PROGRESS => PHP_SESSION_UPLOAD_PROGRESS
session.upload_progress.prefix => upload_progress_ => upload_progress_
session.use_cookies => 1 => 1
session.use_only_cookies => 1 => 1
session.use_strict_mode => 0 => 0
session.use_trans_sid => 0 => 0
session.trans_sid_hosts => no value => no value
session.trans_sid_tags => a=href,area=href,frame=src,form= => a=href,area=href,frame=src,form=

That being said, we don't have any of the session config options enabled in Nextcloud. Did you add any of those?

pschichtel commented 3 months ago

that's the full PHP config currently for reference:

![image](https://github.com/nextcloud/spreed/assets/526104/b39fe9c4-e0de-47d6-9581-bf7d70a8923b)

there doesn't seem anything out of the ordinary.

I checked the config.php for the options you mentioned:

what I realized when grepping through the file: I completely forgot to mention that I have the oidc app installed and in use.

One option of that app caught my eye:

I'll test with that option disabled to be sure

pschichtel commented 3 months ago

Ok I just tested with the oidc app disabled and 2 guests (normal and private session in another browser) in the call, both got redirected to login right after the 24 minutes.

pschichtel commented 3 months ago

Do you see anything else that might trigger this?

Enabled:
  - activity: 2.21.1
  - calendar: 4.7.6
  - circles: 29.0.0-dev
  - cloud_federation_api: 1.12.0
  - comments: 1.19.0
  - contacts: 6.0.0
  - contactsinteraction: 1.10.0
  - dav: 1.30.1
  - deck: 1.13.0
  - drawio: 3.0.2
  - federatedfilesharing: 1.19.0
  - federation: 1.19.0
  - files: 2.1.0
  - files_downloadlimit: 2.0.0
  - files_pdfviewer: 2.10.0
  - files_reminders: 1.2.0
  - files_sharing: 1.21.0
  - files_trashbin: 1.19.0
  - files_versions: 1.22.0
  - firstrunwizard: 2.18.0
  - forms: 4.2.4
  - groupfolders: 17.0.1
  - lookup_server_connector: 1.17.0
  - nextcloud_announcements: 1.18.0
  - notes: 4.10.0
  - notifications: 2.17.0
  - notify_push: 0.6.12
  - oauth2: 1.17.0
  - photos: 2.5.0
  - privacy: 1.13.0
  - provisioning_api: 1.19.0
  - recommendations: 2.1.0
  - related_resources: 1.4.0
  - serverinfo: 1.19.0
  - settings: 1.12.0
  - sharebymail: 1.19.0
  - spreed: 19.0.1
  - systemtags: 1.19.0
  - tasks: 0.16.0
  - text: 3.10.0
  - theming: 2.4.0
  - twofactor_backupcodes: 1.18.0
  - updatenotification: 1.19.1
  - user_status: 1.9.0
  - viewer: 2.3.0
  - workflowengine: 2.11.0
Disabled:
  - admin_audit: 1.19.0
  - bruteforcesettings: 2.9.0
  - dashboard: 7.9.0 (installed 7.8.0)
  - encryption: 2.17.0
  - files_external: 1.21.0
  - logreader: 2.14.0 (installed 2.13.0)
  - oidc_login: 3.1.1 (installed 3.1.1)
  - password_policy: 1.19.0 (installed 1.18.0)
  - support: 1.12.0 (installed 1.11.0)
  - survey_client: 1.17.0 (installed 1.16.0)
  - suspicious_login: 7.0.0
  - twofactor_totp: 11.0.0-dev
  - user_ldap: 1.20.0
  - weather_status: 1.9.0 (installed 1.8.0)

I think I'll try disabling notify_push next, just in case.

pschichtel commented 3 months ago

Same issue with notify_push disabled

pschichtel commented 3 months ago

@nickvergessen would you be able and willing to check on your instance if the TTLs of the session keys in redis are being reset during a call? For me they never reset.

pschichtel commented 3 months ago

Still happening on 19.0.2

pschichtel commented 3 months ago

Still happening on NC 29.0.2 and spreed 19.0.3, still no clue why it's happening

nickvergessen commented 3 months ago

willing to check on your instance if the TTLs of the session keys in redis are being reset during a call? For me they never reset.

I fear I lack the knowledge to do so.

Still happening on 19.0.2 Stell happening on NC 29.0.2 and spreed 19.0.3

Would be a surprise if it would have helped, as nothing related was changed.

pschichtel commented 3 months ago

I fear I lack the knowledge to do so.

if you know your session id (can be extracted from cookies) you can look for the redis key with KEYS *sessionid* (where sessionid would be your specific id). Once you have the full key you can run TTL key, that will tell you the remaining TTL of the key

pschichtel commented 2 months ago

I just retested this on latest nextcloud 29.0.4 and spreed 19.0.7 and its still happening. I have still no clue why this is happening... https://..../call/... if you want to see for your self. Just open it in two browsers, set a timer to 24 minutes and wait for the login page to show.