matrix-org / dendrite

Dendrite is a second-generation Matrix homeserver written in Go!
https://matrix-org.github.io/dendrite/
Apache License 2.0

failed to connect to room server db, attempt to write a readonly database #1670

Closed MadLittleMods closed 3 years ago

MadLittleMods commented 3 years ago

Related to https://github.com/matrix-org/dendrite/issues/1644

When trying to get started and run the monolith, it throws an error.

$ ./build.sh && ./bin/dendrite-monolith-server --tls-cert server.crt --tls-key server.key --config dendrite.yaml -http-bind-address :8009

INFO[2020-12-23T02:50:44.156262000Z] [github.com/matrix-org/dendrite/setup/base.go:103] NewBaseDendrite
     Dendrite version 0.3.3+a8e947e9.1429-add-healthcheck-endpoint
INFO[2020-12-23T02:50:44.159618000Z] [github.com/matrix-org/dendrite/signingkeyserver/signingkeyserver.go:103] NewInternalAPI
     Enabled perspective key fetcher               num_public_keys=2 server_name=matrix.org
PANI[2020-12-23T02:50:44.162370000Z] [github.com/matrix-org/dendrite/roomserver/roomserver.go:54] NewInternalAPI
     failed to connect to room server db           error="attempt to write a readonly database"
panic: (*logrus.Entry) 0xc0003073b0

Running with sudo does make it run. However, after a follow-up call with @Kegsay and @neilalexander, it is clear that sudo is not required to run Dendrite. The database file needs to be different every time it runs. If sudo works, it is only because it overrides the read-only setting, and we are probably losing data.

Potential solutions

Why can't the same database file be used for each run?

MadLittleMods commented 3 years ago

I have switched to using PostgreSQL for my Dendrite dev setup on macOS 10.15.7:

$ brew install postgresql

$ brew services start postgresql
$ brew services info postgresql
$ less +G /usr/local/var/log/postgres.log

$ postgres --version
postgres (PostgreSQL) 13.1

$ createdb dendrite
$ psql dendrite

kegsay commented 3 years ago

> Why can't the same database file be used for each run?

Because that's not how SQLite works unfortunately. It's single process, single writer. Sharing the .db file across multiple components will cause failures at best, data loss at worst.

MadLittleMods commented 3 years ago

@Kegsay How does it work across computer restarts? Do I have to keep my SQLite process running forever for it to work as expected?

neilalexander commented 3 years ago

@MadLittleMods SQLite isn't a separate process; it's embedded into the Dendrite process itself. However, SQLite only allows one writer to a database file at a time (irrespective of whether the writers come from the same process or from different processes). In Dendrite, each component expects to be able to write exclusively to its own database, so each component needs its own .db file.
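
The single-writer rule can be seen outside Dendrite with a few lines of Python. This is only a sketch of general SQLite behaviour using Python's standard `sqlite3` module, not Dendrite's Go code; the file name `shared.db` and the `events` table are made up for illustration.

```python
import os
import sqlite3
import tempfile


def second_writer_error(db_path):
    """Hold a write lock on one connection and return the error a second writer gets."""
    # isolation_level=None disables Python's implicit transactions, so the
    # explicit BEGIN statements below are the only ones issued.
    first = sqlite3.connect(db_path, isolation_level=None)
    # timeout=0 makes the second connection fail at once instead of waiting.
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute("CREATE TABLE IF NOT EXISTS events (body TEXT)")
        first.execute("BEGIN IMMEDIATE")  # first writer takes the write lock
        first.execute("INSERT INTO events VALUES ('from first writer')")
        try:
            second.execute("BEGIN IMMEDIATE")  # second writer on the same file
        except sqlite3.OperationalError as exc:
            return str(exc)
        return None
    finally:
        first.close()
        second.close()


with tempfile.TemporaryDirectory() as tmp:
    error = second_writer_error(os.path.join(tmp, "shared.db"))
    print(error)
```

The second `BEGIN IMMEDIATE` is rejected with SQLite's "database is locked" error while the first connection holds the write lock, which is the kind of conflict that separate per-component .db files avoid.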

MadLittleMods commented 3 years ago

Can we split the .db file into its own pieces by default so they won't conflict, and re-use them across multiple startups?

I feel like I shouldn't be running into readonly database issues from trying out and starting up Dendrite a couple times.

MadLittleMods commented 3 years ago

This was chatted about a bit more in https://matrix.to/#/!JfeKJVJwCKQNbodoaF:matrix.org/$qPBv5RiIBJ8lzvvLxvxRP44kmWsfo-HvL_5oK0foSDI?via=matrix.org&via=dendrite.neilalexander.dev&via=vector.modular.im

It seems like what I am experiencing isn't expected. "With the more recent dendrite versions with the proper shutdown stuff, that shouldn’t happen". And possibly the lockfile isn't being cleaned up. Where is the lockfile stored so I can check?

neilalexander commented 3 years ago

Can you please confirm that each of the Dendrite components in the dendrite.yaml is connecting to a different SQLite DB file?

MadLittleMods commented 3 years ago

Here is what my dendrite.yaml looks like. I think it's just copied from dendrite-config.yaml. It's switched over to postgres now but still has the comments from SQLite.

Looking through all of the connection_string entries, it looked like account_database and device_database had the same name, but this is the same as in dendrite-config.yaml. Never mind, they just looked similar at first glance. I don't see any duplicates.
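
For anyone who would rather not check this by eye, here is a rough sketch of the same duplicate check in Python. It is an assumption-laden helper, not part of Dendrite: it uses a regular expression rather than a YAML parser, only looks at uncommented `connection_string: file:...` lines, and does not normalise paths (so `file:a.db` and `file:///abs/a.db` count as different). The `sample` text is invented for the example.

```python
import re
from collections import Counter

# Matches an uncommented "connection_string: file:..." line.
SQLITE_CONN = re.compile(r"^[ \t]*connection_string:[ \t]*(file:\S+)", re.MULTILINE)


def duplicate_sqlite_files(config_text):
    """Return SQLite connection strings that appear more than once."""
    counts = Counter(SQLITE_CONN.findall(config_text))
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical config fragment with a deliberate duplicate.
sample = """
room_server:
  database:
    connection_string: file:roomserver.db
user_api:
  account_database:
    connection_string: file:userapi_accounts.db
  device_database:
    # connection_string: file:userapi_devices.db
    connection_string: file:userapi_accounts.db
"""

print(duplicate_sqlite_files(sample))
```

To check a real file, pass in its contents, for example `duplicate_sqlite_files(open("dendrite.yaml").read())`. An empty list means no two components share a SQLite file.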

dendrite.yaml

```yaml
# This is the Dendrite configuration file.
#
# The configuration is split up into sections - each Dendrite component has a
# configuration section, in addition to the "global" section which applies to
# all components.
#
# At a minimum, to get started, you will need to update the settings in the
# "global" section for your deployment, and you will need to check that the
# database "connection_string" line in each component section is correct.
#
# Each component with a "database" section can accept the following formats
# for "connection_string":
#   SQLite:     file:filename.db
#               file:///path/to/filename.db
#   PostgreSQL: postgresql://user:pass@hostname/database?params=...
#
# SQLite is embedded into Dendrite and therefore no further prerequisites are
# needed for the database when using SQLite mode. However, performance with
# PostgreSQL is significantly better and recommended for multi-user deployments.
# SQLite is typically around 20-30% slower than PostgreSQL when tested with a
# small number of users and likely will perform worse still with a higher volume
# of users.
#
# The "max_open_conns" and "max_idle_conns" settings configure the maximum
# number of open/idle database connections. The value 0 will use the database
# engine default, and a negative value will use unlimited connections. The
# "conn_max_lifetime" option controls the maximum length of time a database
# connection can be idle in seconds - a negative value is unlimited.

# The version of the configuration file.
version: 1

# Global Matrix configuration. This configuration applies to all components.
global:
  # The domain name of this homeserver.
  server_name: my.dendrite.host

  # The path to the signing private key file, used to sign requests and events.
  # Note that this is NOT the same private key as used for TLS! To generate a
  # signing key, use "./bin/generate-keys --private-key matrix_key.pem".
  private_key: matrix_key.pem

  # The paths and expiry timestamps (as a UNIX timestamp in millisecond precision)
  # to old signing private keys that were formerly in use on this domain. These
  # keys will not be used for federation request or event signing, but will be
  # provided to any other homeserver that asks when trying to verify old events.
  # old_private_keys:
  # - private_key: old_matrix_key.pem
  #   expired_at: 1601024554498

  # How long a remote server can cache our server signing key before requesting it
  # again. Increasing this number will reduce the number of requests made by other
  # servers for our key but increases the period that a compromised key will be
  # considered valid by other homeservers.
  key_validity_period: 168h0m0s

  # Lists of domains that the server will trust as identity servers to verify third
  # party identifiers such as phone numbers and email addresses.
  trusted_third_party_id_servers:
  - matrix.org
  - vector.im

  # Disables federation. Dendrite will not be able to make any outbound HTTP requests
  # to other servers and the federation API will not be exposed.
  disable_federation: false

  # Configuration for Kafka/Naffka.
  kafka:
    # List of Kafka broker addresses to connect to. This is not needed if using
    # Naffka in monolith mode.
    addresses:
    - localhost:2181

    # The prefix to use for Kafka topic names for this homeserver. Change this only if
    # you are running more than one Dendrite homeserver on the same Kafka deployment.
    topic_prefix: Dendrite

    # Whether to use Naffka instead of Kafka. This is only available in monolith
    # mode, but means that you can run a single-process server without requiring
    # Kafka.
    use_naffka: true

    # The max size a Kafka message is allowed to use.
    # You only need to change this value, if you encounter issues with too large messages.
    # Must be less than/equal to "max.message.bytes" configured in Kafka.
    # Defaults to 8388608 bytes.
    # max_message_bytes: 8388608

    # Naffka database options. Not required when using Kafka.
    naffka_database:
      connection_string: postgresql://localhost/dendrite?sslmode=disable
      # connection_string: file:naffka.db
      # max_open_conns: 10
      # max_idle_conns: 2
      # conn_max_lifetime: -1

  # Configuration for Prometheus metric collection.
  metrics:
    # Whether or not Prometheus metrics are enabled.
    enabled: false

    # HTTP basic authentication to protect access to monitoring.
    basic_auth:
      username: metrics
      password: metrics

# Configuration for the Appservice API.
app_service_api:
  internal_api:
    listen: http://localhost:7777
    connect: http://localhost:7777
  database:
    connection_string: postgresql://localhost/dendrite?sslmode=disable
    # connection_string: file:appservice.db
    # max_open_conns: 10
    # max_idle_conns: 2
    # conn_max_lifetime: -1

  # Appservice configuration files to load into this homeserver.
  config_files: []

# Configuration for the Client API.
client_api:
  internal_api:
    listen: http://localhost:7771
    connect: http://localhost:7771
  external_api:
    listen: http://[::]:8071

  # Prevents new users from being able to register on this homeserver, except when
  # using the registration shared secret below.
  registration_disabled: false

  # If set, allows registration by anyone who knows the shared secret, regardless of
  # whether registration is otherwise disabled.
  registration_shared_secret: ""

  # Whether to require reCAPTCHA for registration.
  enable_registration_captcha: false

  # Settings for ReCAPTCHA.
  recaptcha_public_key: ""
  recaptcha_private_key: ""
  recaptcha_bypass_secret: ""
  recaptcha_siteverify_api: ""

  # TURN server information that this homeserver should send to clients.
  turn:
    turn_user_lifetime: ""
    turn_uris: []
    turn_shared_secret: ""
    turn_username: ""
    turn_password: ""

  # Settings for rate-limited endpoints. Rate limiting will kick in after the
  # threshold number of "slots" have been taken by requests from a specific
  # host. Each "slot" will be released after the cooloff time in milliseconds.
  rate_limiting:
    enabled: true
    threshold: 5
    cooloff_ms: 500

# Configuration for the EDU server.
edu_server:
  internal_api:
    listen: http://localhost:7778
    connect: http://localhost:7778

# Configuration for the Federation API.
federation_api:
  internal_api:
    listen: http://localhost:7772
    connect: http://localhost:7772
  external_api:
    listen: http://[::]:8072

  # List of paths to X.509 certificates to be used by the external federation listeners.
  # These certificates will be used to calculate the TLS fingerprints and other servers
  # will expect the certificate to match these fingerprints. Certificates must be in PEM
  # format.
  federation_certificates: []

# Configuration for the Federation Sender.
federation_sender:
  internal_api:
    listen: http://localhost:7775
    connect: http://localhost:7775
  database:
    connection_string: postgresql://localhost/dendrite?sslmode=disable
    # connection_string: file:federationsender.db
    # max_open_conns: 10
    # max_idle_conns: 2
    # conn_max_lifetime: -1

  # How many times we will try to resend a failed transaction to a specific server. The
  # backoff is 2**x seconds, so 1 = 2 seconds, 2 = 4 seconds, 3 = 8 seconds etc.
  send_max_retries: 16

  # Disable the validation of TLS certificates of remote federated homeservers. Do not
  # enable this option in production as it presents a security risk!
  disable_tls_validation: false

  # Use the following proxy server for outbound federation traffic.
  proxy_outbound:
    enabled: false
    protocol: http
    host: localhost
    port: 8080

# Configuration for the Key Server (for end-to-end encryption).
key_server:
  internal_api:
    listen: http://localhost:7779
    connect: http://localhost:7779
  database:
    connection_string: postgresql://localhost/dendrite?sslmode=disable
    # connection_string: file:keyserver.db
    # max_open_conns: 10
    # max_idle_conns: 2
    # conn_max_lifetime: -1

# Configuration for the Media API.
media_api:
  internal_api:
    listen: http://localhost:7774
    connect: http://localhost:7774
  external_api:
    listen: http://[::]:8074
  database:
    connection_string: postgresql://localhost/dendrite?sslmode=disable
    # connection_string: file:mediaapi.db
    # max_open_conns: 10
    # max_idle_conns: 2
    # conn_max_lifetime: -1

  # Storage path for uploaded media. May be relative or absolute.
  base_path: ./media_store

  # The maximum allowed file size (in bytes) for media uploads to this homeserver
  # (0 = unlimited).
  max_file_size_bytes: 10485760

  # Whether to dynamically generate thumbnails if needed.
  dynamic_thumbnails: false

  # The maximum number of simultaneous thumbnail generators to run.
  max_thumbnail_generators: 10

  # A list of thumbnail sizes to be generated for media content.
  thumbnail_sizes:
  - width: 32
    height: 32
    method: crop
  - width: 96
    height: 96
    method: crop
  - width: 640
    height: 480
    method: scale

# Configuration for the Room Server.
room_server:
  internal_api:
    listen: http://localhost:7770
    connect: http://localhost:7770
  database:
    connection_string: postgresql://localhost/dendrite?sslmode=disable
    # connection_string: file:roomserver.db
    # max_open_conns: 10
    # max_idle_conns: 2
    # conn_max_lifetime: -1

# Configuration for the Signing Key Server (for server signing keys).
signing_key_server:
  internal_api:
    listen: http://localhost:7780
    connect: http://localhost:7780
  database:
    connection_string: postgresql://localhost/dendrite?sslmode=disable
    # connection_string: file:signingkeyserver.db
    # max_open_conns: 10
    # max_idle_conns: 2
    # conn_max_lifetime: -1

  # Perspective keyservers to use as a backup when direct key fetches fail. This may
  # be required to satisfy key requests for servers that are no longer online when
  # joining some rooms.
  key_perspectives:
  - server_name: matrix.org
    keys:
    - key_id: ed25519:auto
      public_key: Noi6WqcDj0QmPxCNQqgezwTlBKrfqehY1u2FyWP9uYw
    - key_id: ed25519:a_RXGa
      public_key: l8Hft5qXKn1vfHrg3p4+W8gELQVo8N13JkluMfmn2sQ

  # This option will control whether Dendrite will prefer to look up keys directly
  # or whether it should try perspective servers first, using direct fetches as a
  # last resort.
  prefer_direct_fetch: false

# Configuration for the Sync API.
sync_api:
  internal_api:
    listen: http://localhost:7773
    connect: http://localhost:7773
  external_api:
    listen: http://[::]:8073
  database:
    connection_string: postgresql://localhost/dendrite?sslmode=disable
    # connection_string: file:syncapi.db
    # max_open_conns: 10
    # max_idle_conns: 2
    # conn_max_lifetime: -1

  # This option controls which HTTP header to inspect to find the real remote IP
  # address of the client. This is likely required if Dendrite is running behind
  # a reverse proxy server.
  # real_ip_header: X-Real-IP

# Configuration for the User API.
user_api:
  internal_api:
    listen: http://localhost:7781
    connect: http://localhost:7781
  account_database:
    connection_string: postgresql://localhost/dendrite?sslmode=disable
    # connection_string: file:userapi_accounts.db
    # max_open_conns: 10
    # max_idle_conns: 2
    # conn_max_lifetime: -1
  device_database:
    connection_string: postgresql://localhost/dendrite?sslmode=disable
    # connection_string: file:userapi_devices.db
    # max_open_conns: 10
    # max_idle_conns: 2
    # conn_max_lifetime: -1

# Configuration for Opentracing.
# See https://github.com/matrix-org/dendrite/tree/master/docs/tracing for information on
# how this works and how to set it up.
tracing:
  enabled: false
  jaeger:
    serviceName: ""
    disabled: false
    rpc_metrics: false
    tags: []
    sampler: null
    reporter: null
    headers: null
    baggage_restrictions: null
    throttler: null

# Logging configuration, in addition to the standard logging that is sent to
# stdout by Dendrite.
logging:
- type: file
  level: info
  params:
    path: ./log
```

kegsay commented 3 years ago

> It seems like what I am experiencing isn't expected. "With the more recent dendrite versions with the proper shutdown stuff, that shouldn’t happen". And possibly the lockfile isn't being cleaned up.

This is expected. There is no lockfile for SQLite. It uses actual file locks, which are controlled by the underlying OS and whose lifetime is tied to the process that acquired them, so there is no need for a graceful shutdown to "clean up". If the process dies in the middle of a transaction, SQLite automatically recovers based on the contents of the .db file alone (and the WAL journal if WAL is enabled).
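
A small experiment along these lines, sketched in Python with the standard `sqlite3` module (not Dendrite code; the `crash.db` file and `events` table are hypothetical): a child process starts a write transaction and exits abruptly, after which the parent checks that the lock is gone and the uncommitted row was discarded.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

# Child process: start a write transaction, insert a row, then die without
# committing or closing anything.
CHILD = """
import os, sqlite3, sys
conn = sqlite3.connect(sys.argv[1], isolation_level=None)
conn.execute("BEGIN IMMEDIATE")
conn.execute("INSERT INTO events VALUES ('never committed')")
os._exit(1)
"""


def rows_after_crash(db_path):
    """Return the row count seen after a writer dies mid-transaction."""
    setup = sqlite3.connect(db_path)
    setup.execute("CREATE TABLE events (body TEXT)")
    setup.commit()
    setup.close()

    # Run the crashing writer and wait for it to exit.
    subprocess.run([sys.executable, "-c", CHILD, db_path], check=False)

    # timeout=0: fail at once if the dead process's lock were still held.
    conn = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # take the write lock ourselves
        count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
        conn.execute("COMMIT")
        return count
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    remaining = rows_after_crash(os.path.join(tmp, "crash.db"))
    print(remaining)
```

The parent can take the write lock straight away and sees zero rows, with no cleanup step in between.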

I can reproduce your issue by doing the following:

NewInternalAPI
     failed to connect to room server db           error="attempt to write a readonly database"
panic: (*logrus.Entry) (0x4f06000,0xc0003304d0)

The first time will result in .db files being created with root permissions, meaning you would then need to keep using sudo to run the server.
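
The error text itself comes from SQLite whenever it cannot write to the database it has open. As a rough illustration in Python (standard `sqlite3` module, not Dendrite code), the sketch below opens a database read-only through a `mode=ro` URI, which here stands in for a root-owned file that the current user cannot write to; the `rooms` table and file name are invented.

```python
import sqlite3
import tempfile
from pathlib import Path


def write_to_readonly(db_path):
    """Create a database, reopen it read-only, and return the write error."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE rooms (room_id TEXT)")
    conn.commit()
    conn.close()

    # mode=ro stands in for a file the current user may read but not write.
    uri = Path(db_path).as_uri() + "?mode=ro"
    readonly = sqlite3.connect(uri, uri=True)
    try:
        readonly.execute("INSERT INTO rooms VALUES ('!example:localhost')")
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        readonly.close()
    return None


with tempfile.TemporaryDirectory() as tmp:
    message = write_to_readonly(str(Path(tmp, "roomserver.db")))
    print(message)
```

The message returned is "attempt to write a readonly database", the same text that appears in the panic above.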

Please re-open if you can reproduce this in some other way which you believe is in error.

MadLittleMods commented 3 years ago

Thanks @Kegsay! I think your breakdown nailed it. I probably ran with sudo initially to get around the log issue in https://github.com/matrix-org/dendrite/issues/1644.

I deleted all of the existing *.db files (rm *.db) and switched my config back to using SQLite. Now starting up multiple times works as expected!
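
Before deleting anything, it can help to confirm which files are affected. The following is a hypothetical helper in Python, not something Dendrite ships, that lists the `*.db` files in a directory whose owner differs from a given user ID (Unix-like systems only, since it relies on `os.geteuid()` and `st_uid`).

```python
import os
import tempfile
from pathlib import Path


def foreign_owned_dbs(directory, uid):
    """List *.db files in `directory` whose owner is not `uid`."""
    return sorted(
        path.name
        for path in Path(directory).glob("*.db")
        if path.stat().st_uid != uid
    )


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "roomserver.db").touch()
    # A file we just created belongs to us, so nothing is reported.
    mine = foreign_owned_dbs(tmp, os.geteuid())
    # Pretending to be a different user makes the same file show up.
    pretend = foreign_owned_dbs(tmp, os.geteuid() + 1)
    print(mine, pretend)
```

Running `foreign_owned_dbs(".", os.geteuid())` from the directory that holds the Dendrite databases would list any files left behind by an earlier sudo run.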

$ ./build.sh && ./bin/dendrite-monolith-server --tls-cert server.crt --tls-key server.key --config dendrite.yaml
INFO[2021-03-16T23:46:01.090646000Z] [github.com/matrix-org/dendrite/setup/base.go:110] NewBaseDendrite
     Dendrite version 0.3.11+3c419be6
INFO[2021-03-16T23:46:01.092996000Z] [github.com/matrix-org/dendrite/signingkeyserver/signingkeyserver.go:103] NewInternalAPI
     Enabled perspective key fetcher               num_public_keys=2 server_name=matrix.org
INFO[2021-03-16T23:46:01.105965000Z] [github.com/matrix-org/dendrite/setup/base.go:394] func2
     Starting external Monolith listener on :8008
INFO[2021-03-16T23:46:01.105958000Z] [github.com/matrix-org/dendrite/setup/base.go:394] func2
     Starting external Monolith listener on :8448

The next step is to work on a solution for https://github.com/matrix-org/dendrite/issues/1644 so logs work out of the box and no one is tempted, like me, to use sudo here.