Closed jhulkko closed 3 years ago
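This bug is due to an integer overflow in the index calculation that follows the hash function. It should only affect 32-bit systems, where int is 32 bits wide: a hash value above the maximum int32 converts to a negative int, and the modulo then yields a negative channel index.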
Proposed patch:
diff --git a/keyserver/internal/device_list_update.go b/keyserver/internal/device_list_update.go
index 4d1b1107..4f802293 100644
--- a/keyserver/internal/device_list_update.go
+++ b/keyserver/internal/device_list_update.go
@@ -245,7 +245,7 @@ func (u *DeviceListUpdater) notifyWorkers(userID string) {
 	}
 	hash := fnv.New32a()
 	_, _ = hash.Write([]byte(remoteServer))
-	index := int(hash.Sum32()) % len(u.workerChans)
+	index := int(int64(hash.Sum32()) % int64(len(u.workerChans)))
 	ch := u.assignChannel(userID)
 	u.workerChans[index] <- remoteServer
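To make the failure mode concrete, here is a minimal sketch of the two index expressions. It assumes int32 as a stand-in for int so that the 32-bit behaviour shows up on any machine. The helper names (serverHash, buggyIndex, patchedIndex) and the worker count of 8 are made up for illustration and are not part of Dendrite.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// serverHash returns the 32-bit FNV-1a hash of a server name, the same
// hash the patched function computes.
func serverHash(server string) uint32 {
	h := fnv.New32a()
	_, _ = h.Write([]byte(server))
	return h.Sum32()
}

// buggyIndex reproduces the pre-patch expression as it behaves on a 32-bit
// build. int32 stands in for int here (an assumption for illustration).
func buggyIndex(sum uint32, workers int) int32 {
	// Any sum above math.MaxInt32 wraps to a negative value, and Go's %
	// keeps the sign of the dividend, so the result can be negative.
	return int32(sum) % int32(workers)
}

// patchedIndex mirrors the expression from the proposed patch: widening to
// int64 first keeps the value non-negative on every platform.
func patchedIndex(sum uint32, workers int) int {
	return int(int64(sum) % int64(workers))
}

func main() {
	const workers = 8 // hypothetical worker count

	// A hand-picked hash value just above the int32 range.
	sum := uint32(0x80000001)
	fmt.Println(buggyIndex(sum, workers))   // -7
	fmt.Println(patchedIndex(sum, workers)) // 1

	// Hash of an arbitrary server name (the name is only an example).
	s := serverHash("example.org")
	fmt.Println(s, buggyIndex(s, workers), patchedIndex(s, workers))
}
```

Indexing a slice with a negative value panics at runtime in Go, which would be consistent with the crash described in this issue.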
I bumped into this bug while testing Dendrite on a Raspberry Pi with a 32-bit system. As far as I could tell, after a fresh install the server would start up normally and work for communication among local users. As soon as a user attempted to chat with someone on a federated server, a record would appear in the table mentioned in the original post and the server would crash. It would not start again until the offending record had been deleted.
The issue is gone after applying the patch suggested in the previous post and rebuilding the server. It now works properly and allows communication with external users.
I rebuilt the server with Dendrite version 0.2.1 and the proposed patch. Now everything seems to work as expected on a 32-bit Raspberry Pi system.
Background information
go version: go1.15.2 linux/arm
Description
Dendrite always crashes during startup due to some content in keyserver_stale_device_lists. Not all list entries seem to cause this, but when the server fails to start, flushing the list or deleting the table always resolves the issue.
Steps to reproduce
No definite way to reproduce as the root cause is unknown.
Details
Log entries of the event:
Last log entry from PostgreSQL while trying to start:
The data returned was a list of 11 user handles. Nothing out of the ordinary about them.
This time I renamed the table to backup_keyserver_stale_device_lists, which caused Dendrite to re-create the original table on the next start. This resolved the issue.
I restarted the server process multiple times while playing around with some config changes, and the stale device list kept growing in between. The exact same issue, with the same errors, hit again after some hours. This time I just removed the rows from the table:
Afterwards the server started normally again.
If / when this happens again I will remove the rows one by one from the database to see if it is a specific user that triggers this issue.
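Given the overflow explanation in the comments, the one-by-one removal could be shortcut by checking which user handles hash to a value that goes negative on a 32-bit build. The following is only a sketch under assumptions: serverName is a simplified stand-in for Dendrite's real user ID parser, the worker count is a hypothetical parameter, the user IDs are invented, and int32 is used to emulate a 32-bit int.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// serverName extracts the part after the first ':' of a Matrix user ID such
// as "@alice:example.org". Simplified stand-in for the real parser.
func serverName(userID string) (string, bool) {
	i := strings.Index(userID, ":")
	if !strings.HasPrefix(userID, "@") || i < 0 || i == len(userID)-1 {
		return "", false
	}
	return userID[i+1:], true
}

// wouldGoNegative reports whether the pre-patch index expression would be
// negative on a 32-bit build (int32 emulates the 32-bit int).
func wouldGoNegative(sum uint32, workers int) bool {
	return int32(sum)%int32(workers) < 0
}

// suspectUsers returns the user IDs whose server name hashes to a negative
// index for the given (hypothetical) worker count.
func suspectUsers(userIDs []string, workers int) []string {
	var out []string
	for _, id := range userIDs {
		server, ok := serverName(id)
		if !ok {
			continue // skip anything that does not look like a user ID
		}
		h := fnv.New32a()
		_, _ = h.Write([]byte(server))
		if wouldGoNegative(h.Sum32(), workers) {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	// Invented handles; in practice these would come from the stale list.
	ids := []string{"@alice:example.org", "@bob:matrix.example.com"}
	fmt.Println(suspectUsers(ids, 8))
}
```

If the overflow is indeed the cause, the handles reported here should match the rows whose removal lets the server start again on an unpatched build.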