spantaleev / matrix-docker-ansible-deploy

🐳 Matrix (An open network for secure, decentralized communication) server setup using Ansible and Docker
GNU Affero General Public License v3.0

rust-compress tool fails #2490

Open roughnecks opened 1 year ago

roughnecks commented 1 year ago

Describe the bug

I ran this command:

nice -n 19 /usr/local/bin/ansible-playbook -i inventory/hosts setup.yml -b --extra-vars='matrix_synapse_rust_synapse_compress_state_min_state_groups_required=75000' --tags=rust-synapse-compress-state

The rust-compress tool failed with msg: Timeout exceeded

Expected behavior

I'd expect the playbook not to interfere and to let the process run until it finishes. It used to work until a few months ago.

Matrix Server:

Additional context

fatal: [matrix.woodpeckersnest.space]: FAILED! => changed=false
  ansible_job_id: '790788893269.218548'
  changed_when_result: 'The conditional check ''matrix_synapse_rust_synapse_compress_state_compress_room_command_result.finished and matrix_synapse_rust_synapse_compress_state_compress_room_command_result.rc == 0'' failed. The error was: error while evaluating conditional (matrix_synapse_rust_synapse_compress_state_compress_room_command_result.finished and matrix_synapse_rust_synapse_compress_state_compress_room_command_result.rc == 0): ''dict object'' has no attribute ''rc''. ''dict object'' has no attribute ''rc'''
  child_pid: 218552
  finished: 1
  msg: Timeout exceeded
  results_file: /root/.ansible_async/790788893269.218548
  started: 1
  stderr: ''
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

After it failed in the console, the process was still running on my system. I waited another few minutes and it eventually stopped. Then I ran rust-compress again and compared the room IDs for which compression was needed: nothing had changed between the first and the second run. At that point I stopped it rather than wait another 2 hours.

spantaleev commented 1 year ago

We've always had a timeout. It was shorter in the past, even.

It possibly worked a few months ago because your rooms were smaller or your server was less loaded (or faster), so it managed to complete the compression within the default timeout. It seems like it can't complete within that timeframe anymore, so it times out.

We could probably remove the timeout altogether, or set it to some very large value (5 days?) to prevent such issues.
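For illustration, here is a minimal sketch of what a very large timeout could look like on an asynchronous Ansible task. This is not the playbook's actual code: the task name, the placeholder command, and the `example_job_result` variable are all hypothetical. Only the `async`, `poll`, `register`, and `changed_when` keywords are standard Ansible. The sketch also guards the `rc` check, since the log above shows that a timed-out async result has no `rc` attribute.

```yaml
# Hypothetical task for illustration only, not taken from the playbook.
- name: Run a long-running command with a generous async timeout
  ansible.builtin.command:
    cmd: /usr/local/bin/example-long-job   # placeholder command (assumption)
  async: 432000   # maximum runtime in seconds: 5 days (5 * 24 * 60 * 60)
  poll: 10        # check on the job every 10 seconds
  register: example_job_result
  # A timed-out async result carries no `rc`, so test that it is defined
  # before comparing it; this avoids the "has no attribute 'rc'" error.
  changed_when: >-
    example_job_result.rc is defined
    and example_job_result.rc == 0
```

With `poll` greater than zero, Ansible keeps waiting on the job and fails the task only once the `async` limit has elapsed, so a limit of several days would effectively let the compression run to completion.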

roughnecks commented 1 year ago

How long is the timeout now? It threw the error after just about 2 hours.

roughnecks commented 1 year ago

There were also these messages printed on screen, which I cannot copy and paste and which I didn't see the last time I ran the tool: https://i.imgur.com/8aJMFAm.jpeg

roughnecks commented 1 year ago

Does this also affect auto-compressor?