tripleee / sloshy

Sloshy the Thawman, a simple chatbot to keep rooms alive on Stack Overflow / Stack Exchange

Nightly failed for manually frozen room #14

Closed tripleee closed 1 year ago

tripleee commented 2 years ago

The last couple of nights, the run has failed after several hours with a traceback ending with

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://chat.stackexchange.com/chats/111121/messages/new
Error: The operation was canceled.

See https://github.com/tripleee/sloshy/actions/runs/1129571072 and the one from the previous night.

tripleee commented 2 years ago

Turns out that the room where this was happening was forcibly frozen by a mod.

https://chat.stackexchange.com/transcript/message/58763847#58763847

tripleee commented 2 years ago

Left a ping in chat to alert the stakeholders: https://chat.stackexchange.com/transcript/message/58884002#58884002

tripleee commented 2 years ago

Still want to prevent this crash if it should happen again. It was fortunate that the problematic room was the last one in the YAML file, so it didn't prevent Sloshy from doing its job over the last few nights.
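A minimal sketch of one way to keep a single bad room from aborting the whole run, assuming the rooms are processed in a simple loop. The names `process_rooms` and `handle_room` are hypothetical, not taken from Sloshy's code, and an exception raised inside a library's background thread would still need separate handling.

```python
import logging


def process_rooms(rooms, handle_room):
    """Run handle_room on every room and collect failures instead of aborting.

    `rooms` stands in for the list loaded from the YAML file and
    `handle_room` for the per-room logic; both are hypothetical names.
    """
    failures = []
    for room in rooms:
        try:
            handle_room(room)
        except Exception as err:  # deliberately broad: one room must not stop the rest
            logging.warning("room %s failed: %s", room, err)
            failures.append((room, err))
    return failures
```

The caller can then exit non-zero when `failures` is non-empty, so the nightly job still reports the problem after every other room has been handled.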

tripleee commented 2 years ago

The actual error is pretty deep inside the ChatExchange library. Fixing will probably require an upstream patch.

Just to keep this self-contained, here is the full traceback from Nightly #67.

Exception in thread message_sender:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/runner/.local/lib/python3.8/site-packages/chatexchange/client.py", line 183, in _worker
    self._do_action_despite_throttling(next_action)
  File "/home/runner/.local/lib/python3.8/site-packages/chatexchange/client.py", line 221, in _do_action_despite_throttling
    response = self._br.send_message(room_id, text)
  File "/home/runner/.local/lib/python3.8/site-packages/chatexchange/browser.py", line 299, in send_message
    return self.post_fkeyed(
  File "/home/runner/.local/lib/python3.8/site-packages/chatexchange/browser.py", line 133, in post_fkeyed
    return self.post(url, data, headers)
  File "/home/runner/.local/lib/python3.8/site-packages/chatexchange/browser.py", line 113, in post
    return self._request('post', url, data, headers, with_chat_root)
  File "/home/runner/.local/lib/python3.8/site-packages/chatexchange/browser.py", line 102, in _request
    response.raise_for_status()
  File "/usr/lib/python3/dist-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://chat.stackexchange.com/chats/111121/messages/new
Error: The operation was canceled.

The code in client.py traps HTTP error code 409 and retries forever, but it shouldn't be doing that for 404...?
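To illustrate the distinction, here is a rough sketch of a retry loop that treats only 409 as transient and gives up immediately on anything else, such as the 404 above. This is not the ChatExchange code; `send_with_retry`, `status_of`, and the retry limits are hypothetical, and the only library detail relied on is that `requests.exceptions.HTTPError` exposes the response as `err.response`.

```python
import time


def status_of(err):
    """Return the HTTP status attached to an exception, or None.

    requests.exceptions.HTTPError carries the response in `err.response`;
    the duck-typed lookup keeps this sketch free of third-party imports.
    """
    return getattr(getattr(err, "response", None), "status_code", None)


def send_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(), retrying only on HTTP 409 and only a bounded number of times.

    `send` is a hypothetical zero-argument callable that performs the POST
    and raises on a failure status.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except Exception as err:
            # Anything other than 409 (e.g. 404 for a frozen room) is treated
            # as permanent, and the last 409 is re-raised rather than retried.
            if status_of(err) != 409 or attempt == max_attempts:
                raise
            sleep(base_delay * attempt)  # linear backoff before the next try
```

Bounding the attempts also avoids the "retries forever" behaviour, at the cost of surfacing an error if the server keeps answering 409.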

ndattani commented 2 years ago

Hi @tripleee, this was a Chemistry.SE room, and a mod over there seems to have unilaterally frozen it (MMSE and Chemistry.SE compete for the same audiences, after all). We have a similar room on Physics.SE (https://chat.stackexchange.com/rooms/111122/questions-about-the-modeling-of-matter) which has not been frozen (in fact, the Physics.SE mods have incorporated MMSE into the ticker feeds of their main chat room too). I can ask the Physics.SE mods in advance whether we can keep that room unfrozen.

Apart from the rooms mentioned above (one already frozen, and one for which I'll ask the mods if it can be kept unfrozen), I don't see any issue with rooms on the Sloshy list freezing anymore, because they're Matter Modeling rooms, and @TyBalduf has already approved having these rooms kept open (if a mod from a different site freezes them for some reason, Ty would likely just unfreeze it again, since it's an MMSE room and a mod from a different site is not an MMSE mod).

tripleee commented 2 years ago

Thanks for the background. I'd still appreciate it if you could get explicit approval for the Physics room. Can I open a separate ticket about that and assign it to you, @ndattani?

tripleee commented 1 year ago

The actual acute error was in the scraper logic; it just manifested with a really deep chat traceback in production. Calling this fixed now, as the latest commit adds robustness for several failure cases.

tripleee commented 1 year ago

The robustness fixes still don't completely address the Chemistry room in particular; Sloshy is left trying to deliver the chat message for hours before finally giving up with a timeout. I'm thinking the simplest fix for that is to run with --test-rooms in the nightly job before the main job.

tripleee commented 1 year ago

(Previous comment implemented now; https://github.com/tripleee/sloshy/commit/0a68f24d867e67bb46df8b20f3dc427c23e23b1c)

tripleee commented 1 year ago

https://github.com/tripleee/sloshy/commit/6e884ae174830acbefec5a7715cc6402f447aeea adds an explicit check to raise an exception if a room's info page indicates that it is frozen or deleted.
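For readers who don't want to open the commit, the idea is roughly as follows. This is only an illustrative sketch, not the code from the commit: `RoomUnavailableError`, `check_room_status`, and the marker phrases are all hypothetical, and the text the real info page uses for a frozen or deleted room is not shown in this thread.

```python
class RoomUnavailableError(Exception):
    """Raised when a room's info page says the room cannot receive messages."""


# Hypothetical marker phrases; the real page wording is an assumption here.
DEFAULT_MARKERS = ("frozen", "deleted")


def check_room_status(info_text, room_id, markers=DEFAULT_MARKERS):
    """Raise RoomUnavailableError if info_text mentions any marker phrase.

    `info_text` is the already-fetched text of the room's info page.
    """
    lowered = info_text.lower()
    for marker in markers:
        if marker in lowered:
            raise RoomUnavailableError(f"room {room_id} appears to be {marker}")
```

Failing early like this means a frozen room is reported up front instead of leaving the message sender to retry for hours.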