hibiken / asynq

Simple, reliable, and efficient distributed task queue in Go
MIT License

[BUG] Errors trying to connect remote worker server to scheduler #867

Closed windowshopr closed 7 months ago

windowshopr commented 7 months ago

Describe the bug

I'm getting errors when trying to connect a remote worker (on the same LAN, just a different computer) to my local scheduler. My environment is Windows 10 with Go 1.19. I am currently running a scheduler server and a worker server on the local machine; they talk to each other fine over 127.0.0.1 and the program runs. When I run a second worker server on a second computer, connecting to the local machine at 172.16.1.114, I get the following output:

asynq: pid=22087 2024/04/20 05:36:10.682225 INFO: Starting processing
asynq: pid=22087 2024/04/20 05:36:10.682428 INFO: Send signal TSTP to stop processing new tasks
asynq: pid=22087 2024/04/20 05:36:10.682464 INFO: Send signal TERM or INT to terminate the process
asynq: pid=22087 2024/04/20 05:36:15.688132 ERROR: Failed to write server state data: UNKNOWN: redis command error: SAD>
asynq: pid=22087 2024/04/20 05:36:15.688764 ERROR: Dequeue error: UNKNOWN: redis eval error: dial tcp 172.16.1.114:6379>
asynq: pid=22087 2024/04/20 05:36:15.688920 WARN: recoverer: could not list lease expired tasks: INTERNAL_ERROR: redis >
asynq: pid=22087 2024/04/20 05:36:20.689726 ERROR: cannot subscribe to cancelation channel: UNKNOWN: redis pubsub recei>
asynq: pid=22087 2024/04/20 05:36:20.689815 ERROR: Dequeue error: UNKNOWN: redis eval error: dial tcp 172.16.1.114:6379>
asynq: pid=22087 2024/04/20 05:36:20.690497 ERROR: Failed to forward scheduled tasks: INTERNAL_ERROR: INTERNAL_ERROR: r>
asynq: pid=22087 2024/04/20 05:36:20.690472 WARN: recoverer: could not reclaim stale aggregation sets in queue "default>
^Casynq: pid=22087 2024/04/20 05:36:22.303588 INFO: Starting graceful shutdown
asynq: pid=22087 2024/04/20 05:36:23.688450 ERROR: Failed to delete expired completed tasks from queue "default": INTER>
asynq: pid=22087 2024/04/20 05:36:25.692309 ERROR: Dequeue error: UNKNOWN: redis eval error: dial tcp 172.16.1.114:6379>
asynq: pid=22087 2024/04/20 05:36:25.693000 INFO: Waiting for all workers to finish...
asynq: pid=22087 2024/04/20 05:36:25.693231 INFO: All workers have finished
asynq: pid=22087 2024/04/20 05:36:25.692776 ERROR: Failed to write server state data: UNKNOWN: redis command error: SAD>
^C^C^C^C^Casynq: pid=22087 2024/04/20 05:36:35.699333 ERROR: cannot subscribe to cancelation channel: UNKNOWN: redis pu>
asynq: pid=22087 2024/04/20 05:36:35.699347 ERROR: Failed to write server state data: UNKNOWN: redis command error: SAD>
asynq: pid=22087 2024/04/20 05:36:36.690677 ERROR: Failed to delete expired completed tasks from queue "default": INTER>
asynq: pid=22087 2024/04/20 05:36:41.691452 INFO: Exiting

It seems to start fine, but starts having issues a few seconds in. What might be causing this?

To Reproduce

Steps to reproduce the behavior (Code snippets if applicable):

  1. On Windows 10, start the scheduler (any basic scheduler code will do), with Memurai running as the Redis-compatible server for Windows on the default port 6379
  2. Start the worker on the same machine and confirm that tasks are being processed
  3. On a second/remote computer, run the same worker server code, changing only the IP address to the local IP of the scheduler machine.
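Since step 3 changes only the Redis address between the two machines, one option is to read that address from the environment instead of editing the source. The sketch below is an assumption-laden illustration: the variable names `REDIS_HOST` and `REDIS_PORT` and the helper `redisAddr` are hypothetical, not something asynq or Memurai defines. The resulting string is what would go into the `Addr` field of `asynq.RedisClientOpt` in the worker code.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// redisAddr builds the host:port string for the Redis server.
// REDIS_HOST and REDIS_PORT are hypothetical variable names chosen for this
// sketch; unset values fall back to the local defaults 127.0.0.1 and 6379.
// getenv is passed in so the lookup can be swapped out in tests.
func redisAddr(getenv func(string) string) string {
	host := getenv("REDIS_HOST")
	if host == "" {
		host = "127.0.0.1"
	}
	port := getenv("REDIS_PORT")
	if port == "" {
		port = "6379"
	}
	return net.JoinHostPort(host, port)
}

func main() {
	// On the remote worker, set REDIS_HOST=172.16.1.114 before starting.
	fmt.Println("connecting to Redis at", redisAddr(os.Getenv))
}
```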

Expected behavior

Both the local and the remote worker servers should be pulling tasks from the queue.

Screenshots

None

Environment (please complete the following information):

Additional context

None, but just ask and I can provide more details if needed.

I'm a newbie when it comes to Redis, so I'm wondering whether the Redis server allows only one connection at a time?

windowshopr commented 7 months ago

Simple fix: I just needed to change the bind address in the C:\Program Files\Memurai\memurai.conf file to 0.0.0.0, and voila. Leaving the bind address as 127.0.0.1 allows only local connections to the Redis server, whereas 0.0.0.0 makes it listen on all network interfaces.
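To make the difference between the two bind addresses concrete, here is a small sketch that inspects a bind line and reports whether it includes any non-loopback address. It assumes memurai.conf uses the Redis-style `bind <addr> ...` directive; the helper name `reachableFromLAN` is made up for illustration, and it only understands IP literals.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// reachableFromLAN reports whether a Redis-style "bind" directive lists at
// least one non-loopback IP address, i.e. whether other machines could
// connect at all. Tokens that are not IP literals are ignored, and a
// leading "-" (used by newer Redis versions to mark optional addresses)
// is stripped before parsing.
func reachableFromLAN(directive string) bool {
	fields := strings.Fields(directive)
	if len(fields) < 2 || fields[0] != "bind" {
		return false
	}
	for _, f := range fields[1:] {
		ip := net.ParseIP(strings.TrimPrefix(f, "-"))
		if ip != nil && !ip.IsLoopback() {
			return true
		}
	}
	return false
}

func main() {
	for _, d := range []string{
		"bind 127.0.0.1",              // loopback only: local connections
		"bind 0.0.0.0",                // all IPv4 interfaces
		"bind 127.0.0.1 172.16.1.114", // loopback plus one LAN address
	} {
		fmt.Printf("%-30s remote connections possible: %v\n", d, reachableFromLAN(d))
	}
}
```

The third line shows a narrower alternative to 0.0.0.0: binding the loopback address plus the machine's LAN address keeps the server off any other interfaces the host may have.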