justanhduc / task-spooler

A scheduler for GPU/CPU tasks
https://justanhduc.github.io/2021/02/03/Task-Spooler.html
GNU General Public License v2.0
273 stars, 24 forks

Getting error of `Wrong server version.` #6

Closed iseong83 closed 3 years ago

iseong83 commented 3 years ago

Hi, I followed the installation instructions on Ubuntu 18.04 and simply ran `ts`, but I got the error message: `Wrong server version. Received 1048576, expecting 730.` Could you help me resolve this issue?

justanhduc commented 3 years ago

Hi @iseong83. Which version did you install, CPU or GPU? Are you able to reproduce it anywhere else (e.g., Colab)? The error is caused by a miscommunication between the server and the client, but what caused that is beyond me at the moment.

RathnaMurthy commented 3 years ago

Hi, I followed the installation instructions on Ubuntu 20.04 and simply ran `ts`, but I get the same error message: `Wrong server version. Received -1035402781, expecting 730.` Could you help me resolve this as well?

iseong83 commented 3 years ago

Hi @justanhduc. Thank you for your response. I installed the GPU version and I am using CUDA 10.2. I tested on Colab and it works well there, so I cannot reproduce the issue in Colab.

You said it's because of a miscommunication between server and client. Do you mean the task-spooler server and client? Could the task-spooler that comes pre-installed on Ubuntu cause this issue?

I also installed yours after removing the pre-installed one, but that did not solve the issue. I wonder whether I need to do something else to restart the server after installing yours.

justanhduc commented 3 years ago

@iseong83 Indeed! The problem is that `ts` tried to communicate with its server and receive data from it, but it got a reply from the `tsp` server instead. When you remove `tsp` without shutting down its server, the server keeps running. To resolve this, you should:

  1. Reinstall tsp
  2. Run tsp -K to kill the server.
  3. Remove tsp

If you don't want to reinstall `tsp`, you can locate the server's socket in the `/tmp` folder. Its name should have the format `socket-ts.{uid}`. Just remove it and you are done. Let me know if there's still any problem.
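The cleanup described above can be sketched as a pair of small shell helpers. This is only a sketch: the function names `ts_socket_path` and `remove_stale_socket` are made up for illustration, and it assumes the socket lives at `/tmp/socket-ts.<uid>` as stated above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of task-spooler.

# Print the expected socket path for the current user.
# $1: directory that holds the socket (assumed to be /tmp by default).
ts_socket_path() {
    printf '%s/socket-ts.%s\n' "${1:-/tmp}" "$(id -u)"
}

# Remove a leftover socket file, if one exists.
# $1: directory that holds the socket (assumed to be /tmp by default).
remove_stale_socket() {
    sock=$(ts_socket_path "${1:-/tmp}")
    if [ -e "$sock" ]; then
        rm -f -- "$sock" && echo "removed $sock"
    else
        echo "no socket found at $sock"
    fi
}
```

If the old `tsp` binary is still installed, running `tsp -K` first and then calling `remove_stale_socket` with no argument covers both parts of the fix. Passing a directory is only there so the helper can be tried out safely somewhere other than `/tmp`.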

justanhduc commented 3 years ago

Hi @RathnaMurthy. Which branch did you use to install? Could you please check whether you have another version of tsp like above?
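One way to check for a second installation is to ask the shell which `ts` and `tsp` executables it can find. The helper below is a minimal sketch; the name `list_spooler_binaries` is hypothetical, and it only looks at the current `PATH`, so a binary installed elsewhere would not show up.

```shell
#!/bin/sh
# Sketch only: list_spooler_binaries is a hypothetical helper name.

# Print the resolved location of each task-spooler client found on PATH.
list_spooler_binaries() {
    for name in ts tsp; do
        # 'command -v' prints how the shell would resolve the name,
        # and fails if the name is not found.
        if found=$(command -v "$name" 2>/dev/null); then
            printf '%s -> %s\n' "$name" "$found"
        fi
    done
}
```

If both names are listed and they point to different installations, each client may be talking to a server started by the other one, which matches the situation described earlier in this thread.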

iseong83 commented 3 years ago

@justanhduc It works! I ran `tsp -K`, but `socket-ts.{uid}` was still in `/tmp`, so I deleted it as well, and it works now. Thank you so much for helping me solve this issue!

RathnaMurthy commented 3 years ago

> Hi @RathnaMurthy. Which branch did you use to install? Could you please check whether you have another version of tsp like above?

Hello, I used the master branch to install. I tried the steps above and it works now! Thank you so much :)

However, I was testing with multiple GPUs and have a question. When I try to run 2 jobs pointing to different GPUs, the second job is still shown as 'allocating' and starts only when the first job completes, which defeats the purpose.

Do you have any suggestions on how I can handle this?

justanhduc commented 3 years ago

Hi @RathnaMurthy. I think you are still setting the number of slots to be 1. The number of slots is indicated in the top left of the queue. You can increase the number of slots by the -S flag.
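As a rough sketch, the slot count can be checked and raised from a script. The helper name `ensure_slots` is made up for illustration, and the sketch assumes that `ts -S` without an argument prints the current number of slots while `ts -S <num>` sets it; please check `ts -h` on your build to confirm.

```shell
#!/bin/sh
# Sketch only: ensure_slots is a hypothetical helper name.
# Assumption: 'ts -S' prints the current slot count and 'ts -S N' sets it.

# Make sure the running server allows the requested number of parallel jobs.
# $1: desired number of slots.
ensure_slots() {
    want=$1
    current=$(ts -S)
    if [ "$current" != "$want" ]; then
        ts -S "$want"
    fi
}
```

For two jobs on two different GPUs, `ensure_slots 2` would be called before queueing them. The slot count belongs to the running server, so it has to be applied again whenever the server is restarted.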

RathnaMurthy commented 3 years ago

> Hi @RathnaMurthy. I think you are still setting the number of slots to be 1. The number of slots is indicated in the top left of the queue. You can increase the number of slots by the -S flag.

Hi @justanhduc, even though I increased the number of slots using the -S flag, running 'ts -K' reset the number of slots to 1. That was the problem. Thank you so much :)

justanhduc commented 3 years ago

Hi @RathnaMurthy. It's not a bug. That's what -K is supposed to do: it shuts down the server and resets everything. Anyway, this is no longer related to this issue, so I will close it now. If you think you have found a bug, feel free to create a new issue.