frank-y-liu opened this issue 6 years ago
Update: the master branch works fine.
Yes, please try the master branch. Note that we have added cgroup support, so you now need elevated privileges to launch the master and agents. Please follow the instructions here to start the cluster.
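Because the master and agents now manage cgroups, a quick pre-flight check for write access can catch a privilege problem before launch. This is only an illustrative sketch. The directory list is copied from the dtc-agent log later in this thread, and the helper name missing_cgroup_access is hypothetical, not part of dtc.

```python
import os

# Cgroup directories as reported in the dtc-agent log in this thread.
# Assumption: adjust these if your cgroup mount point differs.
DTC_CGROUP_DIRS = [
    "/sys/fs/cgroup/memory/dtc",
    "/sys/fs/cgroup/cpuset/dtc",
    "/sys/fs/cgroup/blkio/dtc",
]


def missing_cgroup_access(dirs):
    """Return the directories the current process cannot write to.

    A directory that does not exist yet is checked via its parent,
    because that is where it would have to be created.
    """
    problems = []
    for d in dirs:
        target = d if os.path.isdir(d) else os.path.dirname(d)
        if not os.access(target, os.W_OK):
            problems.append(d)
    return problems


if __name__ == "__main__":
    bad = missing_cgroup_access(DTC_CGROUP_DIRS)
    if bad:
        print("No write access to:", ", ".join(bad))
        print("Try launching the master and agents with elevated privileges.")
    else:
        print("cgroup directories look writable")
```

An empty result only means the paths are writable. It does not confirm that dtc itself is configured correctly.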
It looks like no CPU is configured for the agent, which is why you get the "Resource request doesn't fit in cluster" error. This should be resolved in the master branch.
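The following minimal sketch of a resource admission check shows why an agent offering zero CPUs rejects every graph. It is hypothetical: the Resources type, the fits helper, and the assumption that a graph requests at least one CPU are for illustration only and do not reproduce dtc's logic in master.cpp.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu: int   # number of CPUs
    mem: int   # bytes
    disk: int  # bytes


def fits(request: Resources, available: Resources) -> bool:
    """True if every requested resource is within what is available."""
    return (request.cpu <= available.cpu
            and request.mem <= available.mem
            and request.disk <= available.disk)


# Agent figures taken from the dtc-master log in this thread;
# the graph request is made up for illustration.
agent = Resources(cpu=0, mem=135081242624, disk=17343295488)
request = Resources(cpu=1, mem=1 << 30, disk=0)
print(fits(request, agent))  # False: the agent offers zero CPUs
```

Plenty of memory and disk cannot compensate: a single zero-valued dimension is enough to make the request fail.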
Thanks for getting back on this. I tried the master branch in the upstream repo and still have the same problem. Error message from dtc-master:
I 14208 2018-05-30 14:13:06 master.cpp:200] Master @127.0.0.1 [agent:9909|graph:9910|webui:9912]
I 14208 2018-05-30 14:13:50 master.cpp:271] Agent 0 connected @127.0.0.1 [cpu:0|mem:135081242624|disk:17343295488]
I 14208 2018-05-30 14:14:24 master.cpp:300] Graph 0 connected @arlz009 [vertex:2|stream:2|container:2]
W 14208 2018-05-30 14:14:24 master.cpp:303] Graph 0 doesn't fit with available resources
I 14208 2018-05-30 14:14:24 master.cpp:142] Graph 0 is removed from the master
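The trailing [key:value|...] block in these log lines is easy to parse, which helps spot an agent registering with cpu:0. Below is a small sketch. The regular expression and the parse_resources name are assumptions based only on the log lines shown here, not on any documented dtc log format.

```python
import re

# Matches the "[key:value|key:value|...]" block at the end of a log line.
BRACKET = re.compile(r"\[([^\]]*)\]\s*$")


def parse_resources(line):
    """Return {key: value} from the trailing bracket, ints where possible."""
    m = BRACKET.search(line)
    if not m:
        return {}
    fields = {}
    for item in m.group(1).split("|"):
        key, _, value = item.partition(":")
        fields[key] = int(value) if value.isdigit() else value
    return fields


line = ("I 14208 2018-05-30 14:13:50 master.cpp:271] Agent 0 connected "
        "@127.0.0.1 [cpu:0|mem:135081242624|disk:17343295488]")
res = parse_resources(line)
if res.get("cpu") == 0:
    print("agent reports no CPUs:", res)
```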
Any suggestions on how to turn on debug logging?
Adding the log messages from dtc-agent:
I 46976 2018-05-30 14:13:50 agent.cpp:135] Agent @127.0.0.1 [frontier:9913]
I 46976 2018-05-30 14:13:50 agent.cpp:138] cg-subsys.memory "/sys/fs/cgroup/memory/dtc" [limit:135081242624]
I 46976 2018-05-30 14:13:50 agent.cpp:139] cg-subsys.cpuset "/sys/fs/cgroup/cpuset/dtc" [cpus:0]
I 46976 2018-05-30 14:13:50 agent.cpp:140] cg-subsys.blkio "/sys/fs/cgroup/blkio/dtc" [weight:500]
Does this mean dtc-agent didn't get any CPUs allocated?
Could you please cat /sys/fs/cgroup/cpuset/dtc/cpuset.cpus and let me know what you get?
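cpuset.cpus uses the kernel's list format: comma-separated CPU numbers and inclusive ranges, such as "0-3,8". An empty file means the cpuset has no CPUs. On cgroup v1, a newly created cpuset child typically starts out this way until its cpus are written, which would be consistent with the cpus:0 line in the agent log. The helper below (count_cpus is a hypothetical name, not part of dtc) turns the file contents into a CPU count.

```python
def count_cpus(cpuset_cpus: str) -> int:
    """Count CPUs in a cpuset list such as "0-3,8,10-11".

    An empty string (an unpopulated cpuset) means zero CPUs.
    """
    total = 0
    for part in cpuset_cpus.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        if "-" in part:
            lo, hi = part.split("-")
            total += int(hi) - int(lo) + 1  # ranges are inclusive
        else:
            total += 1
    return total


print(count_cpus("0-3,8"))  # 5
print(count_cpus(""))       # 0
```

Pass it the output of the cat command above. A result of 0 means the dtc cpuset has no CPUs assigned.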
I have fixed a minor bug in the cgroup code that might have caused this problem. Please update to the latest master branch and try again. Let me know if the problem still exists.
Updated to version 0.2.2. Local mode works fine, but I encountered the following error in single-host distributed mode. Error message in the master:
Error message in the submission window:
OS: Ubuntu 17.10