Open chrisvdg opened 5 years ago
@chrisvdg can you tell me how many services are installed on this robot? Or how many there were before it started giving this error?
Also, do you get this error every time you install the service, or is it random?
After it happened once, it keeps happening. I'm guessing it's the reply of the zboot router.
I lost the output of the script, so it's hard to tell. Somewhere around 150-ish, I'd guess.
ok thanks ;-)
I did some tests on the robot itself, and I can create more than 200 services without any problem. Now, the service you install is zeroboot_ipmi_host. This service creates an ssh connection during the install, and I guess it keeps it open for the lifetime of the service. It could be that we hit the system's file descriptor limit. What is the ulimit of the machine you run the 0-robot on?
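To answer the ulimit question from inside the process rather than from a shell, something like the following could help. It is only a sketch: it assumes the robot runs under Python on Linux (the `/proc/<pid>/fd` listing is Linux-specific), and the helper names `fd_limits` and `open_fd_count` are made up for illustration, they are not part of 0-robot or jumpscale.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count the descriptors a process currently holds, via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    print(f"currently open: {open_fd_count()}")
```

If the open count climbs by roughly one per installed service and approaches the soft limit, that would support the theory that each service holds an ssh connection open.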
@chrisvdg maybe try to raise it and see if that improves it
@zaibon, suggestions for the new setting?
Doesn't seem it wants to take a custom value
Wiped my zos VM but now I get this from the get go....
Other services installed just fine...
I'll fully reset my VM, use v1.5.0 zos image and try again...
On retrying https://gist.github.com/chrisvdg/0c821eb283b29ad0a9e80eb4f088d6a6
I got 49 services reporting to alerta (because of the network issue) so 49 ipmi_host services got successfully installed
I don't think it's the robot that's reaching its file descriptor limit?
After inspecting the logs of the robot server, the error comes from the ssh library used by the zeroboot_ipmi_service.
Moving this issue to https://github.com/threefoldtech/jumpscale_core since the error comes from ssh client
@rkhamis can you find someone to have a look at this one please?
Found that the Kubernetes pod that now runs the zeroboot robot in place of the VM has a much higher ulimit -n (open files (-n) 1048576), but it would still fail at around the same number of services. I'm assuming it's the file limit of the zeroboot router, which is 1024.
Let me see if I can increase it
do you pool your uci calls in 1 ssh session, or do you try to ssh 150 times at the same time?
I don't think it's pooled
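For reference, pooling here would mean keeping one session per host and reusing it for every call, instead of opening a new connection each time. A minimal sketch of that idea follows. `SessionPool`, `FakeSession` and the `connect` factory are hypothetical names for illustration only; this is not how the jumpscale ssh client works, and `FakeSession` merely stands in for a real ssh session so the example runs on its own.

```python
class SessionPool:
    """Cache one session per host so repeated calls reuse it."""

    def __init__(self, connect):
        # `connect` is a hypothetical factory: host -> object with a close() method
        self._connect = connect
        self._sessions = {}

    def get(self, host):
        # Open a session only the first time a host is requested
        if host not in self._sessions:
            self._sessions[host] = self._connect(host)
        return self._sessions[host]

    def close_all(self):
        # Release every cached session (and its file descriptor, for a real one)
        for session in self._sessions.values():
            session.close()
        self._sessions.clear()


class FakeSession:
    """Stand-in for a real ssh session, used only to make the sketch runnable."""

    opened = 0  # counts how many sessions were ever created

    def __init__(self, host):
        self.host = host
        self.closed = False
        FakeSession.opened += 1

    def close(self):
        self.closed = True


if __name__ == "__main__":
    pool = SessionPool(FakeSession)
    for _ in range(150):
        pool.get("router")  # 150 requests, but only one session is created
    print(FakeSession.opened)
    pool.close_all()
```

With this pattern the number of descriptors held grows with the number of distinct hosts, not with the number of calls.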
either way, uci calls are locked, so it's certainly not in the wrt..., also this error tells it's a LOCAL RuntimeError with not enough fds
ahhnoh.. it's an eco
Trying to install a zboot_ipmi_host service on bancadati returns the following:

This started happening after I ran a script that sequentially added host services to the zboot robot, without concurrency, somewhere about midway through the hosts. Since then it happens for all the hosts.