Closed bbockelm closed 9 years ago
Hi Brian,
It is supposed to queue the request when the thread limit is reached. If it isn't doing that then it should be fixed. Could you get me a gcore with debug symbols when it gets into that state?
Andy
On Thu, 4 Sep 2014, Brian Bockelman wrote:
If a thread limit is hit, the cmsd stops functioning until the process is restarted. This has resulted in us having to continually increase the thread limits - the regional redirector cmsd in the US is currently at a limit of 200k threads.
It would be acceptable if the cmsd queues requests, just silently drops them, or returns an error message. However, the current behavior (functionality permanently stops) is a bit too painful.
Reply to this email directly or view it on GitHub: https://github.com/xrootd/xrootd/issues/137
Hi Andy,
Unfortunately, I don't have anyone currently struggling with this (as it typically goes away after doubling the thread count).
I'll ask Marian to set a test instance back to the defaults and send the core forward next time the issue happens.
Brian
Hi,
I've set it back to the defaults and turned our internal script back on, so it will email us if cmsd becomes unresponsive.
Marian
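For anyone wanting to catch the requested core automatically, here is a minimal sketch. It is not the internal script mentioned above. The function names, the threshold argument, and the /var/tmp output directory are all hypothetical, and a high thread count is only an assumed proxy for the stuck state. It relies on gdb's `gcore` being installed, and the core is only useful if debug symbols for cmsd are available.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the thread count of PID $1,
# read from the "Threads:" field of /proc/<pid>/status, is >= threshold $2.
over_threshold() {
    count=$(awk '/^Threads:/ { print $2 }' "/proc/$1/status" 2>/dev/null)
    [ -n "$count" ] && [ "$count" -ge "$2" ]
}

# Hypothetical wrapper: dump a core for every process named $1 whose
# thread count has reached threshold $2. Cores are written by gcore as
# <outdir>/<name>-core.<pid>; outdir defaults to /var/tmp (an assumption).
capture_if_over() {
    name=$1
    threshold=$2
    outdir=${3:-/var/tmp}
    for pid in $(pidof "$name"); do
        if over_threshold "$pid" "$threshold"; then
            gcore -o "$outdir/$name-core" "$pid"
        fi
    done
}
```

Something like `capture_if_over cmsd 2000` could then be run from cron, with the threshold set just below whatever thread limit the test instance is configured with.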
Hi @zvada - any updates? We might need to ask Carl to run a scale test to cause the issue.
I saw in my inbox today that EOSCMS hit this issue - could perhaps @ljanyst check and see if Jan happened to have taken a core dump?
Oh, did I have to do anything here other than turn the config back to the defaults? I haven't gotten any core dump since then, which could also be because I didn't run any tests. I'll talk to Carl about sending tests then... sorry, I didn't realize that was needed.
Carl ran an open-read test at a level of 8k concurrent jobs, and I don't see any significantly high thread usage:
[root@xrootd-itb ~]# grep -s '^Threads' /proc/$(pidof cmsd)/status | awk '{ print $2; }'
56
[root@xrootd-itb ~]# grep -s '^Threads' /proc/$(pidof xrootd)/status | awk '{ print $2; }'
592
Am I looking at the right numbers? If so, how many jobs in the scale test do we expect to trigger the abnormal behavior?
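The check above can be wrapped into a small helper that also copes with `pidof` returning more than one PID. This is only a sketch for Linux hosts, and the function names are made up for illustration.

```shell
#!/bin/sh
# Print the thread count of a single PID, taken from the "Threads:"
# field of /proc/<pid>/status. Prints nothing if the PID does not exist.
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Print one "<pid> <threads>" line for every process with the given name.
# pidof can return several PIDs, so each one is reported separately.
threads_by_name() {
    for pid in $(pidof "$1"); do
        printf '%s %s\n' "$pid" "$(thread_count "$pid")"
    done
}
```

Running `threads_by_name cmsd` and `threads_by_name xrootd` during the scale test would show whether the count ever approaches the configured limit.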
I still haven't received a core file of any kind from when this situation occurs, so we can't really come up with a solution. I will try to reproduce this by running with a special config that has a very small thread limit.
Hi @abh3 ,
Is the plan to have this in 4.2.2?
Yes,
All of the patches in git head should be included in 4.2.2.
Andy
From: Edgar Fajardo Sent: Monday, July 13, 2015 8:43 AM To: xrootd/xrootd Cc: Andrew Hanushevsky Subject: Re: [xrootd] cmsd becomes inoperable after thread limit is hit (#137)
Hi,
Which xrootd version contains the bug fixes for this problem?
thanks, Gerard
Hi @gerardba ,
AFAIK this should all be included in the current version in the OSG repos, 4.2.3.