xrootd / xrootd

The XRootD central repository https://my.cdash.org/index.php?project=XRootD
http://xrootd.org

cmsd becomes inoperable after thread limit is hit #137

Closed bbockelm closed 9 years ago

bbockelm commented 10 years ago

If a thread limit is hit, the cmsd stops functioning until the process is restarted. This has resulted in us having to continually increase the thread limits - the regional redirector cmsd in the US is currently at a limit of 200k threads.

It would be acceptable if the cmsd queues requests, just silently drops them, or returns an error message. However, the current behavior (functionality permanently stops) is a bit too painful.

xrootd-dev commented 10 years ago

Hi Brian,

It is supposed to queue the request when the thread limit is reached. If it isn't doing that, then it should be fixed. Could you get me a gcore with debug symbols when it gets into that state?

Andy

On Thu, 4 Sep 2014, Brian Bockelman wrote:

If a thread limit is hit, the cmsd stops functioning until the process is restarted. This has resulted in us having to continually increase the thread limits - the regional redirector cmsd in the US is currently at a limit of 200k threads.

It would be acceptable if the cmsd queues requests, just silently drops them, or returns an error message. However, the current behavior (functionality permanently stops) is a bit too painful.



bbockelm commented 10 years ago

Hi Andy,

Unfortunately, I don't have anyone currently struggling with this (as it typically goes away after doubling the thread count).

I'll ask Marian to set a test instance back to the defaults and send the core forward next time the issue happens.

Brian

zvada commented 10 years ago

Hi,

I've set it back to the defaults and re-enabled our internal script that emails us if cmsd becomes unresponsive.

Marian

bbockelm commented 10 years ago

Hi @zvada - any updates? We might need to ask Carl to run a scale test to cause the issue.

I saw in my inbox today that EOSCMS hit this issue - could perhaps @ljanyst check and see if Jan happened to have taken a core dump?

zvada commented 10 years ago

Oh, did I need to do anything here other than reverting the config to the defaults? I haven't gotten any core dump since then, which could also be because I didn't run any tests. I'll ask Carl to run the tests then... sorry, I didn't realize that was needed.

zvada commented 10 years ago

Carl ran an open-read test at a level of 8k concurrent jobs, and I don't see any significantly high thread usage:

    [root@xrootd-itb ~]# grep -s '^Threads' /proc/`pidof cmsd`/status | awk '{ print $2; }'
    56
    [root@xrootd-itb ~]# grep -s '^Threads' /proc/`pidof xrootd`/status | awk '{ print $2; }'
    592

Am I looking at the right numbers? If so, what number of jobs in the scale test do we expect to trigger the abnormal behavior?
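For anyone repeating this check, here is a minimal sketch of the same measurement as a reusable script. It assumes a Linux host with /proc mounted and pidof available. The function names thread_count and report_threads are hypothetical helpers made up for illustration and are not part of xrootd. Unlike the one-liner, it copes with pidof returning more than one PID.

```shell
#!/bin/sh
# Print the thread count of the process with the given PID, taken from
# the "Threads:" line of /proc/<pid>/status (Linux only).
# Prints nothing if the process does not exist.
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Report the thread count of every running process with the given name.
# pidof may print several PIDs, so each one is handled separately.
report_threads() {
    for pid in $(pidof "$1" 2>/dev/null); do
        printf '%s pid=%s threads=%s\n' "$1" "$pid" "$(thread_count "$pid")"
    done
}

report_threads cmsd
report_threads xrootd
```

Running this periodically during a scale test (for example from cron or a watch loop) would show whether the cmsd thread count is actually approaching the configured limit.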

abh3 commented 9 years ago

I still haven't received any kind of core file from when this situation occurs, so we can't really come up with a solution. I will try to reproduce this by running with a special config that has a very small thread limit.

efajardo commented 9 years ago

Hi @abh3 ,

Is the plan to have this in 4.2.2?

abh3 commented 9 years ago

Yes,

All of the patches in git head should be included in 4.4.2.

Andy

From: Edgar Fajardo
Sent: Monday, July 13, 2015 8:43 AM
To: xrootd/xrootd
Cc: Andrew Hanushevsky
Subject: Re: [xrootd] cmsd becomes inoperable after thread limit is hit (#137)

Hi @abh3 ,

Is the plan to have this in 4.2.2?


gerardba commented 8 years ago

Hi,

Which xrootd version contains the bug fixes for this problem?

thanks, Gerard

efajardo commented 8 years ago

Hi @gerardba ,

AFAIK this should all be included in the current version in the OSG repos, 4.2.3.