xrootd / xrootd

The XRootD central repository https://my.cdash.org/index.php?project=XRootD
http://xrootd.org

Xrootd Service restarted continuously #1442

Closed by cabrillo 3 years ago

cabrillo commented 3 years ago

Dear all, this is a strange behaviour on our xrootd data transfer systems (based on 8 identical CentOS 7 VMs):

[root@pool02 ~]# xrootd -v
v4.12.4
[root@pool02 ~]# rpm -qa| grep xrootd
xrootd-client-libs-4.12.4-1.el7.x86_64
xrootd-4.12.4-1.el7.x86_64
xrootd-lcmaps-1.7.8-2.osg35.el7.x86_64
xrootd-server-4.12.4-1.el7.x86_64
xrootd-client-4.12.4-1.el7.x86_64
xrootd-server-libs-4.12.4-1.el7.x86_64
xrootd-cmstfc-1.5.2-6.osg35.el7.x86_64
xrootd-voms-4.12.4-1.el7.x86_64
xrootd-libs-4.12.4-1.el7.x86_64
xrootd-selinux-4.12.4-1.el7.noarch

Apr 12 20:06:41 pool02 systemd: xrootd@clustered.service failed.
Apr 12 20:06:41 pool02 systemd: xrootd@clustered.service has no holdoff time, scheduling restart.
Apr 12 20:06:41 pool02 systemd: Stopped XRootD xrootd daemon instance clustered.
Apr 12 20:06:41 pool02 systemd: Started XRootD xrootd daemon instance clustered.
Apr 12 20:07:21 pool02 systemd: xrootd@clustered.service: main process exited, code=killed, status=64/RTMIN+30
Apr 12 20:07:21 pool02 systemd: Unit xrootd@clustered.service entered failed state.
Apr 12 20:07:21 pool02 systemd: xrootd@clustered.service failed.
Apr 12 20:07:21 pool02 systemd: xrootd@clustered.service has no holdoff time, scheduling restart.
Apr 12 20:07:21 pool02 systemd: Stopped XRootD xrootd daemon instance clustered.
Apr 12 20:07:21 pool02 systemd: Started XRootD xrootd daemon instance clustered.

During these restarts, transfers jump from server to server until they finish. Has anyone seen this behaviour before?

Regards, I

abh3 commented 3 years ago

This seems to be an OSG packaging issue where systemd does not have a restart hold time. As for reports of this generally happening, not so much. However, there is a report about issues with no hold time, see https://github.com/xrootd/xrootd/issues/1410

Of course, the server shouldn't be crashing at all. So, there are two problems here. What does the log say? Is there a core file? We need that information otherwise we can't tell you a thing.

As for the missing hold time, I suggest you open a ticket with OSG.

bbockelm commented 3 years ago

@cabrillo - does that go away if you disable asynchronous I/O?

The asynchronous I/O mechanism in XRootD is incompatible with the signal handling in xrootd-lcmaps.

(Note: xrootd-lcmaps, because it depends on Globus, is going to be deprecated over the next 12 months ... might be a good time to switch to the native VOMS plugin for that, especially once you upgrade to 5.x)
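For anyone wanting to try this, here is a minimal sketch of turning asynchronous I/O off in the server configuration. It assumes the `xrootd.async` directive from the XRootD configuration reference is the relevant knob, and the file path is an assumption based on the `clustered` instance name; adjust both to your setup.

```
# /etc/xrootd/xrootd-clustered.cfg (assumed path for the "clustered" instance)
# Assumption: disabling async I/O avoids the clash with the signal handling
# in xrootd-lcmaps described above.
xrootd.async off
```

After changing this, restart the instance (for example `systemctl restart xrootd@clustered`) and check whether the `status=64/RTMIN+30` kills stop appearing in the system log.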

cabrillo commented 3 years ago

Thanks to both of you! Brian, is there any document or recipe for trying this installation on CentOS 7 (VOMS plugin + xrootd 5.x, compatible with the CMS VO of course)?

Regards, I

cabrillo commented 3 years ago

Hi, I just tried a fresh installation, testing v5 and the native VOMS plugin (following this: https://opensciencegrid.org/docs/data/xrootd/xrootd-authorization/):

[root@pool09 ~]# rpm -qa| grep xrootd
xrootd-cmstfc-1.5.2-6.osg35.el7.x86_64
xrootd-voms-5.1.1-1.el7.x86_64
xrootd-selinux-5.1.1-1.el7.noarch
xrootd-client-libs-5.1.1-1.el7.x86_64
xrootd-5.1.1-1.el7.x86_64
xrootd4-libs-4.12.6-1.el7.x86_64
xrootd-server-libs-5.1.1-1.el7.x86_64
xrootd-client-5.1.1-1.el7.x86_64
xrootd-server-5.1.1-1.el7.x86_64
xrootd-libs-5.1.1-1.el7.x86_64

xrootd-lcmaps has been removed, but the sec.protocol line uses -authzfun:libXrdLcmaps.so (which is provided by xrootd-lcmaps):

sec.protocol /usr/lib64 gsi -certdir:/etc/grid-security/certificates -cert:/etc/grid-security/xrd/xrdcert.pem -key:/etc/grid-security/xrd/xrdkey.pem -crl:3 -authzfun:libXrdLcmaps.so -authzfunparms:--osg,--lcmapscfg,/etc/xrootd/lcmaps.cfg,--loglevel,0|useglobals -gmapopt:10 -gmapto:0

The lcmaps.cfg:

pepc = "lcmaps_c_pep.mod"
       "--pep-daemon-endpoint-url https://XXXXXXXXXX:8154/authz"
       " --resourceid http://YYYYYYYYY/xacml/resource/resource-type/xrootd"
       " --actionid http://glite.org/xacml/action/execute"
       " --capath /etc/grid-security/certificates/"
       " --no-check-certificates"
       " --certificate /etc/grid-security/xrd/xrdcert.pem"
       " --key /etc/grid-security/xrd/xrdkey.pem"

What should the correct value for -authzfun: be in this case?

Regards, I

cabrillo commented 3 years ago

Hi Brian, Andrew, I have been trying to configure a v5 version with the libsecvoms.so library, using this directive in xrootd-clustered.cfg:

sec.protocol /usr/lib64 gsi -dlgpxy:1 -exppxy:creds -ca:1 -crl:3 -gridmap:/dev/null -cert:/etc/grid-security/xrd/xrdcert.pem -key:/etc/grid-security/xrd/xrdkey.pem -certdir:/etc/grid-security/certificates -vomsfun:/usr/lib64/libXrdVoms.so -vomsfunparms:grpopt=1|vos=ops,dteam,cms|certfmt=pem|grps=/cms,/ops,/dteam|dbg

vomsfun seems to be working:

................................... 10419 12:37:37 22442 XrdVomsFun: retrieval successful
210419 12:37:37 22442 XrdVomsFun: found VO: cms
210419 12:37:37 22442 XrdVomsFun: ---> group: '/cms', role: 'NULL', cap: 'NULL'
210419 12:37:37 22442 XrdVomsFun: ---> fqan: '/cms/Role=NULL/Capability=NULL'
210419 12:37:37 22442 XrootdXeq: cmspj007.310:43@wngrid025.prv.cloud pvt IPv4 login as 3a67d6a0.0

But what would be the best way to do the local mapping for reading and writing data? With lcmaps I could use the Argus endpoint, mapping DNs to local pool accounts, but now? Is there any way to still use Argus, or is it required to use the gridmap-file (local mapping)?

Regards, I

bbockelm commented 3 years ago

Hi Iban,

Apologies - I didn't realize that you were using Argus here.

Indeed, the only way to invoke Argus from XRootD is through the LCMAPS integration. However, it's important to note that the support lifetime left is less than a year. You could certainly maintain the plugin yourself (looking through the commit history, most changes are simply version bumps to stay aligned with XRootD) if interested. Using a gridmap-file may be another option as that has a lot of support left.

It may be advantageous to start migrating away from pool accounts as well. I don't know what VOs you use them for but, for CMS, there was a good amount of work in ~2017 to make sure they weren't necessary to support the VO itself.

Brian

abh3 commented 3 years ago

I think Brian's comment sums up the problem as a compatibility issue.