xrootd / xrootd-python

Python bindings for XRootD are now part of the main repository.
https://github.com/xrootd/xrootd

Avoiding the GIL #8

Closed: rogerdowning closed this issue 10 years ago

rogerdowning commented 10 years ago

Hi, STFC in the UK are moving to using XRoot to access their Castor system, replacing RFIO. I have a service which interfaces with Castor for deposition of experimental data into an archive. The service is driven by a multi-threaded TCPServer written in Python, so I was pleased to find your bindings.

The service receives large numbers of small files and concatenates them into large files before sending them to Castor for storage on tape. On retrieval from tape, we perform a stager_get and then we run multiple retrieval jobs copying the large files back to disk, and we serve files to the client from there. We do this to attempt to avoid long delays waiting for files to be staged from tape, and it works well for us.

I modified the backend to use these bindings for copying the large files to Castor (previously it was shelling out to rfcp), but I found that when I called FileSystem.copy() on a large file it would hang the whole process until the copy finished. I assume this is because the copy() is implemented in C and therefore is not subject to the timeslicing done by the Python interpreter (2.6.6)?

I'm aware of the CopyProcess() functionality you provide, but would that also pause until all the jobs are complete? If I were to perform a FileSystem.copy() asynchronously with a callback, would that allow other threads of execution to carry on in the meantime? I could use a loop and File.write() but since we're not concerned with writing or reading portions of the files it would seem preferable to just deal with a put/get style of operation.

I have sorted this for the time being by just shelling out to xrdcp for the copy to Castor, but I would really like to use these bindings. Is there any strategy I can adopt that would circumvent the locking I think I see?

Thanks in advance,

Roger Downing
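
The `xrdcp` workaround mentioned above keeps the server responsive because a Python thread waiting on a child process does not hold the GIL while it waits. A minimal sketch of that approach in modern Python follows. The names `copy_many`, `copy_cmd` and `max_workers` are hypothetical and chosen for illustration; the only thing assumed about `xrdcp` is its basic `xrdcp <source> <destination>` form.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def copy_many(pairs, copy_cmd=("xrdcp",), max_workers=4):
    """Run one external copy command per (source, destination) pair.

    Each worker thread blocks in subprocess.run(), which waits for the
    child process without holding the GIL, so other Python threads keep
    running. Returns the exit codes in the same order as `pairs`.
    """
    def run_one(pair):
        source, destination = pair
        # Hypothetical invocation: copy_cmd followed by source and destination.
        result = subprocess.run(list(copy_cmd) + [source, destination])
        return result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, pairs))
```

A call such as `copy_many([("/data/big1.tar", "root://castor.example//archive/big1.tar")])` (the host and paths are placeholders) would return a list of exit codes, where a non-zero value indicates that the corresponding copy failed.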

jlsalmon commented 10 years ago

Hi Roger,

I don't believe that the new XRootD client (upon which pyxrootd is based) currently supports asynchronous copy jobs (@ljanyst please correct me if I'm wrong?), which is why pyxrootd doesn't support them either and will block with both FileSystem.copy() and CopyProcess.

I know threading in Python is a bit of a nightmare, but (tentative suggestion) you could try FileSystem.copy() in a separate "thread"?

Cheers, Justin

rogerdowning commented 10 years ago

Hi there,

Thanks for responding! Unfortunately, the FileSystem.copy() already runs in its own thread, but locks the process because the Python 2.x interpreter won't interrupt it :-(

For now, I'm OK with shelling out to xrdcp for the parallel copy. I hope in the future to support direct streaming of data to and from Castor because we're seeing high contention on the RAID array where the data lands from Castor (800 MB/s inbound tends to kill outbound performance!), and this will involve moving to File.write() ops which should work better because the write loops can be interleaved.

Cheers,

Roger Downing

STFC Daresbury Laboratory, Keckwick Lane, Warrington WA4 4AD UK

tel: +44 1925 603937


In reply to Justin Lewis Salmon (sent 03 February 2014 22:07).

bbockelm commented 10 years ago

@jussy - I think you want to look at this:

http://docs.python.org/2/c-api/init.html#threads

You want to wrap the copy job invocation with Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS.

Otherwise, FileSystem.copy() holds the global interpreter lock and no other python threads can run.
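
The difference those macros make can be observed from pure Python with `ctypes`, which exposes both behaviours: a function called through `ctypes.CDLL` has the GIL released for the duration of the call (much as a C extension wrapping its blocking work in `Py_BEGIN_ALLOW_THREADS` / `Py_END_ALLOW_THREADS` would), while the same function called through `ctypes.PyDLL` runs with the GIL held. The sketch below is an illustration only and is not pyxrootd code. It assumes a POSIX system where `usleep` is available, a standard (GIL-enabled) CPython 3 build, and uses hypothetical helper names `ticks_during` and `compare`.

```python
import ctypes
import threading
import time


def ticks_during(blocking_call, interval=0.01):
    """Count how often a background thread gets to run during blocking_call()."""
    ticks = 0
    stop = threading.Event()

    def ticker():
        nonlocal ticks
        while not stop.is_set():
            ticks += 1
            time.sleep(interval)

    worker = threading.Thread(target=ticker)
    worker.start()
    blocking_call()
    stop.set()
    worker.join()
    return ticks


def compare(microseconds=300000):
    """Return (ticks with the GIL held, ticks with the GIL released)."""
    holds_gil = ctypes.PyDLL(None)    # calls are made with the GIL held
    releases_gil = ctypes.CDLL(None)  # GIL is released around each call
    held = ticks_during(lambda: holds_gil.usleep(microseconds))
    released = ticks_during(lambda: releases_gil.usleep(microseconds))
    return held, released
```

Calling `held, released = compare()` should show the background thread making far less progress in the first case than in the second, which mirrors the hang described in this issue: while the blocking call holds the GIL, no other Python thread can run.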

ljanyst commented 10 years ago

@jussy @bbockelm is right. Do you have time to fix it, or should I?

jlsalmon commented 10 years ago

Ok, I misunderstood the problem. I am using those macros for the async stuff in File/FileSystem but didn't think to use them here. @bbockelm thanks for the correct suggestion.

@ljanyst I will do it, I just about have time :)

ljanyst commented 10 years ago

Great, thanks!

rogerdowning commented 10 years ago

This is brilliant, thanks so much guys!

Roger Downing



In reply to Lukasz Janyst (sent 04 February 2014 08:41).

jlsalmon commented 10 years ago

@rogerdowning This is now fixed in HEAD. Thanks for reporting!

You can get the RPMs from TeamCity at: https://teamcity-dss.cern.ch:8443/viewType.html?buildTypeId=bt80

Cheers, Justin