leejo / cgi-fast

The new home for CGI::Fast, removing it from the original CGI.pm distribution
4 stars 5 forks source link

Perl Threads Getting Hung in version : 5.12 #6

Closed ComputerAssociates closed 9 years ago

ComputerAssociates commented 10 years ago

This is a CA Technologies Application issue.

Operating System: Windows 2012 Server

Perl version: 5.12

We have taken forced dumps on one of our hung process in our Application. It seems that, threads are getting hung in a critical section of the Perl code.

In a single forced dump we could see that there were 40 threads in this state and with this call stack. The following is the call stack that we observed during the hang:

            ntdll.dll!_RtlpEnterCriticalSectionContended@4()  + 0xa7 bytes 
           ntdll.dll!_RtlEnterCriticalSection@4()  - 0x24ffd bytes      
           msvcr100.dll!___lock_fhandle()  + 0x57 bytes    
           msvcr100.dll!__write()  + 0x56 bytes       
           perl512.dll!PerlLIOWrite(IPerlLIO * piPerl=0x0575d01c, int handle=2, const void * buffer=0x13e2e1cc, unsigned int count=50)  Line 1097 + 0x11 bytes  C++
           perl512.dll!PerlIOUnix_write(interpreter * my_perl=0x06ef8c74, _PerlIO * * f=0x0781fc8c, const void * vbuf=0x13e2e1cc, unsigned int count=50)  Line 2699 + 0x18 bytes             C
           perl512.dll!Perl_PerlIO_write(interpreter * my_perl=0x06ef8c74, _PerlIO * * f=0x0781fc8c, const void * vbuf=0x13e2e1cc, unsigned int count=50)  Line 1643 + 0x2b bytes            C
           perl512.dll!PerlIOBuf_flush(interpreter * my_perl=0x06ef8c74, _PerlIO * * f=0x13e2e1cc)  Line 3780 + 0x13 bytes     C
           perl512.dll!PerlIOCrlf_flush(interpreter * my_perl=0x06ef8c74, _PerlIO * * f=0x13d42a58)  Line 4618 + 0xa bytes     C
           perl512.dll!Perl_PerlIO_flush(interpreter * my_perl=0x06ef8c74, _PerlIO * * f=0x13d42a58)  Line 1667 + 0x6 bytes     C
           perl512.dll!PerlIOCrlf_write(interpreter * my_perl=0x06ef8c74, _PerlIO * * f=0x05dbf45d, const void * vbuf=0x05dbf42c, unsigned int count=49)  Line 4605 + 0x9 bytes               C
           perl512.dll!Perl_PerlIO_write(interpreter * my_perl=0x06ef8c74, _PerlIO * * f=0x13d42a58, const void * vbuf=0x05dbf42c, unsigned int count=49)  Line 1643 + 0x2b bytes             C
           perl512.dll!Perl_write_to_stderr(interpreter * my_perl=0x00000031, sv * msv=0x140d53c4)  Line 1275  C
           perl512.dll!Perl_vwarn(interpreter * my_perl=0x06ef8c74, const char * pat=0x70978890, char * * args=0x1640f9e0)  Line 1464 + 0x7 bytes                C
           perl512.dll!Perl_vwarner(interpreter * my_perl=0x06ef8c74, unsigned long err=2, const char * pat=0x70978890, char * * args=0x1640f9e0)  Line 1563 + 0xc bytes            C
           perl512.dll!Perl_ck_warner_d(interpreter * my_perl=0x06ef8c74, unsigned long err=2, const char * pat=0x70978890, ...)  Line 1520 + 0x12 bytes        C
           perl512.dll!Perl_ck_defined(interpreter * my_perl=0x06ef8c74, op * o=0x1407d120)  Line 7307 C
           perl512.dll!Perl_newUNOP(interpreter * my_perl=0x06ef8c74, long type=42, long flags=0, op * first=0x1407d13c)  Line 3110 + 0x3c bytes              C
           perl512.dll!Perl_yyparse(interpreter * my_perl=0x13e48e24)  Line 1198                C
           perl512.dll!S_parse_body(interpreter * my_perl=0x06ef8c74, char * * env=0x00dfe818, void (interpreter *)* xsinit=0x00f57ce0)  Line 2160 + 0x8 bytes      C
          perl512.dll!perl_parse(interpreter * my_perl=0x06ef8c74, void (interpreter *)* xsinit=0x00f57ce0, int argc=3, char * * argv=0x1640fb88, char * * env=0x00dfe818)  Line 1606  C
           spel_srvr.exe!parse_perl()  Line 398 + 0x1b bytes            C++

Please note that the spel_srvr.exe is our calling process.

If the dumps are required for debugging, please let us know, you may reach me at kooka01@ca.com, bidut01@ca.com for any queries on this.

This is an urgent issue and any quick help or response will be appreciated.

Kalyan

leejo commented 10 years ago

Please can you provide dumps. This is going to be difficult to debug due to the combination of Windows + 5.12 + threads.

ComputerAssociates commented 10 years ago

Hi Leejo,

Thanks for the swift response.

Here is the link around 68 MB archive.

https://drive.google.com/file/d/0B5dW-0c4IY2WMmEtVXFlTlNsSnM/edit?usp=sharing

There are 2 spel_srvr.exe dumps (includes spelsrvr.exe, perl512.dll) and all the associated pdbs.

Can you please review them?

Also let us know that if you have seen similar stack trace earlier and any associated issues that you might be aware of that has this stack trace?

Thanks again for taking time on this.

Kalyan

leejo commented 10 years ago

I will try to look at this over the next couple of weeks, however i can't make any promises as i will be busy and also Windows/threads is not an area i am familiar with. It's also been several years since i played around with coredumps. One last thing that would aid in the debugging of this issue is the raw perl source code calling CGI/CGI::Fast as the compiled .exe doesn't give me much to go on.

leejo commented 10 years ago

Bump - any chance of seeing the actual perl source showing this issue? It's going to be very hard for me to debug without it.

ComputerAssociates commented 10 years ago

Thanks Leejo,

we will try to get you the source code causing the perl_parse call. Meanwhile we are trying to upgrade to perl 5.20 to see if it solves this problem.

leejo commented 10 years ago

OK. FYI 5.12 to 5.20 is a big jump and there have been several changes / deprecations that may affect you. I would advise checking out the perldelta for each version you are leapfrogging.

leejo commented 10 years ago

Bump! Any progress on this or getting a copy of the source code that shows the problem?

ComputerAssociates commented 10 years ago

Hi Leejo,

Sorry for the delay. Basically the call stack before that is leading to perl_parse is this , interesting thing to note is another of our customer faced the same problem and he was also on WINDOWS 2012:

spel_srvr.exe!parse_perl()  Line 398 + 0x1b bytes
spel_srvr.exe!Perl_Thread::run()  Line 330 + 0x5 bytes  C++
spel_srvr.exe!NX::Thread::thread_proc(NX::Thread * th=0x03f61bd0)  Line 105      C++
spel_srvr.exe!_callthreadstartex()  Line 314 + 0xf bytes C
spel_srvr.exe!_threadstartex(void * ptd=0x14cd91a0)  Line 297                C
kernel32.dll!@BaseThreadInitThunk@12()  + 0xe bytes
ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes
ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes

The function calling parse_perl is in run method ,note spel_srvr.exe is our process, entire source code leading to the call stack is too BIG, so I am just copying here the function that is calling Perl_parse, let me know if it is not sufficient and if not give me the directions on which code to paste exactly.

void Perl_Thread::run()
{ 
    int restart_count = 0;
    PerlInterpreter* interp = parse_perl();

    while ( _state != TERMINATE ) {
        if ( restart_count >= restart_threshhold() ) {
            dealloc_perl( interp );
            interp = parse_perl();
            restart_count = 0;
        }
        if ( !interp ) {
            finished(-1, "Error parsing Perl code");
            interp = parse_perl();
        }
        else {
            if (ACTIVE == _state)
            {
                run_perl( interp, (const char *)_method, _request_msg);
                finished();
                restart_count++;
            }
        }
    }
leejo commented 10 years ago

OK, well i'm a bit confused. You have a separate C++ process that is calling (exec'ing?) another perl process? Or are you using xs? Or are you hooking into something else?

ComputerAssociates commented 10 years ago

yes C++ process invoking perl call. function that is invoking parse_perl is in c++, let me know if you need anything

leejo commented 10 years ago

So the perl is exec'd as a separate process independent from the C++ code, or are you hooked in some how? If the perl code is being exec'd then you need to isolate this down to a test case. What exactly is parse_perl()?

ComputerAssociates commented 10 years ago

will get back to you on this one, in my opinion we have hooked it, meaning we have perl interpreter code in a .C file and from there we are calling the perl_parse from the parse_perl function , here is the some of the snippet of the parse_perl , may be you can get an idea by seeing the parse_perl code:

PerlInterpreter* parse_perl()     // static
{
    PerlInterpreter * my_perl;

    if ( !get_perlcode() ) {
       Perl_Thread::logmsg( "perlinterpretor.c", 320L, LOGGER_NX::ERROR, "perl code not defined" );
        return NULL;
    }

    my_perl = perl_alloc(); 
    if (!my_perl) {
       Perl_Thread::logmsg( "perlinterpretor.c", 326L, LOGGER_NX::ERROR, "perl_alloc failed" );
        return NULL;
    }

    perl_construct(my_perl);
    char *dummy_argv[] = {"","-e","0"}; 
    dummy_argv[2] = get_pdm_perlcode();
    perl_parse(my_perl, xs_init, 3, dummy_argv, pdm_environ ); 

............
leejo commented 10 years ago

Oh my... Well if you're hooked in then all bets are off because this could be any number of things :) I would try as hard as possible to isolate this down to a test case, because there is no way i am going to be able to replicate your setup here.

ComputerAssociates commented 10 years ago

hmm..If you need anything let me know from my side :) .But I can confirm much on that section of code when I get in touch with my peers. Thanks for taking a look, also we have 2 customers running into same problem and surprisingly both are using WIN 2012 and also the fix that we have upgrading perl to 5.20 does not seem to work either.....

ComputerAssociates commented 10 years ago

Leejo,

also we have one configurable option MAX_PERL_THREADS This variable defines the maximum number of Perl Threads that can be ACTIVE at one time before the spel_srvr queues request until an ACTIVE Perl thread becomes IDLE. The default is 50. Hope that helps somehow may be :|

leejo commented 10 years ago

I have spoken to someone who is more familiar with the low level perl internals and here is some possibly useful information for how you are doing this (and i would say that the way you're doing this, if you are doing it the way i think you are doing this, is just asking for trouble):

leejo commented 10 years ago

Bump! Any progress on this? I am considering closing the issue as a "can't replicate" as i genuinely think it's going to be impossible for me to replicate given the recent discussion.

leejo commented 9 years ago

Closing for aforementioned reasons. Please comment if you have any luck in solving this.