s-andrews / BamQC

Mapped QC analysis program
GNU General Public License v3.0

Program doesn't exit after crashing #19

Closed s-andrews closed 7 years ago

s-andrews commented 8 years ago

There seems to be a generic problem with the software: a crash which propagates to the top of the stack doesn't actually cause the parent process to exit. I presume this is because the parent is still waiting for some signal from the child which is doing the processing. The effect is that the program can hang forever. On a cluster that means the job never exits, and because the error is reported into a buffer you can't see what has happened.

We need some kind of catch-all handler for the child so the parent knows not to keep waiting.
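One way such a catch-all could look is a JVM-wide uncaught exception handler that prints the error and ends the process with a non-zero status. This is only a minimal sketch, not BamQC code: the class name `CatchAllHandlerSketch`, the method `installCatchAll` and the injectable `exit` callback are all hypothetical, and only the `Thread.setDefaultUncaughtExceptionHandler` call is standard Java.

```java
import java.util.function.IntConsumer;

// Hypothetical sketch, not taken from BamQC.
public class CatchAllHandlerSketch {

    /**
     * Installs a JVM-wide handler for exceptions that no thread caught.
     * The exit action is passed in so that it can be replaced in tests;
     * in production it would typically be System::exit.
     */
    static void installCatchAll(IntConsumer exit) {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            // Write the error to stderr so it shows up in the job log.
            System.err.println("Uncaught exception in thread " + thread.getName());
            error.printStackTrace();
            System.err.flush();
            // A non-zero status tells the cluster scheduler that the job failed.
            exit.accept(1);
        });
    }

    public static void main(String[] args) throws InterruptedException {
        installCatchAll(System::exit);

        // Simulated child that crashes with an exception nobody catches.
        Thread child = new Thread(() -> {
            throw new IllegalStateException("simulated crash while processing a file");
        }, "analysis-child");
        child.start();

        // Without the handler above, a parent that waits on some other
        // completion signal could keep waiting after the child has died.
        child.join();
    }
}
```

Calling `System.exit` from the handler is a blunt choice: it stops every file being processed, not just the one that crashed. It does guarantee that the job ends, which is the main concern on a cluster.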

pdp10 commented 8 years ago

Have you tried uncommenting this line:

    # log4j.rootCategory=INFO, CONSOLE, LOGFILE

in log4j.properties in the main folder? That would store the output in bamqc.log, which could tell us something.

pdp10 commented 8 years ago

Another thing we could do is to disable all modules, or just the suspected ones, in Configuration/limits.txt to see whether this is due to the application or to the modules. Are there many files to process, or do you already know which file is causing this issue?

s-andrews commented 8 years ago

It's not the crash I'm worried about (the one I had is actually fixed); it's the fact that the program doesn't exit. I can see the stack trace in the console, but the parent is still waiting. I think it's a problem with the queue monitoring in the offline runner.

Simon.


pdp10 commented 8 years ago

If that is the reason, I would say that a thread started by BamQC is not ending.

s-andrews commented 8 years ago

Possibly, but I'd have thought the Exception should have terminated it. It's more likely that the code which monitors the running threads is missing it? I don't know at this point - a few tests should get to the bottom of it.

Simon.


pdp10 commented 8 years ago

Do you mean the exception in AnalysisQueue?
If so, that will always be caught. But it could be that the increment of usedSlots does not work as expected.

    @Override
    public void run() {

        while (true) {
//          log.debug("Status available="+availableSlots+" used="+usedSlots+" queue="+queue.size());
            if (availableSlots.intValue() > usedSlots.intValue() && queue.size() > 0) {
                usedSlots.incrementAndGet();
                AnalysisRunner currentRun = queue.removeFirst();
                currentRun.addAnalysisListener(this);
                Thread t = new Thread(currentRun);
                t.start();
            }

            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {}
        }
    }

Also, were you running BamQC as single or multi threads?
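To illustrate the slot accounting concern: the loop above increments usedSlots when a run starts, so something has to decrement it again when the run ends, including when it ends with an exception. The following is a minimal sketch of one way to make that release unconditional, assuming the decrement is moved into a finally block around the job. The class `SlotAccountingSketch` and its methods are hypothetical and are not the actual AnalysisQueue code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the real AnalysisQueue.
public class SlotAccountingSketch {
    private final int availableSlots;
    private final AtomicInteger usedSlots = new AtomicInteger();
    private final Deque<Runnable> queue = new ArrayDeque<>();

    SlotAccountingSketch(int availableSlots) {
        this.availableSlots = availableSlots;
    }

    synchronized void submit(Runnable job) {
        queue.addLast(job);
    }

    int usedSlots() {
        return usedSlots.get();
    }

    /** Claims a slot and returns the next job, or null if nothing can start now. */
    private synchronized Runnable pollIfSlotFree() {
        if (usedSlots.get() < availableSlots && !queue.isEmpty()) {
            usedSlots.incrementAndGet();
            return queue.removeFirst();
        }
        return null;
    }

    /** Dispatches jobs until the queue is empty and every slot is free again. */
    int drain() throws InterruptedException {
        int started = 0;
        while (true) {
            Runnable job = pollIfSlotFree();
            if (job != null) {
                started++;
                new Thread(() -> {
                    try {
                        job.run();
                    } catch (Throwable t) {
                        // Report the crash instead of letting it vanish.
                        t.printStackTrace();
                    } finally {
                        // Release the slot whether the job succeeded or crashed.
                        usedSlots.decrementAndGet();
                    }
                }).start();
                continue;
            }
            synchronized (this) {
                if (queue.isEmpty() && usedSlots.get() == 0) {
                    return started;
                }
            }
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SlotAccountingSketch q = new SlotAccountingSketch(2);
        q.submit(() -> { throw new IllegalStateException("simulated crash"); });
        q.submit(() -> System.out.println("second job ran"));
        System.out.println("jobs started: " + q.drain());
    }
}
```

With the release in a finally block, a crashed job cannot leak its slot, so the jobs queued behind it still get a chance to run.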

pdp10 commented 8 years ago

The Thread/Runnable objects created by BamQC are in the classes:

All the remaining classes that use a thread object only invoke the method sleep(), so they should not cause a problem. Therefore, if this issue is caused by a thread, the problem should be in one of the AnalysisRunner threads started by AnalysisQueue. If one of those AnalysisRunner objects throws an exception other than SequenceFormatException or IOException, the invoking methods in OfflineRunner (or BamQCApplication) should hang forever, because the variable filesRemaining is never decremented (or, in BamQCApplication, the loop iteration crashes and the remaining files are not processed).
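The mechanism described above can be sketched as follows. This is a simplified model rather than the real classes: `AnalysisListener` here is a hypothetical two-method interface, `FileTask`, `Runner` and `runAll` are invented for illustration, and only the idea of a `filesRemaining` counter comes from the discussion. The point is that the runner reports every outcome to the listener, so the counter reaches zero even when an unexpected exception type is thrown.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the interfaces below are not the BamQC ones.
public class AlwaysNotifySketch {

    interface AnalysisListener {
        void analysisComplete(String file);
        void analysisFailed(String file, Throwable cause);
    }

    interface FileTask {
        void process(String file) throws IOException;
    }

    static class Runner implements Runnable {
        private final String file;
        private final FileTask task;
        private final AnalysisListener listener;

        Runner(String file, FileTask task, AnalysisListener listener) {
            this.file = file;
            this.task = task;
            this.listener = listener;
        }

        @Override
        public void run() {
            Throwable failure = null;
            try {
                task.process(file);
            } catch (Throwable t) {
                // Catch everything, not only the expected checked exceptions.
                failure = t;
            }
            // Exactly one callback per file, whatever happened above.
            if (failure == null) {
                listener.analysisComplete(file);
            } else {
                listener.analysisFailed(file, failure);
            }
        }
    }

    /** Processes all files and returns how many of them failed. */
    static int runAll(List<String> files, FileTask task) throws InterruptedException {
        AtomicInteger filesRemaining = new AtomicInteger(files.size());
        AtomicInteger failures = new AtomicInteger();
        AnalysisListener listener = new AnalysisListener() {
            @Override
            public void analysisComplete(String file) {
                filesRemaining.decrementAndGet();
            }

            @Override
            public void analysisFailed(String file, Throwable cause) {
                System.err.println("Failed to process " + file + ": " + cause);
                failures.incrementAndGet();
                filesRemaining.decrementAndGet();
            }
        };
        for (String file : files) {
            new Thread(new Runner(file, task, listener)).start();
        }
        // The parent only stops waiting once every file has reported back.
        while (filesRemaining.get() > 0) {
            Thread.sleep(20);
        }
        return failures.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int failed = runAll(Arrays.asList("a.bam", "b.bam"), file -> {
            if (file.startsWith("b")) {
                throw new IllegalArgumentException("unexpected problem in " + file);
            }
        });
        System.exit(failed == 0 ? 0 : 1);
    }
}
```

If the catch in `run()` were limited to `IOException`, the `IllegalArgumentException` would escape, no callback would fire for that file, and the polling loop in `runAll` would never end.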

pdp10 commented 8 years ago

[EDIT: consider pull request #22, and discard pull request #21. They are the same, but #21 wrongly includes Issue 17]

I added a catch for a generic exception as a possible temporary solution. When you have a chance, could you put the version implementing this bug fix ( https://github.com/pdp10/BamQC/archive/bugfix_19.zip ) in the pipeline and let me know whether this works?

If that catch works, a stack trace for that exception is now shown. The same is applied to BamQCApplication when the AnalysisRunner is started (which was also potentially exposed to the same problem).

pdp10 commented 8 years ago

Okay, pull request #22 now doesn't include issue 17, and it's the one to consider.

pdp10 commented 8 years ago

I merged pull request #22 into the devel and master branches.