kieker-monitoring / kieker

Kieker's main repository
Apache License 2.0
70 stars 41 forks source link

[KIEKER-901] Execution traces from big log files #1090

Open rju opened 2 weeks ago

rju commented 2 weeks ago

JIRA Issue: KIEKER-901 Execution traces from big log files Original Reporter: Nils Christian Ehmke


As explained in KIEK-699:

There were problems analyzing log files (received within five minutes) and extracting the execution traces. Check whether this is a problem with the log files or with our own tool.

Checklist:

rju commented 2 weeks ago

author André van Hoorn -- Mon, 25 Feb 2013 18:53:22 +0100

We processed quite big monitoring logs with the tool already. But maybe the problems were caused by KIEKER-480 Done or KIEKER-671 Done . Is it possible to get the log files that caused the problems (student submissions)?

It would be good to have a "large" (whatever this means) log for regression purposes.

rju commented 2 weeks ago

author nils-christian -- Tue, 26 Feb 2013 10:38:52 +0100

It was G07 who had the problem. Maybe jwa can contact them and check whether they still have those files in question.

rju commented 2 weeks ago

author Jan Waller -- Tue, 26 Feb 2013 12:50:02 +0100

Attachment eclipse-project.zip has been added with description: G07 Afg5

rju commented 2 weeks ago

author Jan Waller -- Tue, 26 Feb 2013 12:50:45 +0100

The associated task (in German):
>In Serie 1 wurde das Kieker Framework vorgestellt. Führt für die Stacks in Aufgabe 4 mit Hilfe von Kieker Performancemessungen durch. Für jede Messung sollen 10 Threads gestartet werden, die zufällig Elemente auf den Stack legen oder von diesem entnehmen. Achtet auch auf eine ausreichende Warmup-Zeit, sowie eine ausreichende anfängliche Stack-Füllung. Die eigentliche Messung soll pro Stack 5 Minuten dauern und es sollen zur gleichen Zeit keine weiteren rechenintensiven Programme ausgeführt werden.
>a. Wie viele Elemente werden jeweils auf den Stack gelegt bzw. von diesem entnommen?
>b. Wie lange dauert durchschnittlich eine Ausführung einer Operation?
>c. Wie viele Versuche werden durchschnittlich und maximal benötigt, bis die Threads Elemente auf den Stack legen bzw. von diesem entfernen können?
>d. Interpretiert eure Ergebnisse!
>Gebt bitte auch die folgenden Angaben bzgl. der von euch verwendeten Rechnerkonguration an: Prozessortyp, Hauptspeicher und Betriebssystem.

I added the associated eclipse-project to produce the logs in attachment:eclipse-project.zip.

rju commented 2 weeks ago

author Jan Waller -- Fri, 1 Mar 2013 13:38:44 +0100

There seems to be a problem with log files containing OperationExecutionRecords.

Unrelated to the example posted above, I produced two Kieker logs, once using OperationExecutionRecords and once using BeforeOperationEvent and AfterOperationEvent records. In both cases the records are written using the BinaryWriter resulting in 1.2 GiB and 3.2GiB log files, respectively.

Analysis is performed with the TraceAnalysisTool with --plot-Assembly-Operation-Dependency-Graph response-time.

The first log file (1.2 GiB) fails with {{java.lang.OutOfMemoryError: GC overhead limit exceeded}}with 4GiB memory allocated to the JVM.
The second log file (3.2 GiB) works fine and never uses more than 0.5GiB memory.

rju commented 2 weeks ago

author André van Hoorn -- Fri, 1 Mar 2013 13:45:59 +0100

Replying to [jwa|comment:5]:
> Unrelated to the example posted above, I produced two Kieker logs, once using OperationExecutionRecords and once using BeforeOperationEvent and AfterOperationEvent records. In both cases the records are written using the BinaryWriter resulting in 1.2 GiB and 3.2GiB log files, respectively.

Could you provide the example (log files), please.

rju commented 2 weeks ago

author Jan Waller -- Fri, 1 Mar 2013 14:07:34 +0100

Of course

But they are a bit large to upload.

rju commented 2 weeks ago

author André van Hoorn -- Fri, 1 Mar 2013 14:21:09 +0100

Replying to [jwa|comment:7]:
> Of course
>
> But they are a bit large to upload.

Also with tar.gz, bzip2, etc.? Or maybe put it on the storage.

rju commented 2 weeks ago

author André van Hoorn -- Wed, 6 Mar 2013 14:57:37 +0100

Probably the trace timeout. Can be configured with

--max-trace-duration <duration in ms>                             Threshold
                                                                      (in
                                                                      millisecon
                                                                      ds) after
                                                                      which an
                                                                      incomplete
                                                                      trace
                                                                      becomes
                                                                      invalid.
                                                                      Defaults
                                                                      to 600000
                                                                      (10
                                                                      minutes).
rju commented 2 weeks ago

author André van Hoorn -- Thu, 4 Apr 2013 17:20:00 +0200

Could reproduce the problem and workaround this by selecting a (really small) trace timeout. Using {{OperationExecutionRecord}}s there's no alternative for playing around with the trace timeout in case of memory problems.

This seems to be an extreme monitoring log with really short traces. Concluding from the TraceAnalysis output, the two monitoring logs are not from a deterministic run. Is this right?

Won't fix for this ticket?

1. Event-based log without max. trace duration

 
avhblade2:0:~/git_work/kieker <master> $ bin/trace-analysis.sh -i ../../monitoring-logs/kieker-ticket-785/kieker-20130301-120755908-UTC-blade1-KIEKER-SPECjbb2013/ --plot-Assembly-Operation-Dependency-Graph response-time -o tmp/
...
#
# Plugin: Trace reconstruction (execution records -> execution traces)
Trace processing summary: 0 total; 0 succeeded; 0 failed.

#
# Plugin: Trace reconstruction (event records -> event record traces)
Trace processing summary: 26787738 total; 26787738 succeeded; 0 failed.

#
# Plugin: Trace reconstruction (event record traces -> execution traces)
Trace processing summary: 26787738 total; 26787738 succeeded; 0 failed.

See 'kieker.log' for details

real    24m40.395s
user    27m21.651s
sys     5m48.138s

2. Execution-based log with*out* max. trace duration

avhblade2:130:~/git_work/kieker <master> $ time bin/trace-analysis.sh -i ../../monitoring-logs/kieker-ticket-785/kieker-20130301-104850411-UTC-blade1-KIEKER-SPECjbb2013/ --plot-Assembly-Operation-Dependency-Graph response-time -o tmp/
...
Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
        at java.lang.Class.getDeclaredFields0(Native Method)
        at java.lang.Class.privateGetDeclaredFields(Class.java:2291)
        at java.lang.Class.getDeclaredField(Class.java:1880)
        at kieker.common.record.AbstractMonitoringRecord.typesForClass(AbstractMonitoringRecord.java:255)
        at kieker.analysis.plugin.reader.filesystem.FSDirectoryReader.processBinaryInputFile(FSDirectoryReader.java:319)
        at kieker.analysis.plugin.reader.filesystem.FSDirectoryReader.run(FSDirectoryReader.java:126)
        at java.lang.Thread.run(Thread.java:662)

3. Execution-based log with max. trace duration of 60 milliseconds

avhblade2:130:~/git_work/kieker <master> $ bin/trace-analysis.sh -i ../../monitoring-logs/kieker-ticket-785/kieker-20130301-104850411-UTC-blade1-KIEKER-SPECjbb2013/ --plot-Assembly-Operation-Dependency-Graph response-time -o tmp/3/ --max-trace-duration 60
...
#
# Plugin: Trace reconstruction (execution records -> execution traces)
Trace processing summary: 23122211 total; 23122211 succeeded; 0 failed.
First timestamp: 1362134930422563616 (Fri, 1 Mar 2013 10:48:50 +0000 (UTC),Fri, 1 Mar 2013 11:48:50 +0100 (local time))
Last timestamp: 1362135111819184310 (Fri, 1 Mar 2013 10:51:51 +0000 (UTC),Fri, 1 Mar 2013 11:51:51 +0100 (local time))

#
# Plugin: Trace reconstruction (event records -> event record traces)
Trace processing summary: 0 total; 0 succeeded; 0 failed.

#
# Plugin: Trace reconstruction (event record traces -> execution traces)
Trace processing summary: 0 total; 0 succeeded; 0 failed.

See 'kieker.log' for details

real    6m45.553s
user    8m59.390s
sys     1m23.925s