logpai / logparser

A machine learning toolkit for log parsing [ICSE'19, DSN'16]

Inconsistency with Templates and the EventTemplates in the Spark2k_corrected version file #116

Closed: WahomeKezia closed this issue 4 weeks ago

WahomeKezia commented 1 month ago

Hi there!

I wanted to ask for clarification about EventTemplates that contain URLs and file paths (using the Spark logs):

E6 | Connecting to driver: spark://<*>
E12 | Input split: hdfs://<*>
E25 | Saved output of task 'attempt_<*>' to hdfs://<*>
E23 | Remoting started; listening on addresses :[akka.tcp://<*>]

I noticed that in the corrected version, the logs in the structured CSV have different templates from those in the EventTemplate CSV file.

For example, here is a log line, its EventId label, and its template:

Input split: hdfs://10.10.34.11:9000/pjhe/logs/2kSOSP.log:21876+7292 | E12 | Input split: <*>
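To illustrate why both labels look plausible, here is a minimal Python sketch. It assumes that <*> in a template stands for any (possibly empty) run of characters, which is a simplifying assumption and not necessarily how logparser or loghub defines the wildcard. The helper names template_to_regex and matching_templates are hypothetical. The sketch checks the example log line against both candidate E12 templates.

```python
import re
from typing import List


def template_to_regex(template: str) -> "re.Pattern[str]":
    # Assumption: each <*> wildcard stands for any (possibly empty) run of characters.
    literal_parts = template.split("<*>")
    # Escape the literal text so characters like '.', '+' or ':' are matched verbatim.
    body = "(.*?)".join(re.escape(part) for part in literal_parts)
    return re.compile(body)


def matching_templates(content: str, templates: List[str]) -> List[str]:
    # Return every candidate template that matches the whole log content.
    return [t for t in templates if template_to_regex(t).fullmatch(content)]


log_content = "Input split: hdfs://10.10.34.11:9000/pjhe/logs/2kSOSP.log:21876+7292"
candidates = ["Input split: <*>", "Input split: hdfs://<*>"]
print(matching_templates(log_content, candidates))
# → ['Input split: <*>', 'Input split: hdfs://<*>']
```

Because both templates match, the log line by itself does not settle which one is the ground truth. That choice is a labeling decision recorded in the template files.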

zhujiem commented 1 month ago

Sorry, I did not understand your problem.

WahomeKezia commented 1 month ago

Hi @zhujiem, using EventTemplate E12 as an example:

Input split: hdfs://10.10.34.11:9000/pjhe/logs/2kSOSP.log:21876+7292 | E12 | Input split: <*>

Its EventTemplate is E12, Input split: <*>, but in the file Spark_2k.log_templates.csv, EventTemplate E12 is slightly different: Input split: hdfs://<*>.

I have noticed the same with E6, E23, and E25. For example:

Connecting to driver: spark://CoarseGrainedScheduler@10.10.34.11:48069 | E6 | Connecting to driver: <*>

Is the correct template "Input split: <*>" or "Input split: hdfs://<*>"?

zhujiem commented 1 month ago

In loghub_2k_corrected, you should refer to _structured_corrected.csv and _templates_corrected.csv, which are the corrected versions. So E12 should be:

E12 Input split: <*>
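To confirm that a _structured_corrected.csv file agrees with its _templates_corrected.csv counterpart, a minimal Python sketch like the one below could be used. It assumes both files have a header row with EventId and EventTemplate columns, and that the structured file also has a LineId column; these column names and the helper names load_template_map and find_mismatches are assumptions, not something stated in this thread. The sample rows are toy data made up for illustration.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO, Tuple


def load_template_map(templates_file: TextIO) -> Dict[str, str]:
    # Assumed columns: EventId, EventTemplate (one row per template).
    return {row["EventId"]: row["EventTemplate"] for row in csv.DictReader(templates_file)}


def find_mismatches(
    structured_file: TextIO, template_map: Dict[str, str]
) -> List[Tuple[str, str, str, Optional[str]]]:
    # Assumed columns: LineId, EventId, EventTemplate (any extra columns are ignored).
    mismatches = []
    for row in csv.DictReader(structured_file):
        expected = template_map.get(row["EventId"])  # None if the EventId is unknown
        if row["EventTemplate"] != expected:
            # (LineId, EventId, template in structured file, template in templates file)
            mismatches.append((row["LineId"], row["EventId"], row["EventTemplate"], expected))
    return mismatches


# Toy data for illustration only, not taken from the real loghub files.
templates_csv = io.StringIO("EventId,EventTemplate\nE12,Input split: <*>\n")
structured_csv = io.StringIO(
    "LineId,EventId,EventTemplate\n"
    "1,E12,Input split: <*>\n"
    "2,E12,Input split: hdfs://<*>\n"
)

template_map = load_template_map(templates_csv)
print(find_mismatches(structured_csv, template_map))
# → [('2', 'E12', 'Input split: hdfs://<*>', 'Input split: <*>')]
```

For real files, the same functions can be given file objects opened with open(path, newline=""), as the csv module recommends. An empty result means every structured row uses exactly the template listed for its EventId.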
WahomeKezia commented 1 month ago

Ooh, I see. Thank you, @zhujiem!