wuyifan18 / DeepLog

PyTorch Implementation of DeepLog.
MIT License

Procedure for DeepLog #18

Open danielhanbitlee opened 5 years ago

danielhanbitlee commented 5 years ago

Hi,

I want to make sure I have the procedure for implementing DeepLog right. Here's what I'm thinking. Given train log data and test log data, do the following:

  1. Run Spell on the train and test log data to get log keys and features.
  2. Sort the outputs from Spell into different sessions or blocks, for both train and test data.
  3. Take only the log keys and write them to a file, where each row represents one session or block of log output. Do this for both train and test data.
  4. Take the sessions from the train data that do not contain errors and train DeepLog on them.
  5. Run the trained model on the test data to make predictions.

Can someone confirm whether this thinking is correct?
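To make steps 4 and 5 concrete, here is a minimal sketch of how one row of log keys could be consumed, assuming the usual DeepLog setup where the model predicts the next log key from a window of recent keys and flags a key that is not among its top candidates. The names `make_windows`, `is_anomalous`, and `predict_top` are hypothetical and not taken from this repository; `predict_top` stands in for a trained model.

```python
def make_windows(session, window_size):
    """Return (history, next_key) pairs from one session of log keys."""
    return [
        (session[i:i + window_size], session[i + window_size])
        for i in range(len(session) - window_size)
    ]


def is_anomalous(session, window_size, predict_top):
    """Flag a session if any actual next key is outside the predicted candidates.

    predict_top is a hypothetical callable (history -> collection of likely
    next keys) standing in for a trained model.
    """
    return any(
        next_key not in predict_top(history)
        for history, next_key in make_windows(session, window_size)
    )


# One session (one row of the log key file), as integer log keys.
session = [1, 2, 1, 1, 3, 3]
windows = make_windows(session, window_size=3)
for history, next_key in windows:
    print(history, "->", next_key)
```

Sessions no longer than the window produce no pairs in this sketch, so they are never flagged; how to handle them (padding, skipping) is a separate choice.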

amineebenamor commented 5 years ago

Almost correct. You should split the data into train and test sets only when you reach step 4. Before step 4, run every step on the whole dataset.
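A rough sketch of that ordering, assuming the whole dataset has already been parsed and grouped into sessions, and that a per-session label (0 for normal, 1 for anomalous) is available. The function name `split_sessions`, the `train_ratio` value, and the label format are assumptions for illustration only.

```python
import random


def split_sessions(sessions, labels, train_ratio=0.8, seed=0):
    """Split already-grouped sessions into train / test-normal / test-abnormal.

    sessions: dict mapping session ID -> list of log keys
    labels:   dict mapping session ID -> 0 (normal) or 1 (anomalous)
    Only normal sessions go into the training set.
    """
    normal = [sid for sid in sessions if labels[sid] == 0]
    abnormal = [sid for sid in sessions if labels[sid] == 1]

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(normal)
    cut = int(len(normal) * train_ratio)

    train = [sessions[sid] for sid in normal[:cut]]
    test_normal = [sessions[sid] for sid in normal[cut:]]
    test_abnormal = [sessions[sid] for sid in abnormal]
    return train, test_normal, test_abnormal


# Toy data: five normal sessions and two anomalous ones.
sessions = {f"s{i}": [1, 2, 3, i] for i in range(7)}
labels = {f"s{i}": (1 if i >= 5 else 0) for i in range(7)}
train, test_normal, test_abnormal = split_sessions(sessions, labels)
print(len(train), len(test_normal), len(test_abnormal))
```

Because parsing and grouping happen before the split, train and test sessions share the same log key vocabulary.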

zhangch-fnst commented 5 years ago

@danielhanbitlee @amineebenamor Hi, about step 2: can you tell me how to "sort the outputs from Spell into different sessions or blocks"? For example, suppose we have the following data:

EventId | EventTemplate | ParameterList
6af214fd | Receiving block <*> src <*> <*> dest <*> 50010 | ['blk_-1608999687919862906', '/10.250.19.102:54106', '/10.250.19.102']
26ae4ce0 | BLOCK NameSystem.allocateBlock <*> | ['mnt/hadoop/mapred/system/job_2008110920300001/job.jar. blk_-1608999687919862906']
6af214fd | Receiving block <*> src <*> <*> dest <*> 50010 | ['blk_-1608999687919862906', '/10.250.10.6:40524', '/10.250.10.6']
6af214fd | Receiving block <*> src <*> <*> dest <*> 50010 | ['blk_7503483334202473044', '/10.251.215.16:55695', '/10.251.215.16']
-- | -- | --
6af214fd | Receiving block <*> src <*> <*> dest <*> 50010 | ['blk_7503483334202473044', '/10.250.19.102:34232', '/10.250.19.102']
6af214fd | Receiving block <*> src <*> <*> dest <*> 50010 | ['blk_-1608999687919862906', '/10.250.14.224:42420', '/10.250.14.224']
dc2c74b7 | PacketResponder <*> for block <*> terminating | ['1', 'blk_-1608999687919862906']
dc2c74b7 | PacketResponder <*> for block <*> terminating | ['2', 'blk_-1608999687919862906']

After sorting by blocks, what will the data look like? Thank you.
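One way to picture the result, as a sketch rather than this repository's actual preprocessing: pull the block ID out of each row's ParameterList, append the row's EventId to that block's sequence in log order, then write one line of log keys per block. The regex, the helper names, and the integer key numbering (assigned in order of first appearance) are assumptions, as is reading the block ID in the second row as `blk_-1608999687919862906`.

```python
import re

# (EventId, ParameterList) pairs from the example rows, in log order.
rows = [
    ("6af214fd", "['blk_-1608999687919862906', '/10.250.19.102:54106', '/10.250.19.102']"),
    ("26ae4ce0", "['mnt/hadoop/mapred/system/job_2008110920300001/job.jar. blk_-1608999687919862906']"),
    ("6af214fd", "['blk_-1608999687919862906', '/10.250.10.6:40524', '/10.250.10.6']"),
    ("6af214fd", "['blk_7503483334202473044', '/10.251.215.16:55695', '/10.251.215.16']"),
    ("6af214fd", "['blk_7503483334202473044', '/10.250.19.102:34232', '/10.250.19.102']"),
    ("6af214fd", "['blk_-1608999687919862906', '/10.250.14.224:42420', '/10.250.14.224']"),
    ("dc2c74b7", "['1', 'blk_-1608999687919862906']"),
    ("dc2c74b7", "['2', 'blk_-1608999687919862906']"),
]

BLOCK_ID = re.compile(r"blk_-?\d+")  # assumed block ID pattern


def group_by_block(rows):
    """Map each block ID to the ordered list of EventIds that mention it."""
    sessions = {}
    for event_id, params in rows:
        # dict.fromkeys de-duplicates block IDs within a row, keeping order.
        for block_id in dict.fromkeys(BLOCK_ID.findall(params)):
            sessions.setdefault(block_id, []).append(event_id)
    return sessions


def to_key_lines(sessions):
    """Turn each session into one line of space-separated integer log keys."""
    key_of = {}  # EventId -> integer key, numbered by first appearance
    lines = []
    for events in sessions.values():
        keys = [key_of.setdefault(e, len(key_of) + 1) for e in events]
        lines.append(" ".join(map(str, keys)))
    return lines


sessions = group_by_block(rows)
lines = to_key_lines(sessions)
for block_id, line in zip(sessions, lines):
    print(block_id, ":", line)
```

With the eight rows above this yields two sessions: `1 2 1 1 3 3` for `blk_-1608999687919862906` and `1 1` for `blk_7503483334202473044`. Each line is then one row of the file described in step 3.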