About generate data and window size

DuoweiPan commented 3 years ago

@wuyifan18 Thank you for your implementation of DeepLog!

I have a question about the data processing part. When calling the generate function, is there any specific reason we need "line = tuple(map(lambda n: n - 1, map(int, line.strip().split())))", which is line 21 in LogKeyModel_train.py? I'm not sure why we need lambda n: n - 1, and it seems that map(int, line.strip().split()) doesn't change anything for the given training data.

Also, is there any specific reason for a window size of 10? It seems that any session shorter than 10 will be padded with -1. Correct me if I'm wrong, but doesn't that make those sessions automatically detected as abnormal? In that case, wouldn't different window sizes, like 10 and 3, greatly affect the result?

Any hint will be helpful! Thank you!

wuyifan18 commented 3 years ago

Hi @DuoweiPan, the log key ids need to start from 0, so I need "lambda n: n - 1", and "map(int, line.strip().split())" is just for type conversion (string to int). As for the second question, the hyperparameters (such as window_size and hidden_size) are the same as in the EVALUATION section of the paper.
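
For readers following along, here is a minimal sketch of what that line does and how fixed-size windows can then be cut from a session. The helper names parse_line and make_windows are hypothetical and not taken from the repository; the sketch only assumes that each line of the data file holds whitespace-separated, 1-based log key ids. Shifting to 0-based ids matters because classification losses typically expect class labels in the range 0 to num_classes - 1.

```python
def parse_line(line):
    """Convert a line of whitespace-separated, 1-based log key ids to 0-based ints."""
    # int(tok) turns each text token into a number; "- 1" shifts ids to start at 0.
    return tuple(int(tok) - 1 for tok in line.strip().split())


def make_windows(keys, window_size):
    """Return (input window, next key) pairs from one session.

    In this sketch, a session that is not longer than window_size yields no pairs.
    """
    return [
        (keys[i:i + window_size], keys[i + window_size])
        for i in range(len(keys) - window_size)
    ]


keys = parse_line("5 5 5 22 11 9\n")
print(keys)                   # (4, 4, 4, 21, 10, 8)
print(make_windows(keys, 3))  # [((4, 4, 4), 21), ((4, 4, 21), 10), ((4, 21, 10), 8)]
```

Note that how short sessions are treated (skipped as here, or padded) is a design choice, and it does interact with the chosen window size.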

DuoweiPan commented 3 years ago

Hi @wuyifan18,

Got it! Thank you for such a quick response! Just a more general question: if I decide to train DeepLog from scratch, can I use other formats of log templates, like 'a7e180bc' or a combination of text and numbers, as the training and test datasets? Or do I have to transform them into just numbers?

Thank you!

wuyifan18 commented 3 years ago

Hi @DuoweiPan, I am not familiar with deep learning, but I think the input of an LSTM should be a number sequence, so you should transform the log templates into numbers.
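
One possible way to do that transformation is sketched below: build a dictionary from the training sessions that maps each distinct template id (for example 'a7e180bc') to a 0-based integer, then reuse the same dictionary for the test data. The names build_vocab and encode, the sample template ids other than 'a7e180bc', and the choice of -1 for templates never seen in training are all assumptions for illustration, not part of the repository.

```python
def build_vocab(sessions):
    """Map each distinct template id to a 0-based integer, in order of first appearance."""
    vocab = {}
    for session in sessions:
        for template in session:
            # setdefault only assigns a new id the first time a template is seen.
            vocab.setdefault(template, len(vocab))
    return vocab


def encode(session, vocab, unknown=-1):
    """Replace template ids with integers; unseen templates get the `unknown` marker (assumed)."""
    return [vocab.get(template, unknown) for template in session]


train_sessions = [["a7e180bc", "3f2c9d01", "a7e180bc"], ["3f2c9d01", "E5"]]
vocab = build_vocab(train_sessions)
print(vocab)                                           # {'a7e180bc': 0, '3f2c9d01': 1, 'E5': 2}
print(encode(["E5", "a7e180bc", "never_seen"], vocab)) # [2, 0, -1]
```

The mapping must be built once from the training data and kept fixed, so that the same template always receives the same number at test time; entries encoded as the unknown marker need separate handling before they reach the model.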

DuoweiPan commented 3 years ago

I see. Thank you so much for helping me! I will close this issue.
