What's the difference between the data in HDFS.npz and the data produced by load_HDFS from the full HDFS.log_structed.csv?

Hi, I tried to reproduce the benchmarking results using the full HDFS log data. I used logparser/Drain to generate the full HDFS.log_structed.csv, which has the same structure as HDFS_100k.log_structed.csv. I then loaded the full HDFS.log_structed.csv and the label file in HDFS_benchmark.py, just as you did in the demo. The LR, SVM and DT results are similar to those shown in the readme, but the PCA and IM results are very different. It seems that the data in HDFS.npz differ from the data that load_HDFS generates from the full HDFS.log_structed.csv. Even if I have HDFS.npz, it is still hard to use without knowing what this difference is. Many thanks
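For reference, here is a minimal sketch of how the per-block event sequences could be rebuilt from a structured CSV, so that their counts and lengths can be compared against whatever is stored in HDFS.npz. It is not the loglizer implementation. The column names `Content` and `EventId`, the `blk_` ID pattern, and the helper names `group_events_by_block` and `summarize` are all assumptions made for illustration.

```python
import csv
import io
import re

# Assumed pattern for HDFS block IDs such as blk_-1608999687919862906
BLOCK_ID = re.compile(r"(blk_-?\d+)")


def group_events_by_block(csv_file):
    """Group event IDs into one sequence per block ID.

    csv_file is an open text file (or file-like object) with a header row
    that includes the assumed columns "Content" and "EventId".
    """
    sessions = {}
    for row in csv.DictReader(csv_file):
        # set() so that a block mentioned twice on one line is counted once
        for block in set(BLOCK_ID.findall(row["Content"])):
            sessions.setdefault(block, []).append(row["EventId"])
    return sessions


def summarize(sessions):
    """Return simple statistics that are easy to compare across datasets."""
    lengths = [len(events) for events in sessions.values()]
    return {
        "blocks": len(lengths),
        "events": sum(lengths),
        "max_len": max(lengths, default=0),
    }
```

Running `summarize(group_events_by_block(open(path, newline="")))` on the full structured CSV gives a block count and sequence statistics that can be checked against the arrays in the .npz file, which may help narrow down where the two datasets diverge.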

logpai / loglizer #76