Hello
I recently found this dataset, which is enormous and is described as being released with raw binary files. However, I then found that it only provides extracted features and metadata for the 10 million benign samples, whereas the malware samples are provided as raw binaries. How can a model be trained on mixed formats like that? I am trying to retrain some malware detection models based on different architectures, such as MalConv, LightGBM, and RNN-based models. Is it possible to obtain the raw binary benign samples for academic research purposes?
Thanks