sophos / SOREL-20M

Sophos-ReversingLabs 20 million sample dataset
Apache License 2.0

About raw binary benign samples of SOREL-20M dataset #6

Closed: vietvo89 closed this issue 3 years ago

vietvo89 commented 3 years ago

Hello

Recently, I came across this dataset, which is enormous and is released with raw binary files. However, I then found that for the 10 million benign samples it only provides extracted features and metadata, rather than the raw binary files that are provided for the malware samples. How can you train a model on data in such mixed formats? I am trying to retrain several malware detection models based on different architectures, such as MalConv, LightGBM, or RNN-based models. Is it possible to obtain the raw binary benign samples for academic research purposes?

Thanks
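The question above turns on the difference between byte-level and feature-based models. A byte-level model such as MalConv consumes the raw file itself, so it cannot be trained on samples for which only extracted features are available. A feature-based model such as LightGBM can be trained from pre-extracted feature vectors alone. The sketch below shows, under stated assumptions, the kind of fixed-length byte-sequence input a MalConv-style model expects. The function name `bytes_to_malconv_input`, the padding index `PAD_TOKEN = 256`, and the `max_len` parameter are hypothetical illustrations, not part of the SOREL-20M release.

```python
from typing import List

# Hypothetical choice: one extra index beyond the 256 byte values (0-255),
# reserved for padding so it never collides with a real byte.
PAD_TOKEN = 256


def bytes_to_malconv_input(raw: bytes, max_len: int) -> List[int]:
    """Truncate or pad a raw file to a fixed-length sequence of byte indices.

    Sketch of MalConv-style preprocessing (an assumption, not SOREL-20M code):
    each byte becomes an integer index into an embedding table, and files
    shorter than max_len are padded with PAD_TOKEN so all inputs share a length.
    """
    seq = list(raw[:max_len])  # iterating over bytes yields ints in 0..255
    seq.extend([PAD_TOKEN] * (max_len - len(seq)))
    return seq


# A toy "file" starting with the PE "MZ" magic bytes, padded to length 8.
example = bytes_to_malconv_input(b"MZ\x90\x00", max_len=8)
print(example)  # [77, 90, 144, 0, 256, 256, 256, 256]
```

Without the raw bytes there is nothing to feed into such a function, which is why benign samples distributed only as feature vectors can be used for a LightGBM-style model but not directly for MalConv.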

vietvo89 commented 3 years ago

I found that the same question had already been asked, and you have answered it.