Open edvinhallvaxhiu opened 2 years ago
The dataset contains features from both benign and malware samples (50% benign, 50% malware). The authors have also published the actual executables of all malware samples in the dataset. They are unable to provide the benign executables for various reasons (copyright, etc.), so the README line you are quoting refers to the executables, not to the features. In other words, a row with is_malware set to 0 is a benign sample whose features are included but whose binary is not.
Hello! The README states that no benign samples are included in the dataset. While exploring the meta.db at s3://sorel-20m/09-DEC-2020/processed-data/meta.db, I noticed that the database contains a field "is_malware". For almost 50% of the dataset, the value is set to 0. Could you provide some more information on how to read this field?
Thank you!
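For anyone who wants to check the benign/malware split themselves, here is a minimal sketch that counts rows per "is_malware" value. It assumes meta.db has been downloaded locally and is a SQLite file, and that the table is named "meta"; the table name is an assumption on my part, so adjust it if your copy differs. The helper name label_counts is made up for this example.

```python
import sqlite3


def label_counts(db_path, table="meta"):
    """Return a dict mapping each is_malware value to its row count.

    Assumptions: db_path points to a local SQLite file, and `table`
    (default "meta", not confirmed in this thread) has an is_malware column.
    """
    # The table name is interpolated into the SQL text, so only pass a
    # trusted, hard-coded name here.
    query = f"SELECT is_malware, COUNT(*) FROM {table} GROUP BY is_malware"
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(query).fetchall()
    finally:
        conn.close()
    # Each row is an (is_malware, count) pair.
    return dict(rows)
```

Calling label_counts("meta.db") on a local copy should return one count for 0 (benign) and one for 1 (malware), which you can compare against the roughly even split described above.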