sophos / SOREL-20M

Sophos-ReversingLabs 20 million sample dataset
Apache License 2.0

True label accuracy #22

Open roxas1533 opened 1 year ago

roxas1533 commented 1 year ago

The dataset's labeling is described as follows:

"We use a combination of non-public, internal information as well as a number of static rules and analyses to obtain the ground truth labels."

What is the actual accuracy of these labels? For example, the file "5e939818321bcd64cab2f711bb273c0d51479b08fb0f1371d39a6c88a294b02b" has a packed label of 0, but analysis shows that it is packed with UPX and can be unpacked. I think the correct packed label for this file should be 1.
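
One rough way to reproduce this kind of check is to look for UPX's default section names (`UPX0`, `UPX1`) in the PE section table. The sketch below is only a heuristic and is not necessarily how the analysis above was done: the function names `pe_section_names` and `looks_upx_packed` are hypothetical, it assumes you already have the raw bytes of a well-formed PE file, and it will miss samples whose section names were renamed after packing.

```python
import struct


def pe_section_names(data: bytes) -> list[str]:
    """Return the section names listed in a PE file's section table."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # e_lfanew (offset 0x3C in the DOS header) points at the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header follows the 4-byte signature.
    (num_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (opt_header_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    # The section table starts right after the optional header.
    table = pe_offset + 24 + opt_header_size
    names = []
    for i in range(num_sections):
        start = table + 40 * i  # each section header is 40 bytes
        raw = data[start:start + 8]  # the name is the first 8 bytes
        names.append(raw.rstrip(b"\0").decode("ascii", errors="replace"))
    return names


def looks_upx_packed(data: bytes) -> bool:
    """Heuristic (assumption): default UPX builds name sections UPX0, UPX1, ..."""
    return any(name.startswith("UPX") for name in pe_section_names(data))
```

A `True` result only suggests UPX packing; running `upx -t` or `upx -d` on the file is a stronger confirmation, and a `False` result does not rule packing out.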