sophos / SOREL-20M

Sophos-ReversingLabs 20 million sample dataset
Apache License 2.0

Not all malware available #26

Open lkurlandski opened 1 year ago

lkurlandski commented 1 year ago

Not all of the binaries listed in the meta.db database are present in the bucket, even when the is_malware field is 1. For example,

H="997990bb784a9689e4293d788964c6a76ea7a1ff369a616f887e5fe288485e13"
aws s3 cp s3://sorel-20m/09-DEC-2020/binaries/$H . --no-sign-request

Works like a charm. However,

H="134e4de64556e84a1e30070bb210414c6399aea5a39dcea46182a9a986288dcb"
aws s3 cp s3://sorel-20m/09-DEC-2020/binaries/$H . --no-sign-request

fails with

fatal error: An error occurred (404) when calling the HeadObject operation: Key "09-DEC-2020/binaries/134e4de64556e84a1e30070bb210414c6399aea5a39dcea46182a9a986288dcb" does not exist
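For anyone who wants to check a list of hashes before downloading, here is a minimal sketch. It assumes the same bucket and prefix as the commands above and uses `aws s3api head-object`, which is the operation named in the error message. The function names `binary_exists` and `check_hashes` are made up for illustration and are not part of the SOREL-20M tooling.

```shell
#!/usr/bin/env bash
# Sketch: report which SHA-256 hashes have a binary in the public bucket.
# Bucket and prefix are taken from the commands in this issue.
BUCKET="sorel-20m"
PREFIX="09-DEC-2020/binaries"

# Hypothetical helper: returns 0 if the object exists, non-zero otherwise.
# Caveat: any failure (network error, throttling) is treated as "missing",
# not only a 404.
binary_exists() {
  aws s3api head-object \
    --bucket "$BUCKET" \
    --key "$PREFIX/$1" \
    --no-sign-request >/dev/null 2>&1
}

# Hypothetical helper: reads one hash per line on stdin and prints
# "present <hash>" or "missing <hash>" for each non-empty line.
check_hashes() {
  local h
  while IFS= read -r h; do
    if [ -z "$h" ]; then
      continue
    fi
    if binary_exists "$h"; then
      printf 'present %s\n' "$h"
    else
      printf 'missing %s\n' "$h"
    fi
  done
}
```

After sourcing the script, it could be run as `check_hashes < hashes.txt`, where `hashes.txt` is a hypothetical file with one hash per line.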

Just wanted to let people know!