honeynet / cuckooml

CuckooML: Machine Learning for Cuckoo Sandbox
https://honeynet.github.io/cuckooml/

Global "normalized" field does not correspond to the same field per VT vendor #8

Closed So-Cool closed 8 years ago

So-Cool commented 8 years ago

The global "normalized" field has to be updated from the corresponding per-vendor VT fields, which have been updated to provide better labelling.

hgascon commented 8 years ago

What do you mean exactly here? Can you give an example?

So-Cool commented 8 years ago

The JSON currently has one normalized field per VT vendor, at virustotal -> scans -> *vendor name* -> normalized. There is also a global normalized field at virustotal -> normalized, which pulls together all of the *vendor name* -> normalized fields. The global field is not picking up the new normalized tokens that I've just implemented.
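
To make the two levels concrete, here is a minimal Python sketch of rebuilding the global field from the per-vendor ones. It assumes each normalized field is a list of string tokens and that the keys follow the paths described above. The function name `rebuild_global_normalized`, the vendor names and the sample tokens are hypothetical and not taken from the cuckooml code.

```python
def rebuild_global_normalized(report):
    """Recompute virustotal -> normalized from the per-vendor fields.

    Assumption: every normalized field is a list of string tokens.
    """
    vt = report.setdefault("virustotal", {})
    merged = []
    seen = set()
    # Walk vendors in sorted order so the result is deterministic.
    for vendor in sorted(vt.get("scans", {})):
        tokens = vt["scans"][vendor].get("normalized") or []
        for token in tokens:
            if token not in seen:  # keep the first occurrence only
                seen.add(token)
                merged.append(token)
    vt["normalized"] = merged
    return report


# Hypothetical report fragment: the global field is stale.
report = {
    "virustotal": {
        "normalized": ["trojan"],
        "scans": {
            "VendorA": {"result": "Trojan.Generic", "normalized": ["trojan", "generic"]},
            "VendorB": {"result": "Win32.Zbot", "normalized": ["zbot", "trojan"]},
        },
    }
}

rebuild_global_normalized(report)
print(report["virustotal"]["normalized"])
```

Deduplicating while preserving order is one possible choice; keeping duplicates would instead preserve how many vendors agree on a token, which may matter if the field is later used for labelling.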

hgascon commented 8 years ago

I remember @jbremer and you discussing the normalized field. Is the field virustotal -> normalized actually used in cuckoo? Or are you the one storing all normalized vendor names there? Couldn't you use this field to store the final label of the sample?

jbremer commented 8 years ago

Currently it is not used because there is too much noise in there. Hopefully with @So-Cool's changes it will be usable and we can indeed start using it for labelling.