honeynet / cuckooml

CuckooML: Machine Learning for Cuckoo Sandbox
https://honeynet.github.io/cuckooml/

Reading in the data for analysis #3

Open So-Cool opened 8 years ago

So-Cool commented 8 years ago

The simplest solution is reading in the JSONs placed in the /storage directory. At later stages it might be worth developing something more natural.
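A minimal sketch of that first step, in Python. The directory layout (`analyses/<task>/reports/report.json` under the storage directory) and the helper name `iter_reports` are assumptions for illustration, not something fixed in this thread; adjust the glob pattern to wherever the reports actually live.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, Tuple


def iter_reports(storage_dir: str) -> Iterator[Tuple[str, Dict]]:
    """Yield (task_name, report_dict) for every report.json found.

    Assumed layout: <storage_dir>/analyses/<task>/reports/report.json
    """
    pattern = "analyses/*/reports/report.json"
    for path in sorted(Path(storage_dir).glob(pattern)):
        try:
            with path.open(encoding="utf-8") as handle:
                report = json.load(handle)
        except (OSError, ValueError):
            # Skip unreadable or malformed reports instead of aborting the run.
            continue
        # parents[0] is "reports", parents[1] is the <task> directory.
        yield path.parents[1].name, report
```

Yielding reports one at a time avoids holding every JSON in memory at once, which may matter since individual reports can be large.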

hgascon commented 8 years ago

@jbremer, besides the JSONs in /storage, what other options does Cuckoo implement for storing the analysis results? @So-Cool, you can start by reading the JSON files, but keep the import step abstracted from the analysis so that data can later be queried from several sources.
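One way to keep the import separate from the analysis could look like the sketch below. All names here (`ReportSource`, `JsonFileSource`, `InMemorySource`, `count_reports`) are hypothetical and only illustrate the idea; none of them come from Cuckoo or CuckooML.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Iterable, Iterator


class ReportSource(ABC):
    """Hypothetical interface: anything that can hand over report dicts."""

    @abstractmethod
    def reports(self) -> Iterator[Dict]:
        """Yield one analysis report (a dict) at a time."""


class JsonFileSource(ReportSource):
    """Reads reports from JSON files on disk."""

    def __init__(self, paths: Iterable[str]) -> None:
        self.paths = [Path(p) for p in paths]

    def reports(self) -> Iterator[Dict]:
        for path in self.paths:
            with path.open(encoding="utf-8") as handle:
                yield json.load(handle)


class InMemorySource(ReportSource):
    """Stands in for a later source that already holds report dicts."""

    def __init__(self, dicts: Iterable[Dict]) -> None:
        self.dicts = list(dicts)

    def reports(self) -> Iterator[Dict]:
        yield from self.dicts


def count_reports(source: ReportSource) -> int:
    """Example analysis step: it depends only on the interface."""
    return sum(1 for _ in source.reports())
```

Because `count_reports` only sees `ReportSource`, swapping the JSON files for another backend would not touch the analysis code.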

jbremer commented 8 years ago

If we fully integrate this new module later on, then you simply get access to the full dictionary, which is also saved as reports/report.json (and which you'll be using for now). That way you replace the json.load(...) with a couple of boilerplate lines of code from Cuckoo. Integration is pretty easy :-)

hex1010 commented 8 years ago

Are we assuming this module will only run on report.json? What if we want to work on memory dumps for feature extraction?

hgascon commented 8 years ago

@hex1010 What features would you extract from the memory dump?

hex1010 commented 8 years ago

A diff against a baseline memory dump? Does the baseline feature support that at present?

jbremer commented 8 years ago

The baseline feature only targets Volatility output. Doing a complete (or partial) memory dump differential is out of scope here (but I don't think that's what you meant, right?)

hex1010 commented 8 years ago

I was thinking more along the lines of http://www.dfrws.org/2012/proceedings/DFRWS2012-6.pdf but it looks like that's a stretch goal.