spender-sandbox / cuckoo-modified

Modified edition of cuckoo
392 stars 178 forks

ES reporting module example guides? #85

Open mallorybobalice opened 8 years ago

mallorybobalice commented 8 years ago

Hi,

A question rather than an issue again, apologies again.

Are there any good examples of configuring ElasticSearch reporting for CSB, compared to Mongo? I'm curious what the specific benefits are. Am I missing out on something amazing about ES search syntax, and possibly visualization via ELK? Or is it that people already use that stack and find it convenient to have CSB reporting slot into it?

It sounds like the CSB Django web UI will still be enabled (just using ES instead of Mongo for fetching task info?), and then people have an alternative UI via, for example, Kibana for searching and visualization?

I'm also wondering whether there are any configuration caveats, performance expectations, or limitations on stored info (I recall reading about caveats either in the reporting module, the config, or the commits), and whether there are data visualization examples (which I presume go through a UI such as Kibana?). Is anyone willing to share their experience running it as a reporting DB for larger Cuckoo instances, say 20,000-500,000 tasks, and monitoring it?

I sort of also wonder at what point people would use distributed Cuckoo. It seems like that's more for splitting your per-task process.py instances into more manageable bits, and it would still aggregate results to a single instance? (Implying MongoDB and reporting aren't really an expected bottleneck, at least for API usage?)

Thanks, mb.

KillerInstinct commented 8 years ago

When I converted the ES module (based on Drainware's code), I set it up with the default configs, literally just installing it the same way the installation documentation says to (the ElasticSearch section of https://github.com/spender-sandbox/cuckoo-modified/blob/6e189628af3cddf58de1678b48a893cc1550dcdd/docs/book/src/installation/host/requirements.rst).
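For reference, turning the module on comes down to flipping it on in `conf/reporting.conf`. This is a sketch only; the section and key names below are assumptions and may differ slightly between revisions, so check the shipped config file:

```ini
; conf/reporting.conf (sketch; key names are illustrative)
[elasticsearchdb]
enabled = yes
; Node the reporting module should write to
elasticsearch_host = 127.0.0.1
elasticsearch_port = 9200
; Base name used for the datestamp-based indexes
elasticsearch_index = cuckoo
```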

The reporting module takes care of creating datestamp-based indexes for Cuckoo data, so indexing doesn't need to be changed unless you feel it's missing something for your needs. For example, some people may find it useful to index Suricata data or malfamily, depending on how large their dataset is and what they want to visualize.
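The datestamp-based indexes follow the common ELK one-index-per-day convention. A minimal sketch of the naming scheme (the function name and exact pattern are assumptions for illustration, not the module's actual code):

```python
from datetime import datetime


def daily_index(prefix, when=None):
    """Build a datestamp-based index name, e.g. 'cuckoo-2016.01.15'.

    Mirrors the usual ELK one-index-per-day pattern; the exact pattern
    the reporting module uses may differ.
    """
    when = when or datetime.utcnow()
    return "%s-%s" % (prefix, when.strftime("%Y.%m.%d"))


# A finished report would then be written into the day's index, e.g.
# with elasticsearch-py: es.index(index=daily_index("cuckoo"), ...)
```

Rolling a fresh index each day keeps individual indexes small, which makes retention (dropping whole old indexes) and Kibana time-range queries cheap.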

Personally I did not set up a full ELK stack, but the setup should be relatively fine for Cuckoo as long as you pay attention to disk I/O. For example, it would be a bad idea to host the entire Cuckoo+ELK stack on one cluster of disks: you'd be fighting for I/O between the VMs, processing (uploading temp files during submission, potentially Malheur, etc.), ELK data input and indexing, and searching.

For my personal Cuckoo rig, I have a couple servers with multiple NICs, all with an NFS share mounted from the primary cuckoo server (completely internal network between the couple servers). This way I can have the VMs stored on one server so that VM IO is on its own disk cluster. MongoDB is on another server. The same could be done with ELK, actually you could modify our code to allow both MongoDB and ElasticSearch reporting modules if you wanted. In which case I could spin up a third server, enable the ElasticSearch module to submit the data to this third server, and use it for Kibana visualizations while still having Django be powered by MongoDB. Then the VMs have their own storage, ELK would have its own storage, and MongoDB would have its own storage. Of course this would duplicate ALOT of data, but it's just an example of the flexibility you can achieve with DIY setups. Besides, hard drives are cheap now a days. :)