spender-sandbox / cuckoo-modified

Modified edition of cuckoo

help extracting IoCs from tasks directly from mongo? #84

Open mallorybobalice opened 8 years ago

mallorybobalice commented 8 years ago

hello and heh.

This isn't really an issue (more of a user question :D). Is there a more appropriate place to ask? I was hoping to ask about extracting info from the Mongo reporting DB. (I'm not quite sure what the division of labor between Mongo and MySQL is, but I think all of the reporting info goes to Mongo.)

I'd like a batch way (i.e. a Python script) of extracting IoCs for a particular query. Say the query is for a wave of malspam, combining some or all of the following: .doc & date > a & date < b & matched signatures contain abc (by signature name or description, or for example Office martians >= 1).
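A minimal sketch of how those constraints could be combined into one MongoDB filter document. The field names (`target.file.name`, `info.started`, `signatures.name`) and the assumption that `info.started` is stored as a `YYYY-MM-DD HH:MM:SS` string are guesses about the report layout, not something confirmed in this thread, and `build_wave_query` is a hypothetical helper name. Check a real document in the `analysis` collection before relying on it.

```python
from datetime import datetime

# ASSUMPTION: timestamps are stored as strings in this format, so that
# lexicographic comparison matches chronological order.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_wave_query(name_regex, start, end, signame_regex):
    """Build a MongoDB filter dict for one malspam wave.

    All dotted field names below are assumed, not taken from the thread.
    """
    return {
        # file name constraint, e.g. r"\.doc$"
        "target.file.name": {"$regex": name_regex, "$options": "i"},
        # date > a & date < b
        "info.started": {
            "$gt": start.strftime(TIME_FORMAT),
            "$lt": end.strftime(TIME_FORMAT),
        },
        # at least one matched signature whose name fits the pattern
        "signatures.name": {"$regex": signame_regex, "$options": "i"},
    }


query = build_wave_query(
    r"\.doc$", datetime(2016, 1, 1), datetime(2016, 1, 8), "martian"
)
```

With pymongo, a dict like this would be passed as the first argument to `find()` on the `analysis` collection. All top-level keys are ANDed together by MongoDB.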

Then extract specific IoCs from the task report, say IPs + domains from the network info section, preferably into a CSV with one row per file plus basic info (say fname + sha1 + ip + domain).
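As a rough sketch of that CSV step (Python 3), the function below flattens one report dict into fname + sha1 + ip + domain rows. The report layout is an assumption: `target.file.name`, `target.file.sha1`, `network.hosts` (either plain IP strings or dicts with an `ip` key) and `network.domains` (dicts with a `domain` key). `report_to_rows` and `rows_to_csv` are hypothetical helper names.

```python
import csv
import io


def report_to_rows(report):
    """Flatten one report dict into (fname, sha1, ip, domain) tuples.

    The key names used here are assumptions about the report layout.
    """
    target = report.get("target", {}).get("file", {})
    network = report.get("network", {})
    fname = target.get("name", "")
    sha1 = target.get("sha1", "")

    rows = []
    for host in network.get("hosts", []):
        # Accept both a bare IP string and a dict carrying an "ip" key.
        ip = host["ip"] if isinstance(host, dict) else host
        rows.append((fname, sha1, ip, ""))
    for entry in network.get("domains", []):
        rows.append((fname, sha1, "", entry["domain"]))
    return rows


def rows_to_csv(rows):
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["fname", "sha1", "ip", "domain"])
    writer.writerows(rows)
    return buf.getvalue()
```

One row per indicator keeps the file simple to grep or load into a spreadsheet; pairing each IP with its resolving domain would need the extra `ip` field that domain entries may or may not carry.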

mallorybobalice commented 8 years ago

I suppose the plan is probably to turn on Mongo query profiling (https://docs.mongodb.org/manual/reference/method/db.setProfilingLevel/) and watch what gets run, or else to have a look at the web UI code.

On a cursory look:

a) https://github.com/spender-sandbox/cuckoo-modified/blob/master/web/api/views.py has a results_db.analysis.find call around ext_tasks_search.

b) Then, for each listed task ID, something like tasks_iocs. That looks pretty good, but it's somewhat easier to ask if someone can share something similar they already use.

From a quick look, the tasks_iocs API probably does what I want: call it via the API, then extract what I need from the JSON as a lite version. But I'm unclear how to search for tasks using the constraints above and how to chain queries. I don't think the web UI lets me, and I'm unclear how to do it with the API, especially with regexes or date ranges. It would be much appreciated if someone could throw in a few quick examples with specific details and a basic Python wrapper.

mallorybobalice commented 8 years ago

hmmm, ok, mostly nevermind I suppose.

Looks like the web UI is at the moment limited to one attribute per task search. (Any plans to change that, or am I not seeing something?)

it seems like the api as per above does let you combine multiple search params (for example, not that the example necessarily makes sense)

curl -v -d "option=name&argument=.doc$&option=signame&argument=^.martian.$" http://xxxx:/api/tasks/extendedsearch/

then look up task info and extract network domain bits. curl -v http://xxx/api/tasks/get/iocs/YYY/

Seems OK, but it doesn't support malscore searches. I can always proxy malscore via signatures, but I'm curious why the extended search doesn't support it.

I suppose there are no plans to share wrapper scripts as an examples folder or something of the sort? Or to provide similar scripts for direct DB access instead of the API? (To be honest, the API seems sufficient.)
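The two API calls in this comment could be chained from Python roughly as follows (Python 3, standard library only). This is a sketch under assumptions: the endpoints are the ones from the curl lines, the `option`/`argument` pairs are sent as repeated form fields exactly as in the curl example, and the search response is assumed to be JSON with a `data` list whose entries carry an `id`. The function names and `base_url` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_body(criteria):
    """Encode (option, argument) pairs as repeated form fields."""
    pairs = []
    for option, argument in criteria:
        pairs.append(("option", option))
        pairs.append(("argument", argument))
    return urlencode(pairs)


def api_call(url, body=None):
    """POST when a body is given, otherwise GET; parse the JSON reply."""
    data = body.encode("ascii") if body is not None else None
    with urlopen(Request(url, data=data)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iocs_for_search(base_url, criteria):
    """Yield (task_id, iocs_reply) for every task matching the search."""
    found = api_call(
        base_url + "/api/tasks/extendedsearch/", build_search_body(criteria)
    )
    # ASSUMPTION: matches are listed under "data" and each has an "id".
    for task in found.get("data", []):
        task_id = task["id"]
        iocs_url = "%s/api/tasks/get/iocs/%s/" % (base_url, task_id)
        yield task_id, api_call(iocs_url)
```

Something like `for task_id, iocs in iocs_for_search("http://xxxx", [("name", ".doc$"), ("signame", "^.martian.$")])` would then walk the matches. Note that `urlencode` percent-encodes `$` and `^`, which the server should decode back to the same regex.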

mallorybobalice commented 8 years ago

although, the api has no time window params...hmph. a comment from the maintainers on this would be really appreciated.

KillerInstinct commented 8 years ago

FWIW there is malscore searching in the default search since this commit: https://github.com/spender-sandbox/cuckoo-modified/commit/77cac96c211e416484cd8ba0125d19e98404d790

We would just need to add similar code to the extendedsearch API node/API view. But yes, the getiocs API node was made specifically for that purpose.

I don't believe we or upstream support date constraints at all, but honestly I've lost track of upstream's searching, as they converted to ElasticSearch completely for that. I don't imagine it would be hard to add, but I lack the time currently.

I can say that I used to use the getiocs API node. Instead of doing time-constrained searches, we simply maintained a text file saying "we have all the data up until task id XYZ". Some time later we'd hit the API to see the most recent task reported, and then pull all IOCs for that data if we detected malfamily was >= 6 for file tasks or >= 8 for URL tasks. It requires some additional processing, but it worked as a workaround. Unfortunately, this does not solve the issue of simply searching to extract data/metrics for a specific range of dates.
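A minimal sketch of that checkpoint workaround, not the maintainers' actual script. It assumes each task is available as a dict with `id`, `category` (`"file"` or `"url"`) and a numeric `score` that the 6/8 thresholds are compared against; those key names, the checkpoint file format, and the function names are all hypothetical.

```python
import os

# Thresholds from the comment above: >= 6 for file tasks, >= 8 for URL tasks.
THRESHOLDS = {"file": 6, "url": 8}


def read_checkpoint(path):
    """Return the last task ID already processed, or 0 if none recorded."""
    if not os.path.exists(path):
        return 0
    with open(path) as fh:
        text = fh.read().strip()
    return int(text) if text else 0


def write_checkpoint(path, task_id):
    """Record that everything up to task_id has been pulled."""
    with open(path, "w") as fh:
        fh.write(str(task_id))


def tasks_to_pull(tasks, last_seen):
    """Pick IDs of tasks newer than last_seen that meet their threshold.

    ASSUMPTION: each task dict has "id", "category" and a numeric "score".
    """
    picked = []
    for task in tasks:
        if task["id"] <= last_seen:
            continue
        limit = THRESHOLDS.get(task["category"])
        if limit is not None and task["score"] >= limit:
            picked.append(task["id"])
    return sorted(picked)
```

A periodic job would call `read_checkpoint`, fetch IOCs for each ID from `tasks_to_pull`, and then call `write_checkpoint` with the highest task ID it has seen, whether or not that task met the threshold.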

mallorybobalice commented 8 years ago

ok thanks @KillerInstinct

Yup, I figured we'd bound it by task IDs. For the moment I'm searching by signature (signame) or malfamily.

As for the API vs. extended search difference, that's OK, I understand. I was sort of hoping for consistency, but it's not a critical issue by any means. I might try installing ES or even upstream CSB RC1/dev in a test instance. I will also try the fixed-up syslog module + upstream infrastructure SIEM as an interim alternative; I just haven't had the time to yet.

thanks again.