spender-sandbox / cuckoo-modified

Modified edition of cuckoo

Modified with distributed. #187

Closed pashashocky closed 8 years ago

pashashocky commented 8 years ago

Hello Brad,

I have been using cuckoo modified for a while, and it does a lot of what I want, although I was tempted to try 2.0 due to them beginning work on distributed cuckoo among nodes.

I was able to write my own code to integrate several nodes that report back to the leader, with the reports added to the web interface. Unfortunately, 2.0 is just not stable enough for me: it has a lot of bugs that cause crashes and prevent distributed nodes from reporting smoothly.

Now I am looking to implement distributed cuckoo into cuckoo-modified, wondering if you have any extra ideas, and whether you would be interested in a PR. Maybe you could give me an email so that we could talk there?

Kind Regards, Pash

doomedraven commented 8 years ago

@pashashocky I'm just finishing the same thing :D

jbremer commented 8 years ago

You realize that upstream also accepts pull requests, right? :-P

jbremer commented 8 years ago

Not to hijack the thread, but did you report any bugs / PRs?

doomedraven commented 8 years ago

@jbremer for my part, it will be there, but later, once it has been tested under heavy load on 3 servers. Btw, I will drop you an email @pashashocky, and btw I think it would be good to start sharing code and discussing it.

In a few words, my code does the same as yours, but I return the compressed mongo report and store it in the master mongo, saving the original task ID and returning it when you submit tasks, so the dist db is just used as a proxy. That is only a short description, though.
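As a rough illustration of the "compressed report stored on the master" idea described above, here is a minimal sketch. It is not the code from the fork: the helper names (`pack_report`, `unpack_report`, `MasterStore`) are hypothetical, a plain dict stands in for the master's mongo collection, and rewriting `info.id` to the master's task ID is an assumption about how the IDs might be reconciled.

```python
import json
import zlib


def pack_report(report: dict) -> bytes:
    """Serialize a report to JSON and compress it for transfer from a slave."""
    return zlib.compress(json.dumps(report).encode("utf-8"))


def unpack_report(blob: bytes) -> dict:
    """Reverse of pack_report: decompress and parse the JSON report."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


class MasterStore:
    """Hypothetical stand-in for the master's mongo collection (a dict)."""

    def __init__(self):
        self.reports = {}

    def store(self, master_task_id: int, blob: bytes) -> None:
        report = unpack_report(blob)
        # Assumption: the report carries the slave's task ID under info.id,
        # and the master replaces it with the ID it handed out at submission.
        report.setdefault("info", {})["id"] = master_task_id
        self.reports[master_task_id] = report


# Example: a slave finished its local task 7, which the master knows as 42.
store = MasterStore()
store.store(42, pack_report({"info": {"id": 7}, "signatures": []}))
```

Keeping the report compressed in transit only reduces transfer size; the per-document size limit of the database still applies once the report is unpacked and inserted.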

pashashocky commented 8 years ago

@jbremer I actually have been getting something like 3-5 different error types, from different places. I can come on irssi to talk about some of them with you, but they came from process.py, from scheduler.py and from some other places. They would cause a part of the slave node to stop working, and that node would be KIA.

Regarding PRs, there are still a few things I would like to implement, although maybe I could do a PR for you. I slightly modified the db structure to have a link between the task ID on the slave and its corresponding web ID. Additionally, I pull most of the analysis data from the slave (around 300 MB) and, using a modified processing module, insert it into elastic/mongo so that it shows up on the web UI...
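The link between a slave's task ID and the web ID could look roughly like the sketch below. The actual schema change is not shown in this thread, so the table name `task_links`, its columns, and the helper functions are all hypothetical; SQLite is used only to keep the example self-contained.

```python
import sqlite3


def open_link_db(path=":memory:"):
    """Open a database holding the (hypothetical) slave-to-web ID link table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS task_links (
               web_id        INTEGER PRIMARY KEY,
               node          TEXT    NOT NULL,
               slave_task_id INTEGER NOT NULL,
               UNIQUE (node, slave_task_id)
           )"""
    )
    return conn


def link_task(conn, web_id, node, slave_task_id):
    """Record that a web ID corresponds to a task ID on a given slave node."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO task_links (web_id, node, slave_task_id) VALUES (?, ?, ?)",
            (web_id, node, slave_task_id),
        )


def web_id_for(conn, node, slave_task_id):
    """Return the web ID linked to a slave task, or None if there is no link."""
    row = conn.execute(
        "SELECT web_id FROM task_links WHERE node = ? AND slave_task_id = ?",
        (node, slave_task_id),
    ).fetchone()
    return row[0] if row else None
```

The `UNIQUE (node, slave_task_id)` constraint reflects that task IDs are only unique per slave, so the node name has to be part of the lookup key.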

jbremer commented 8 years ago

Well, process.py is known to be buggy, process2.py is already much better and cuckoo process from the upcoming Cuckoo package is even slightly better in the sense that it also supports non-PostgreSQL (https://github.com/cuckoosandbox/cuckoo/pull/863). Bugs from scheduler.py I'd be happy to hear about - aside from known issues with analysis tags it should work mostly fine.

Regarding the Distributed Cuckoo, that makes sense yeah. Only problem is finding the solution to where the data should be stored and pushed, from there on making the required adjustments is not too difficult - PRs welcome of course.

xdanx commented 8 years ago

@doomedraven Hi! I am working with @pashashocky. Is there any repository where you have your code so we can see it? Let's work together, maybe we can speed up the process and get distributed to cuckoo-distributed :)

doomedraven commented 8 years ago

@xdanx agreed :) Are you guys also on the cuckoo IRC? Just to exchange emails. No, it is not published anywhere yet, but I will push it to https://github.com/doomedraven/cuckoo-modified in a moment

xdanx commented 8 years ago

We're active on cuckoo IRC as well. Let's work on your fork then, and hopefully we can generate a PR which will be accepted.

doomedraven commented 8 years ago

Published here: https://github.com/doomedraven/cuckoo-modified/commit/0cf1c049029f7f4f3ecd3ebca90e294c9adeeee3

I would like to talk with you guys to see if we can improve more things. For a start this is just a test. The last step I wanted to do is to return the ID from the current db rather than the proxy, so that when someone requests a report from the dist API, the correct one is returned. The part that gets the mongo report from the slaves and stores it works fine, also from the master.

Going to look for you on IRC.

doomedraven commented 8 years ago

@xdanx I don't see any user with your ID in the cuckoo channel. I use the same username there, can you ping me?

doomedraven commented 8 years ago

@spender-sandbox could you check this https://github.com/spender-sandbox/cuckoo-modified/compare/master...doomedraven:distributed?expand=1

and tell us what you think (suggestions, ideas)? @pashashocky, @xdanx and I put a bit of love into that, and it works very well. We are still working on the last step, retrieving all files (pcap, memdump).

In a few words, this is how it works:

1) There must be a master node, which gets back all files and data.
2) When you push a task and it goes to a slave, a task ID is reserved in main_db. Once the analysis is finished on the slave, the data is retrieved: the mongo report (reporting must be executed on the slave to avoid reprocessing and that 16 MB limit), plus behaviour, report.json and screenshots. That allows inserting the data into the main db and mongo and drawing the report in the master web GUI. It also inserts a link to the original analysis; we leave it this way right now just to make it easier for users to retrieve pcap/memdump, but it will be removed once we solve the moving problem. It will also move the binary and create a symlink so that downloads from the web GUI/API work correctly.
3) There is also support for htaccess for api.py. We know that it is "deprecated", but this is to avoid more load on Django: /api/ is left for users, with limits, and api.py is used more by admins, with basic auth if required.
4) Uploading tasks with tags also works :)
5) I'm sure I forgot some nice stuff, but you can check it at that link. We tried to make minimal changes to existing files, apart from dist.py.
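The reservation flow from step 2 can be sketched as a tiny in-memory model. This is only an illustration under stated assumptions, not the code in dist.py: the class `MasterDB`, its methods, and the status names `reserved` and `reported` are hypothetical, and a dict stands in for main_db.

```python
from itertools import count


class MasterDB:
    """Hypothetical in-memory stand-in for the master's main_db."""

    def __init__(self):
        self._next_id = count(1)
        self.tasks = {}

    def reserve(self, node: str) -> int:
        """Reserve a master task ID at submission time, before any results exist."""
        task_id = next(self._next_id)
        self.tasks[task_id] = {"node": node, "status": "reserved", "report": None}
        return task_id

    def complete(self, task_id: int, report: dict) -> None:
        """Attach the data retrieved from the slave to the reserved task."""
        task = self.tasks[task_id]
        if task["status"] != "reserved":
            raise ValueError("task %d is not awaiting results" % task_id)
        task["report"] = report
        task["status"] = "reported"


# Example: the ID is known to the submitter immediately, results arrive later.
db = MasterDB()
tid = db.reserve("slave-1")
db.complete(tid, {"info": {"id": tid}})
```

Reserving the ID up front means the submitter gets a stable ID immediately, and the report retrieved later fills in the same record instead of creating a new one.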

xdanx commented 8 years ago

Main discussion thread: https://github.com/spender-sandbox/cuckoo-modified/pull/229