radareorg / ideas


Hooks for realtime cross-machine syncing #267

Open lunixbochs opened 7 years ago

lunixbochs commented 7 years ago

Real time collaboration platform

Radare2 has been a successful reverse engineering framework and toolset for years. Apart from decompilation, its biggest missing feature is real-time collaboration, which is important when reversing large files or playing CTFs in a team. There are successful examples such as collabREate, YaCo, and solIDArity (proprietary/$$$). Of the public tools, collabREate is the most complete and common, and it supports notifications (and online propagation) of those actions:

The exact list can be changed after discussion. The resulting code and the development process should live on GitHub.

Task

Required skills

Ability to code and understand C and Go. Difficulty: Medium

Evaluation

  1. Simple server in Go (with conflict resolution) is up and running + some tests of it
  2. Demo of small fake programs using C API to create new user, new project, connect to the server and sync, demo of resolving conflict

Links/Resources

The possible architecture of the platform:

WebUI [1] (microservice in Go) <-> Server [2] (microservice in Go) <-> C library of client [3].

Project management (create/remove a project, add/remove a user in a project) should be done in the Server [2], and so should user management. Those functions are exported in some way for use by the WebUI microservice (RPC, maybe?). Storage is a filesystem holding the initial files (in binary format) plus state and differences (in text format), managed with a Git library for Go. Those text differences are "r2 commands", which are sent by the C API library. Conflict resolution is done via standard Git features: rebase/merge.
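The storage idea described above (state kept as text, with differences expressed as r2 commands) could be sketched roughly as follows. This is only an illustration, written in Python rather than the Go/C split proposed here. `ProjectLog`, `push`, and `pull` are hypothetical names, and a real server would delegate history and merging to Git as described.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectLog:
    """Hypothetical per-project state: an ordered list of r2 command lines."""
    commands: list = field(default_factory=list)

    def push(self, user, command):
        """Record one r2 command sent by a client; return its revision number."""
        self.commands.append((user, command))
        return len(self.commands)

    def pull(self, since):
        """Return the commands a client has not seen yet (revision > since)."""
        return [cmd for _, cmd in self.commands[since:]]


log = ProjectLog()
log.push("alice", "CC entry point @ 0x400470")
rev = log.push("bob", "afn main 0x400566")
log.push("alice", "f sym.flag_check @ 0x4005a0")

# A client that last synced at revision 1 replays only what it missed.
missing = log.pull(1)
```

Replaying `missing` in order through the client's r2 session would bring it up to date. Conflicting edits are not handled in this sketch.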

XVilka commented 7 years ago

Could be related to iaito as well @hteso

lunixbochs commented 7 years ago

I'll break down the current IDA sync plugins and the events they support for reference.

Unreleased:

radare commented 7 years ago

I have some ideas that I would need to experiment with to implement this, because it could be used to sync r2 instances too. I have no time to explain or code them now, but I will as soon as I get some spare time.

On 03 May 2017, at 20:12, Ryan Hileman notifications@github.com wrote:

I'll break down the current IDA sync plugins https://reverseengineering.stackexchange.com/questions/12054/is-there-an-actively-maintained-collaboration-plugin-for-ida and the events they support for reference.

collabREate (realtime, most complete example)

  - Change address/region type (code, data, unknown)
  - Segment add, delete, move, change (like 32->64 bit, flags)
  - Rename addr
  - Function update (add, remove, bounds change)
  - Comment updated
  - Byte(s) patched
  - Operand type changed (I assume hex, dec, str, offset, etc)
  - Enum updates
  - Struct type added, changed, or deleted
  - Function tail? added or deleted
  - Flirt function identified (would just be "function renamed")
  - Xref add/delete (I don't know what this means)

IDASync (realtime sync)

  - Update comment
  - Rename addr
  - Upload all function names
  - Upload all breakpoints

IDASynergy (svn-backed, not realtime)

  - Change addr/region type
  - Rename addr
  - Add/remove function

BinCrowd (online, manual sync)

  - manual upload/download of function metadata: name, description, hash, edges, stack frame, image base address, function offset, cpu type, programming language, basic block count

IDA Toolbag (offline-only sync)

  - you can share session (like db history/replay) files and merge them

CrowdRE

  - Gone. I never used it but the biggest feature seems to be full-function annotations, fuzzy function hashing/matching, structures, and crowdsourcing

Unreleased:

Sol[IDA]rity (in private beta, full features unknown)

  - Rename address
  - Update struct type
  - Change address type
  - Highlights currently active function and address for each user
  - "Jump invites" to invite a specific user to an address

YaCo (will be announced at a conference in June)
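The events these plugins have in common (rename, comment, function update, patch, struct change) could be modelled as a small set of typed messages. The sketch below is one possible shape, not a format used by any of the tools listed. `EventKind`, `SyncEvent`, and the JSON wire format are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class EventKind(str, Enum):
    # Hypothetical event names, loosely following the lists above.
    RENAME_ADDR = "rename_addr"
    COMMENT = "comment"
    FUNCTION_UPDATE = "function_update"
    PATCH_BYTES = "patch_bytes"
    STRUCT_UPDATE = "struct_update"


@dataclass
class SyncEvent:
    kind: EventKind
    addr: int
    user: str
    data: dict

    def to_wire(self):
        """Serialize to a JSON string suitable for sending to a server."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_wire(cls, raw):
        """Rebuild an event from its JSON form."""
        obj = json.loads(raw)
        obj["kind"] = EventKind(obj["kind"])
        return cls(**obj)


event = SyncEvent(EventKind.COMMENT, 0x400470, "alice", {"text": "test"})
roundtrip = SyncEvent.from_wire(event.to_wire())
```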

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/radare/radare2/issues/7410#issuecomment-298991379, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3-lpnoGPz27g07htw1qCYGLGSCXjvWks5r2MOIgaJpZM4NPkrr.

radare commented 7 years ago

I have implemented cfg.log and cmd.log, as well as #!python -e to evaluate Python code inline. The current state of this magic lets you track the current offset and comments. Let me explain it so you can start playing with this "developer preview", and so we can see what else we can include and how to enhance the interface for scripting all of this. I guess you can help with designing this:

The next things I have in mind are the following:

What's more annoying is finding the best place to hook all those events while avoiding too much noise and removing temporary states. But I'll go step by step and hopefully have it in a usable state for 1.5.

radare commented 7 years ago

https://github.com/radare/radare2-r2pipe/blob/6f412a34f569e66fa85862ddbc09bf029c92c37a/python/examples/sync/talkto.py

radare commented 7 years ago

You are not wat

On 11 May 2017, at 17:09, Sven Steinbauer notifications@github.com wrote:

I am not


XVilka commented 7 years ago

He is NOT. He can negate anything. He can make any positive result disappear. He can make things appear from nothing. He is NOT, he is Unary Ninja...

radare commented 7 years ago

@lunixbochs did you get a chance to test what i did?

lunixbochs commented 7 years ago

@radare thanks for this so far! I've been extremely busy with the Usercorn stream, but I'll take a stab at integrating very soon.

The sync codebase is fairly simple right now and lives here: https://github.com/lunixbochs/revsync It has fairly conservative sync (only renamed addrs and comments) and doesn't keep a consolidated database yet or use atomic operations, but it's a definite improvement over nothing, it's easy to set up and integrate, and I have several people using it regularly with no complaints.

radare commented 7 years ago

It's not yet complete, but it's a start. It's hard for me to find spare time to allocate for this. Let me know when you try it.


XVilka commented 7 years ago

@lunixbochs Hi! Any update on this? Does anything need to be fixed/improved yet? Also, now that YaCo has been released, it's clear that it's not real-time collaborative RE at all, just a wrapper around git...

XVilka commented 7 years ago

@lunixbochs ping?

XVilka commented 7 years ago

Probably it has the same destiny as Iaito.

radare commented 7 years ago

😱


XVilka commented 7 years ago

@lunixbochs ping!

XVilka commented 6 years ago

For future implementers: there seems to be a good algorithm for handling real-time collaboration, Operational Transformation:

https://medium.com/@raphlinus/towards-a-unified-theory-of-operational-transformation-and-crdt-70485876f72f

https://hal.inria.fr/inria-00071240

And here is an implementation example https://github.com/josephg/ShareJS
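As a rough illustration of what Operational Transformation does, the sketch below transforms two concurrent text inserts so that both sites converge on the same result regardless of arrival order. It covers only the insert/insert case on a plain string and is not taken from ShareJS. The tuple layout `(position, text, site_id)` and the function names are assumptions.

```python
def transform_insert(op, other):
    """Shift `op` so it can be applied after `other` has already been applied.

    Each op is (position, text, site_id); site_id breaks ties deterministically.
    """
    pos, text, site = op
    o_pos, o_text, o_site = other
    if o_pos < pos or (o_pos == pos and o_site < site):
        pos += len(o_text)
    return (pos, text, site)


def apply_insert(doc, op):
    """Insert the op's text into the document at the op's position."""
    pos, text, _ = op
    return doc[:pos] + text + doc[pos:]


base = "check input"
a = (0, "TODO: ", 1)      # site 1 inserts at the start
b = (11, " length", 2)    # site 2 appends at the end

# Site 1 applies a, then b transformed against a.
doc1 = apply_insert(apply_insert(base, a), transform_insert(b, a))
# Site 2 applies b, then a transformed against b.
doc2 = apply_insert(apply_insert(base, b), transform_insert(a, b))
```

Both sites end up with the same string even though they applied the operations in opposite orders.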

XVilka commented 6 years ago

From my experience working in collaborative groups, those hooks should be native, not in Python or similar, especially if you want to handle more than 3 users connected at the same time and working on a file larger than 10 MB. Real time is expensive.

goulou commented 6 years ago

You should definitely have a look at YaCo: it works only with IDA at the moment, but the underlying model is fully independent from it. Implementing the hooks in both directions (for 2-way sync) in radare2 should not be so hard. Furthermore, all the hard work (synchronization, conflict merging...) is already done. It works with as many users as you want, both synchronously and asynchronously. It is available here, and the plugin is compatible with IDA 7.0 and 7.1: https://github.com/DGA-MI-SSI/YaCo

lunixbochs commented 6 years ago

We're still using revsync with ida and binary ninja, but nobody has shown enough interest on the team in radare to add a frontend. I imagine it would be rather easy for someone who spends more time with radare to build the integration.

Real time is expensive

Not as expensive as you'd think. Users can't really generate data very quickly, so as long as you're only synchronizing user actions, and have an efficient diffing layer, it doesn't matter what language you're using.

I'm not sending a 10mb binary around or diffing a 10mb binary. I just notice when someone makes a comment and broadcast "hey lunixbochs made a comment". CRDTs/OTs are way overkill here. There's not much by way of overlapping edits.
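The "notice a user action and broadcast it" model could look roughly like the sketch below. It is not revsync's code. `SyncHub` is a hypothetical in-memory stand-in for a server, and resolving the rare overlapping edit by "last writer wins" on a timestamp is an assumption made here for illustration.

```python
class SyncHub:
    """Hypothetical in-memory hub: keeps the latest value per key and fans out updates."""

    def __init__(self):
        self.state = {}        # (kind, addr) -> (timestamp, user, value)
        self.subscribers = []  # callables invoked for every accepted update

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, kind, addr, user, value, timestamp):
        key = (kind, addr)
        current = self.state.get(key)
        # Last writer wins: drop updates that are not newer than the stored one.
        if current is not None and current[0] >= timestamp:
            return False
        self.state[key] = (timestamp, user, value)
        for callback in self.subscribers:
            callback(kind, addr, user, value)
        return True


hub = SyncHub()
seen = []
hub.subscribe(lambda kind, addr, user, value: seen.append((kind, addr, user, value)))

hub.publish("comment", 0x400470, "lunixbochs", "entry point", timestamp=2)
stale = hub.publish("comment", 0x400470, "alice", "old note", timestamp=1)
```

Only the small event travels between clients. The binary itself is never sent or diffed.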

radare commented 6 years ago

I did, but I have no more time to spend on more things. In fact, I built some basic APIs to hook events and so on to do that, but yeah, from what I see, everybody interested in this has no spare time to implement it.


bannsec commented 6 years ago

Playing with this, I'm noticing some issues with the logging. For instance, the :CC that you implemented actually recorded the updated CC line, which does not accurately reflect the change.

Example:

[0x00400470]> CC
0x00000000 CCu "[30] ---- section size 549 named .strtab"
0x00400238 CCu "[01] -r-- section size 28 named .interp"
0x00400254 CCu "[02] -r-- section size 32 named .note.ABI_tag"
0x00400274 CCu "[03] -r-- section size 36 named .note.gnu.build_id"
0x00400298 CCu "[04] -r-- section size 28 named .gnu.hash"
0x004002b8 CCu "[05] -r-- section size 120 named .dynsym"
0x00400330 CCu "[06] -r-- section size 66 named .dynstr"
0x00400372 CCu "[07] -r-- section size 10 named .gnu.version"
0x00400380 CCu "[08] -r-- section size 32 named .gnu.version_r"
0x004003a0 CCu "[09] -r-- section size 24 named .rela.dyn"
0x004003b8 CCu "[10] -r-- section size 72 named .rela.plt"
0x00400400 CCu "[11] -r-x section size 26 named .init"
0x00400420 CCu "[12] -r-x section size 64 named .plt"
0x00400460 CCu "[13] -r-x section size 8 named .plt.got"
0x00400470 CCu "[14] -r-x section size 386 named .text"
0x004005f4 CCu "[15] -r-x section size 9 named .fini"
0x00400600 CCu "[16] -r-- section size 10 named .rodata"
0x0040060c CCu "[17] -r-- section size 52 named .eh_frame_hdr"
0x00400640 CCu "[18] -r-- section size 244 named .eh_frame"
0x00600e10 CCu "[19] -rw- section size 8 named .init_array"
0x00600e18 CCu "[20] -rw- section size 8 named .fini_array"
0x00600e20 CCu "[21] -rw- section size 8 named .jcr"
0x00600e28 CCu "[22] -rw- section size 464 named .dynamic"
0x00600ff8 CCu "[23] -rw- section size 8 named .got"
0x00601000 CCu "[24] -rw- section size 48 named .got.plt"
0x00601030 CCu "[25] -rw- section size 16 named .data"
0x00601040 CCu "[26] -rw- section size 0 named .bss"
[0x00400470]> CC test
[0x00400470]> T
1 :CC [14] -r-x section size 386 named .text test @ 0x400470

Running that CC line would actually duplicate the comment. I would expect something like:

1 :CC test

As the actual text log. Also, I'm not seeing any logging for things like f, afn, afvn, etc.
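Until the log records only the change, a client could work around the issue by parsing each log line and stripping the comment it already knows about. The sketch below assumes the line format seen in the `T` output above (`<id> :<command> <args> @ <addr>`). `parse_log_line` and `new_comment_text` are hypothetical helpers, not part of r2.

```python
import re

# Matches lines such as "1 :CC test @ 0x400470"; the "@ addr" part is optional.
LOG_LINE = re.compile(
    r"^(?P<id>\d+)\s+:(?P<cmd>\S+)\s*(?P<args>.*?)"
    r"(?:\s+@\s+(?P<addr>0x[0-9a-fA-F]+))?$"
)


def parse_log_line(line):
    """Split one text-log line into id, command, arguments and address."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    addr = m.group("addr")
    return {
        "id": int(m.group("id")),
        "cmd": m.group("cmd"),
        "args": m.group("args"),
        "addr": int(addr, 16) if addr else None,
    }


def new_comment_text(logged_args, previous_comment):
    """Recover only the appended text when the log holds old + new comment."""
    if previous_comment and logged_args.startswith(previous_comment):
        return logged_args[len(previous_comment):].strip()
    return logged_args


entry = parse_log_line("1 :CC [14] -r-x section size 386 named .text test @ 0x400470")
added = new_comment_text(entry["args"], "[14] -r-x section size 386 named .text")
```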

bannsec commented 6 years ago

So, I'm implementing this as a core plugin. Looks like I can directly monitor for any call that i'm interested in anyway, so I probably don't need the T logging. I.e.: I can catch new comments being added by simply watching for CC and parsing it out as it's being called. This makes the push side pretty instantaneous. And I currently am multi-processing with the server->client updating so everything should work out.
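Watching for `CC` and parsing it as it is called might look something like the sketch below. It is a simplified illustration, not the plugin's actual code: `classify_command` is a hypothetical helper, it only understands `CC` and `afn`, and it assumes any `@` address is a plain hex literal rather than an r2 expression.

```python
def classify_command(cmd, current_offset):
    """Map a raw r2 command string to a sync event dict, or None if not tracked."""
    body, sep, at = cmd.partition(" @ ")
    # Without an explicit "@ addr", the command acts on the current offset.
    addr = int(at.strip(), 16) if sep else current_offset
    parts = body.strip().split(None, 1)
    if len(parts) < 2:
        return None  # e.g. a bare "CC" only lists comments
    name, arg = parts
    if name == "CC":
        return {"kind": "comment", "addr": addr, "text": arg}
    if name == "afn":
        return {"kind": "rename_function", "addr": addr, "name": arg.split()[0]}
    return None


comment_event = classify_command("CC test", 0x400470)
rename_event = classify_command("afn main @ 0x400566", 0)
```

A hook of this kind only sees changes that arrive as command strings.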

radare commented 6 years ago

Not all the comments are done through r_core_cmd. I was thinking that maybe the sync in r2 should be implemented in frida, so that any API can be hooked without having to add proxy calls to everything in r2.


XVilka commented 5 years ago

See also https://ckeditor.com/blog/Lessons-learned-from-creating-a-rich-text-editor-with-real-time-collaboration/