pfalcon / ScratchABit

Easily retargetable and hackable interactive disassembler with IDAPython-compatible plugin API
GNU General Public License v3.0

Added support to load binary data from file with offset and size to area #45

Open engelant opened 6 years ago

pfalcon commented 6 years ago

Thanks for the patch, but I'm not sure that ScratchABit should be a general-purpose binary cut-and-mix tool. There are enough tools for that already, starting with the venerable dd.

engelant commented 6 years ago

too bad.

pfalcon commented 6 years ago

If it's "too bad", feel free to argue why such a feature is good to have, instead of relying on other tools. (But issues like this, where the maintainer is not sure, are usually left to "ripen" for some time.)

engelant commented 6 years ago

Well, in my case I'm working with a dump of ESP8266 memory, and I find it useful to load parts of the file at specific addresses using offsets. E.g. my config looks like this right now:

#Segment 1: len 0x00574 load 0x40100000 file_offs 0x00000008
area .boot2 0x40100000(0x00574)   rx
load ./orig_raw.bin 0x40100000 0x00000008(0x00574)

#Segment 2: len 0x00308 load 0x3ffe8000 file_offs 0x00000584
area .dram0lib0 0x3ffe8000(0x00308)   rx
load ./orig_raw.bin 0x3ffe8000 0x00000584

#Segment 3: len 0x0021c load 0x3ffe8308 file_offs 0x00000894
area .dram0lib1 0x3ffe8308(0x0021c)   rx
load ./orig_raw.bin 0x3ffe8308 0x00000894

#SPI Flash map
area .flash0 0x40200000(0x100000)    rwx
load ./orig_raw.bin 0x40200000

[entrypoints]
ResetHandler = 0x40000080

This means I don't have 4 files in my folder but just one. Also, dd does not take hex offset and length values, so this makes it just a little more convenient. As I just took your number and range parsing methods, this works the same way as area, and it could be extended (if it isn't already) to also accept octal, decimal, and binary values. Furthermore, it's about 16 lines of code, but then again only you know how strictly you adhere to the 'single purpose' philosophy. Also, it doesn't break the previous behaviour: you can still use dd, but you can omit it at the cost of 16 lines of code (and it's just startup code, not the main loop or anything).
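For readers following along, here is a minimal Python sketch of what an OFFSET(SIZE) argument like the one in the config above could mean. It is not the code from the patch: the helper names parse_offset_size and read_slice are hypothetical, and the sketch only assumes that the value before the parentheses is a file offset and the value inside them is a byte count.

```python
import re

# Hypothetical helpers, NOT the code from the patch.
# Accepts "OFFSET(SIZE)" or a bare "OFFSET".
_RANGE_RE = re.compile(r"^([^()\s]+)(?:\(([^()\s]+)\))?$")


def parse_offset_size(spec):
    """Return (offset, size); size is None when no "(SIZE)" part is given."""
    m = _RANGE_RE.match(spec.strip())
    if not m:
        raise ValueError("bad offset spec: %r" % spec)
    # int(s, 0) infers the base from the prefix:
    # 0x = hex, 0o = octal, 0b = binary, no prefix = decimal.
    offset = int(m.group(1), 0)
    size = int(m.group(2), 0) if m.group(2) else None
    return offset, size


def read_slice(path, offset, size=None):
    """Read size bytes starting at offset (up to end of file if size is None)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read() if size is None else f.read(size)
```

With this reading, parse_offset_size("0x00000008(0x00574)") yields offset 8 and size 0x574, and a bare "0x00000584" yields only an offset, leaving the length to be limited by whatever consumes the data.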

Nable80 commented 6 years ago

This means I don't have 4 files in my folder but just one.

Oh, I like this feature. It's quite useful for inspection of firmware dumps, where code/data segments are mixed with resources, padding and other stuff that shouldn't be loaded into the disassembler. Specifying "partitions" in a single config with comments is much neater than manual extraction of different segments from here and there.

pfalcon commented 6 years ago

@engelant, first of all, I appreciate you trying my cute tool and then taking the time to share your hack. That all goes as planned - it was written in Python specifically to allow people to make such hacks easily, within minutes of starting. But not every hack should end up in mainline...

load ./orig_raw.bin 0x40100000 0x00000008(0x00574)

Sorry, that's unreadable, no idea what 0x00000008(0x00574) means. Which means I'd need to read the docs. But nobody reads the docs, so I'll be just confused. Besides, there're no docs to read. This leaves only comments, which smart people also won't write.

But you wrote:

Segment 2: len 0x00308 load 0x3ffe8000 file_offs 0x00000584

And that sounds like a command to execute. You just need to have that command: find one, or write your own if dd doesn't work for you.

how strictly you adhere to the 'single purpose' philosophy

Pretty strictly, but here a different principle applies: does it solve a general problem? Nope, only a niche, ad hoc one. Please see the next answer.

But please don't close this - let it ripen. If it were ever implemented, it would be something like:

load ./orig_raw.bin(skip=8, len=0x00574) 0x40100000
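To make the proposed form concrete, here is a rough Python sketch of how such a line could be parsed. This is purely illustrative: ScratchABit does not implement this syntax, the function name parse_load is hypothetical, and the grammar (optional key=value options in parentheses directly after the file name, with only skip and len recognised) is an assumption based on the single example line.

```python
import re

# Hypothetical grammar for the syntax floated above; not implemented anywhere.
_LOAD_RE = re.compile(
    r"^load\s+(?P<path>[^\s(]+)"   # file name, up to whitespace or "("
    r"(?:\((?P<opts>[^)]*)\))?"    # optional "(key=value, ...)"
    r"\s+(?P<addr>\S+)$"           # load address
)


def parse_load(line):
    """Return (path, load_address, skip, length) for one "load" line."""
    m = _LOAD_RE.match(line.strip())
    if not m:
        raise ValueError("bad load line: %r" % line)
    # Defaults: start at the beginning of the file, no explicit length.
    opts = {"skip": 0, "len": None}
    if m.group("opts"):
        for item in m.group("opts").split(","):
            key, sep, value = item.partition("=")
            key = key.strip()
            if not sep or key not in opts:
                raise ValueError("bad option: %r" % item.strip())
            # Base 0 lets hex, octal, binary and decimal literals all work.
            opts[key] = int(value.strip(), 0)
    return m.group("path"), int(m.group("addr"), 0), opts["skip"], opts["len"]
```

Named options keep the line readable without docs, which is the objection raised against the positional 0x00000008(0x00574) form, and a plain "load FILE ADDR" line still parses unchanged.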

pfalcon commented 6 years ago

It's quite useful for inspection of firmware dumps, where code/data segments are mixed with resources, padding and other stuff that shouldn't be loaded into the disassembler.

And what if some segment needs to be loaded, but is encoded/encrypted/packed? Kaboom, all the patching and the reading of docs which have yet to be written is for nothing - you're back to "manual" unpacking.

Specifying "partitions" in a single config with comments is much neater than manual extraction of different segments from here and there.

Of course, manual extraction is better - because you can do whatever you want/need with that, use any tools you like, without being limited to something like

0x40100000 0x00000008(0x00574)

pfalcon commented 6 years ago

But gentlemen, if you like slicing around, yes you can, no patching required. I for one truly think that having stuff like:

00000000-0000ffff.bin
10000000-1000ffff.bin
20000000-2001ffff.bin
...

is better than having a single blob. But I'm happy to cheat too: https://github.com/pfalcon/xtensa-subjects/blob/master/2.0.0-p20160809/proj_init.py#L76

Yes, that's not exactly like making the loader cherry-pick bytes for you, but as mentioned, it's a question of generality: let the Turing-completeness be with you. (Oh, and you can load from such a func too, ScratchABit is fully programmable, that's the whole point. [API needs refactoring, yeah.])
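As a rough illustration of the "slice it yourself" route, the following Python sketch cuts a single blob into files named after their address ranges, in the START-END.bin style shown above. It does not reproduce the linked proj_init.py. The function name split_blob is hypothetical, and the segment table is simply copied from the config comments earlier in this thread.

```python
import os

# Illustrative table of (load address, file offset, length), copied from the
# "#Segment N" comments in the config earlier in the thread.
SEGMENTS = [
    (0x40100000, 0x00000008, 0x00574),
    (0x3FFE8000, 0x00000584, 0x00308),
    (0x3FFE8308, 0x00000894, 0x0021C),
]


def split_blob(blob_path, segments, out_dir="."):
    """Write each segment to START-END.bin and return the names written."""
    with open(blob_path, "rb") as f:
        blob = f.read()
    written = []
    for addr, offs, length in segments:
        chunk = blob[offs:offs + length]
        if len(chunk) != length:
            raise ValueError("segment at 0x%08x runs past end of file" % addr)
        # The end address in the file name is inclusive, as in 00000000-0000ffff.bin.
        name = "%08x-%08x.bin" % (addr, addr + length - 1)
        with open(os.path.join(out_dir, name), "wb") as out:
            out.write(chunk)
        written.append(name)
    return written
```

Calling split_blob("./orig_raw.bin", SEGMENTS) would leave one file per segment next to the blob. Because this is ordinary Python, a decode or decompress step can be added per segment before writing, which is the generality argument made above.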

Nable80 commented 6 years ago

O-oh, it's even more fascinating. IMHO a simplified version of such an example (or a similar one) deserves its place in the docs/ directory. Learning an API without examples is somewhat, erm, complicated. Of course, I may be wrong, and polluting the repo with API examples may be a bad idea. I'm not sure whether this API is well-known and documented enough that a few more sample scripts wouldn't be needed here.

engelant commented 6 years ago

@pfalcon I'm happy you posted your tool online, so I could easily use it.

But nobody reads the docs, so I'll be just confused. Besides, there're no docs to read. This leaves only comments, which smart people also won't write.

Using your tool without documentation, well... I was glad there was this example def, as I had never disassembled anything before. The syntax might be changed, I just took what was there.

Maybe I didn't get the intended usage of ScratchABit, and sure, I will let this PR ripen.

engelant commented 6 years ago

Of course, manual extraction is better - because you can do whatever you want/need with that, use any tools you like, without being limited to something like 0x40100000 0x00000008(0x00574)

Well, you still can, I didn't break anything. This is just supposed to be a little helper, not the exclusive use case. I think it's a very basic operation, but that's just my opinion.