landley / toybox

toybox
http://landley.net/toybox
BSD Zero Clause License

Hexedit ascii mode #278

Open aheirman opened 3 years ago

aheirman commented 3 years ago

"hexedit - view and edit files in hexadecimal or in ASCII" (source: https://www.systutorials.com/docs/linux/man/1-hexedit/)

It appears that this functionality, editing in ascii mode, is not present in this version.

landley commented 3 years ago

The first big program I wrote (when I was 11 years old) was a hex editor for the Commodore 64. The one I added to toybox was pretty much a practice non-curses app doing ascii escapes to move around the screen before I tried to implement top. Cursoring over into the right side of the screen and typing into there was a thing I considered, but didn't bother to implement at the time because it raises design questions (if I hit "q" or "esc" while the cursor is over there does it quit like normal or does it write the character into that space? Do I implement a modal edit/navigate mode for that?)

I'm all for extending it, but it's not based on any other hex editors. And the big design decision I made was to mmap() the underlying file instead of reading it into memory (to avoid size limitations), which makes inserting into the file... kind of difficult. (Then again on a 32 bit OS it limits the editable size to what fits in the available address space.) Also, inserting into an archive format like ELF or tar will screw up offsets stored in metadata elsewhere...
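For readers unfamiliar with the approach, here is a minimal sketch of editing a file through mmap() rather than a read buffer. It is not the toybox code: `poke_byte` is a hypothetical helper, and the sketch assumes a POSIX system and a file whose size fits in the address space.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from toybox): overwrite one byte of a file
   through a shared mapping. Returns 0 on success, -1 on error. */
int poke_byte(const char *path, off_t offset, unsigned char value)
{
  struct stat st;
  unsigned char *map;
  int fd = open(path, O_RDWR);

  if (fd == -1) return -1;
  /* Reject offsets outside the file; this also rules out empty files. */
  if (fstat(fd, &st) || offset < 0 || offset >= st.st_size) {
    close(fd);
    return -1;
  }

  /* MAP_SHARED so the store lands in the file, not in a private copy. */
  map = mmap(0, (size_t)st.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  /* the mapping stays valid after the descriptor is closed */
  if (map == MAP_FAILED) return -1;

  map[offset] = value;
  msync(map, (size_t)st.st_size, MS_SYNC);  /* flush before unmapping */
  munmap(map, (size_t)st.st_size);
  return 0;
}
```

Overwriting in place is a single store. Nothing in this sketch can change the file's length, which is where the difficulty with inserting comes from.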

michael105 commented 3 years ago

It would eventually be possible, and also useful, to use a temporary file. Maybe store all insertions and deletions in the history and do the real work on save. Albeit I have to admit that touches quite different use cases and architectures of a hex editor. Being able to modify e.g. addresses in ELF files to match the changes would be really great, but I guess that might be outside the scope of a tiny tool. Also thinking of position-dependent executables.

enh-google commented 3 years ago

Cursoring over into the right side of the screen and typing into there was a thing I considered

i think using tab to switch back and forth is more common ... but that still has the same problem for entering '\t'. (that seems to be the convention with the linked-to hexedit too, and they use ctrl-q to quote the next character. as a vi user i'd have guessed ctrl-v.)

as someone who's usually in a hex editor to corrupt files though (for interesting test cases), i can't imagine using editing on the ASCII side well enough to have a good feel for what makes sense there. (and it does seem like there are a few questions: does typing tab even matter, or is it fine to be dumped back into the hex side, type 09 and hit tab to go back to ASCII? should the various commands work in both modes or require that you're in hex mode? should typing ή insert two bytes? ...)

and i'm usually trying to subtly corrupt files in some specific way --- i'm struggling to imagine the use for insert mode; pretty much no kind of file i deal with can easily recover from an insertion.

enh-google commented 3 years ago

an alternative that came to me was to not actually let you switch mode at all, but to have ^Q or something mean "the next thing i type is a byte to be inserted literally". still leaves questions like whether ή inserts two bytes, but it does make other things simpler. i just don't know whether people just want a convenient way to not have to look up whether 'M' is 0x4d or 0x4e, or actually want to be able to type. (because while this makes entering 'M' as easy as ^Q M, entering 'monkey' is the rather more awkward ^Q M ^Q O ^Q N ^Q K ^Q E ^Q Y.)
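a rough sketch of what that "quote the next key" idea could look like as a key handler. everything here is hypothetical (the `hexed` struct, `feed_key`, and Ctrl-Q as the prefix are illustration only, not toybox code), and it assumes keys arrive one byte at a time.

```c
#include <stddef.h>

#define KEY_QUOTE 0x11  /* Ctrl-Q, assumed here as the quote prefix */

/* Hypothetical editor state: a byte buffer, a cursor, two input flags. */
struct hexed {
  unsigned char *buf;
  size_t len, pos;
  int quoted;  /* nonzero: store the next key literally */
  int low;     /* nonzero: the next hex digit fills the low nibble */
};

static int hexval(int key)
{
  if (key >= '0' && key <= '9') return key - '0';
  if (key >= 'a' && key <= 'f') return key - 'a' + 10;
  if (key >= 'A' && key <= 'F') return key - 'A' + 10;
  return -1;
}

/* Returns 1 if the key was consumed as data or as the quote prefix,
   0 if the caller should treat it as a command (q, esc, and so on). */
int feed_key(struct hexed *ed, int key)
{
  int v;

  if (ed->pos >= ed->len) return 0;
  if (ed->quoted) {
    /* Quoted: the raw byte overwrites the byte under the cursor. */
    ed->buf[ed->pos++] = (unsigned char)key;
    ed->quoted = ed->low = 0;
    return 1;
  }
  if (key == KEY_QUOTE) {
    ed->quoted = 1;
    return 1;
  }
  if ((v = hexval(key)) < 0) return 0;
  if (!ed->low) {
    /* First hex digit: replace the high nibble, stay on this byte. */
    ed->buf[ed->pos] = (unsigned char)((ed->buf[ed->pos] & 0x0f) | (v << 4));
    ed->low = 1;
  } else {
    /* Second hex digit: replace the low nibble and advance. */
    ed->buf[ed->pos] = (unsigned char)((ed->buf[ed->pos] & 0xf0) | v);
    ed->pos++;
    ed->low = 0;
  }
  return 1;
}
```

in this sketch only the single byte after ^Q is taken literally, so a two-byte character like ή would have its second byte fall through to normal handling. that question stays open.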

paulwratt commented 3 years ago

Maybe part of the issue here is that traditional hex edit layout mentioned. I don't see why ^Q (eg) can't toggle entry type (Hex<=>Ascii/Unicode) where \t etc are also valid. Maybe ^+key for a single character, and ^+SHIFT+key for toggle. That n character should/would depend on local character set (ISO/locale) would it not, how many current (modern?) systems are not Unicode on the console atm. My current (since 2012) consoles are unicode aware, but I still use an ISO character set (which is the standard for most distros)

I like the idea of an mmap file, even the use of a temp version to provide for inserts, especially if enhanced with some offset adjustment logic, but then it occurred to me, inserting into a 4GB file would mean a loss of 4GB of the filesystem (even temporarily), and what if I'm editing an m68k or armv6 binary which are big-endian, you and the logic would have to know that somehow, not to mention inserting into a 32-bit ELF on a 64-bit system (and vice versa).

landley commented 3 years ago

I mentioned in http://lists.landley.net/pipermail/toybox-landley.net/2021-April/012404.html why the ascii side of the display can't easily do unicode or utf8 expansion. (because 2 utf8 bytes store 11 bits and I can't think of an encoding to fit "U+3F8" into 2 characters). Unfortunately, that message triggered the rabid gmail filters (and dreamhost's allergy to same), and unsubscribed a dozen list members so I dunno who saw it. :P
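The arithmetic behind the 11 bits can be sketched like this. The helper names `utf8_decode2` and `hex_digits` are made up for illustration and are not in toybox. A two-byte UTF-8 sequence carries 5 payload bits in the first byte and 6 in the second, so its code point can need three hex digits while the ascii column only has two character cells for those two bytes.

```c
/* Decode a two-byte UTF-8 sequence (110xxxxx 10xxxxxx).
   Returns the code point, or -1 if the bytes are not a valid pair. */
int utf8_decode2(unsigned char a, unsigned char b)
{
  int cp;

  if ((a & 0xe0) != 0xc0 || (b & 0xc0) != 0x80) return -1;
  cp = ((a & 0x1f) << 6) | (b & 0x3f);  /* 5 bits + 6 bits = 11 bits */

  return cp < 0x80 ? -1 : cp;           /* reject overlong encodings */
}

/* Number of hex digits needed to print a non-negative code point. */
int hex_digits(int cp)
{
  int n = 1;

  while (cp >>= 4) n++;
  return n;
}
```

For example, the bytes 0xCE 0xAE decode to U+03AE (ή), which takes three hex digits to write out.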

That message also talks about the interaction of open source development with aesthetic issues (and links to talks/writings of mine on the subject, plus some other people's writings on the underlying issue: tl;dr bikeshedding is a thing). I've occasionally seen these kinds of issues usefully resolved in person between 2-3 people who know each other well at a restaurant or coffee shop, but not with larger groups, other settings, or less familiarity.

It's using mmap now: that's how toybox hexedit works and the only way it currently works. It can't edit a file it can't mmap. The logical way to insert into such a file would be to truncate() it slightly longer, mremap() the file, and copy the data with memmove, which is slow and not gracefully interruptible. Except the main reason I have an mremap TODO item for hexedit would be to allow an editing "window" into the file to move along and only map in the visible area, thus eliminating the 4 gigabyte limit on 32 bit platforms. (After which insert or delete would require a function to map along the file in a loop, copying data in place. Not impossible, just "ow".)

And no, hexedit wouldn't care anything about endianness, the person editing the file has to know what the data means.

There are tools like objcopy/ar/zip that understand an existing file format and can perform transforms on it to create derivative works; hexedit ain't that. (Excuse me: hexedit am not that.) It's a flathead screwdriver being used as a chisel and prybar. It is a tool in the "stone knives and bearskins" genre, and attempting to make it "smarter" only takes it away from that local peak. A hex editor exists for the times you need to hit the file with a rock. It's NEVER supposed to know what the file it's editing "means".

Which is another reason I've avoided insert/delete because in most binary file formats it's a footgun. As soon as you insert, all the offsets in the file are now wrong. There are times (and file formats) it's useful to do (if you tweak an uncompressed tar you can fix up the header by hand), but that's an exposed sharp edge right there. NOT something you'd want to let them do by accident.