Closed ssokolow closed 2 years ago
Hmm.
binhex.binhex("Cargo.toml", "Cargo.toml.hqx")
\n
line endings and binhex.binhex
produces output with Mac \r
line endings.)hexbin(&bytes, true).unwrap();
to hexbin(&bytes, false).unwrap();
has no effect.I find it quite amusing that the very first test file I fed through it appears to be an edge case that I can't figure out how I created in the first place.
Maybe I'll find time to set up gdbgui some time in the next couple of days and step through it to see what's going on... or maybe I'll just set up the differential fuzzing against the Python implementation that I was planning to do anyway (using PyO3 to get the Python and Rust in the same process, if you want to leapfrog me on that) and see if that removes the need for gdbgui to figure out where they diverge.
Thanks for bringing up this issue! I think I know the issue here, binhex4
doesn't like anything that doesn't contain a string starting with "(This file must be converted with BinHex". The general parsing format is
*
BINHEX_PROMPT_PREFIX
*
:DATA:
*
where BINHEX_PROMPT_PREFIX is "(This file must be converted with BinHex" and DATA is any number of lines of binhex data. I interpreted the spec you posted to mean that this string prefix is a hard requirement, but if it is true that like in this case many files don't comply with that, I could remove this requirement when parsing.
I can verify that your file correctly decodes to
Testing 123
when I change it to
(This file must be converted with BinHex 4.0)
:$(4PFh4QD@aP,R4iG!"849K868&$33#3"3`!!!!!RTK8CA0dD@jR)$%b-`S6a3!
!:
They don't appear to be common, but they exist (a quick search through the .hqx
files I happen to have sitting on my hard drive revealed three out of 836 lacking the substring BinHex
) and, given the context in which such things were used, you generally want to be lenient in your parsing.
BinHex was basically a Macintosh counterpart to things like uuencode and, depending on the nature of your early Usenet client, it'd be easy for someone to assume that message isn't significant to the parsing. You might literally be copy-pasting the non-human-readable portion of the BinHex file into a message after you wrote something like "Here's the .hqx".
(It's reminiscent of how early file transfers over serial lines and modems worked, with the receiver running something like XModem and the sender then running it, and, as far as the link was concerned, the human on one end just started typing really fast, ending with the program on the receiving end exiting on its own. Very ad-hoc, lenient, and lacking formal framing.)
EDIT: Watch Cathode Ray Dude's AT&T's '60s Modem That Won't Die if you want an interesting exploration of how ad-hoc it began.
I'd be interested in seeing the result of the fuzzing! If you have a python fuzzing implementation setup, I could also look at integrating it with this package / if you wanted to do that that would be appreciated. As for the parsing, thanks for the context, I will remove the requirement and just read it like *:data:*
Barring unanticipated show-stoppers, it'll be a Rust fuzzing system which binds to libpython
via PyO3 to embed a Python runtime, since I prefer the experience of tools like cargo-fuzz to what's available on the Python side, but sure.
I may get to it today or it may not. I'm trying to get my sleep cycle in order and it'll depend on how much I get to before I feel sleepy.
Sounds good, no rush. I'll close this issue for now since I've just fixed this issue and updated it to v0.1.2. let me know if there are any other issues you run into with the library
This minimal test program...
...reports
BadFormat
......when fed this minimal "known good case" test file that Python's standard library
binhex
module has no problem with anduudeview
produces expected output from:(Yeah, you don't need to binhex text files, but I built the test file from
testfile.txt
because it makes it easier to quickly check for problems usingless
, and because text files have no internal checksums that a tool might leverage to cover up its lack of support for checking BinHex4's own CRC.)