nramos0 / binhex4-rs

A BinHex 4.0 encoder and decoder implementation for Rust.
3 stars 0 forks source link

Failed to decode my known-good test file #2

Closed ssokolow closed 2 years ago

ssokolow commented 2 years ago

This minimal test program...

use std::path::PathBuf;

use binhex4::decode::hexbin;
use gumdrop::Options;

#[derive(Options)]
struct Opts {
    #[options(free, help = "BinHex files to attempt to decode")]
    input: Vec<PathBuf>,

    #[options(help = "print help message")]
    help: bool,
}

fn main() {
    let opts = Opts::parse_args_default_or_exit();

    for path in opts.input {
        let bytes = match std::fs::read(&path) {
            Ok(x) => x,
            Err(e) => {
                println!("Failed to read {}: {}", path.display(), e);
                continue;
            }
        };
        hexbin(&bytes, true).unwrap();
    }
}

...reports BadFormat...

ssokolow@monolith binhex_test [master] % cargo run testfile.txt.good.hqx
    Finished dev [optimized] target(s) in 0.02s
     Running `target/debug/binhex_test testfile.txt.good.hqx`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: BadFormat', src/main.rs:26:30
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

...when fed this minimal "known good case" test file that Python's standard library binhex module has no problem with and uudeview produces expected output from:

(This file must be converted; you knew that already.)

:$(4PFh4QD@aP,R4iG!"849K868&$33#3"3`!!!!!RTK8CA0dD@jR)$%b-`S6a3!
!:

(Yeah, you don't need to binhex text files, but I built the test file from testfile.txt because it makes it easier to quickly check for problems using less, and because text files have no internal checksums that a tool might leverage to cover up its lack of support for checking BinHex4's own CRC.)

ssokolow commented 2 years ago

Hmm.

I find it quite amusing that the very first test file I fed through it appears to be an edge case that I can't figure out how I created in the first place.

Maybe I'll find time to set up gdbgui some time in the next couple of days and step through it to see what's going on... or maybe I'll just set up the differential fuzzing against the Python implementation that I was planning to do anyway (using PyO3 to get the Python and Rust in the same process, if you want to leapfrog me on that) and see if that removes the need for gdbgui to figure out where they diverge.

nramos0 commented 2 years ago

Thanks for bringing up this issue! I think I know the issue here, binhex4 doesn't like anything that doesn't contain a string starting with "(This file must be converted with BinHex". The general parsing format is

*
BINHEX_PROMPT_PREFIX
*
:DATA:
*

where BINHEX_PROMPT_PREFIX is "(This file must be converted with BinHex" and DATA is any number of lines of binhex data. I interpreted the spec you posted to mean that this string prefix is a hard requirement, but if it is true that like in this case many files don't comply with that, I could remove this requirement when parsing.

nramos0 commented 2 years ago

I can verify that your file correctly decodes to

Testing 123

when I change it to

(This file must be converted with BinHex 4.0)

:$(4PFh4QD@aP,R4iG!"849K868&$33#3"3`!!!!!RTK8CA0dD@jR)$%b-`S6a3!
!:
ssokolow commented 2 years ago

They don't appear to be common, but they exist (a quick search through the .hqx files I happen to have sitting on my hard drive revealed three out of 836 lacking the substring BinHex) and, given the context in which such things were used, you generally want to be lenient in your parsing.

BinHex was basically a Macintosh counterpart to things like uuencode and, depending on the nature of your early Usenet client, it'd be easy for someone to assume that message isn't significant to the parsing. You might literally be copy-pasting the non-human-readable portion of the BinHex file into a message after you wrote something like "Here's the .hqx".

(It's reminiscent of how early file transfers over serial lines and modems worked, with the receiver running something like XModem and the sender then running it, and, as far as the link was concerned, the human on one end just started typing really fast, ending with the program on the receiving end exiting on its own. Very ad-hoc, lenient, and lacking formal framing.)

EDIT: Watch Cathode Ray Dude's AT&T's '60s Modem That Won't Die if you want an interesting exploration of how ad-hoc it began.

nramos0 commented 2 years ago

I'd be interested in seeing the result of the fuzzing! If you have a python fuzzing implementation setup, I could also look at integrating it with this package / if you wanted to do that that would be appreciated. As for the parsing, thanks for the context, I will remove the requirement and just read it like *:data:*

ssokolow commented 2 years ago

Barring unanticipated show-stoppers, it'll be a Rust fuzzing system which binds to libpython via PyO3 to embed a Python runtime, since I prefer the experience of tools like cargo-fuzz to what's available on the Python side, but sure.

I may get to it today or it may not. I'm trying to get my sleep cycle in order and it'll depend on how much I get to before I feel sleepy.

nramos0 commented 2 years ago

Sounds good, no rush. I'll close this issue for now since I've just fixed this issue and updated it to v0.1.2. let me know if there are any other issues you run into with the library