File parser breaks with improper files

I've added some files to the parse_spec test cases because they broke my file parser.

So, I'm looking at improving the parser.

The data looks like X.c.a.l.i.b.u.r when viewed in a hex editor. Essentially, a 0x00 byte follows each character code.
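For comparison, one 0x00 byte after each ASCII character is the layout UTF-16LE produces for ASCII-range text, so that encoding may be worth a try. A minimal sketch, using the literal string "Xcalibur" rather than the real file (that the file's text fields are UTF-16LE is an assumption here, not something confirmed):

```ruby
# Encode a plain string as UTF-16LE and look at the resulting bytes.
utf16_bytes = "Xcalibur".encode("UTF-16LE").bytes.to_a

# Each ASCII character is followed by a single 0x00 byte.
first_four = utf16_bytes.first(4)

# Going the other way: label the raw bytes as UTF-16LE, then transcode to UTF-8.
round_trip = utf16_bytes.pack("C*").force_encoding("UTF-16LE").encode("UTF-8")
```

If the hex editor view matches `utf16_bytes`, the text portions could probably be decoded the same way as `round_trip`.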

I should be able to find an encoding that makes my parsing easier than just the raw binary. To that end, I've looked at UTF-32.

It doesn't seem to explain the byte layout I'm seeing:

 1.9.3-p0 :022 > data = IO.read(File.open("spec/tfiles/matt.sld", 'rb:UTF-32')).unpack("C100").map(&:chr)

=> ["\x01", "\xA1", "F", "\x00", "i", "\x00", "n", "\x00", "n", "\x00", "i", "\x00", "g", "\x00", "a", "\x00", "n", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x02", "\x00", "@", "\x00", "\x00", "\x00", "\xF0", "\x81", "\xE2", "\xBC", "\x13", "\x9E", "\xCC", "\x01", "X", "\x00", "c", "\x00", "a", "\x00", "l", "\x00", "i", "\x00", "b", "\x00", "u", "\x00", "r", "\x00", "_", "\x00", "S", "\x00", "y", "\x00", "s", "\x00", "t", "\x00", "e", "\x00", "m", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "X", "\x00"]
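One possible reason the encoding makes no visible difference here: `unpack("C100")` works on raw bytes, so whatever encoding label the string carries is ignored. A small sketch, using the first 18 bytes copied from the output above instead of the real file; treating the text as UTF-16LE and skipping the two leading bytes are assumptions on my part, not a known description of the format:

```ruby
# First bytes of the dump: 0x01 0xA1, then "Finnigan" with 0x00 after each character.
raw = [0x01, 0xA1, 0x46, 0x00, 0x69, 0x00, 0x6E, 0x00, 0x6E, 0x00,
       0x69, 0x00, 0x67, 0x00, 0x61, 0x00, 0x6E, 0x00].pack("C*")

# unpack("C*") returns the same byte values whatever encoding label is applied.
as_binary = raw.dup.force_encoding("BINARY").unpack("C*")
as_utf32  = raw.dup.force_encoding("UTF-32LE").unpack("C*")
same_bytes = (as_binary == as_utf32)

# Skip the 2 leading bytes (assumed to be a non-text header) and
# read the next 16 bytes as UTF-16LE text.
name = raw.byteslice(2, 16).force_encoding("UTF-16LE").encode("UTF-8")
```

So it may be easier to read the file as binary, slice out the text fields by offset, and decode only those slices.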

This leaves me with no encoding ideas right now, and these files still break my parser.
