princelab / metriculator

Performance metric tool for LCMS and MS experiments.
metriculator.chem.byu.edu
MIT License

File parser breaks with improper files #8

Open ryanmt opened 12 years ago

ryanmt commented 12 years ago

I've added some files to the parse_spec test cases because they broke my file parser.

So, I'm looking at improving the parser.

The data looks like X.c.a.l.i.b.u.r when viewed in a hex editor. Essentially, a 0x00 byte follows each character code.

I should be able to find an encoding that makes my parsing easier than just the raw binary. To that end, I've looked at UTF-32.

It doesn't seem to explain the behaviors.

 1.9.3-p0 :022 > data = IO.read(File.open("spec/tfiles/matt.sld", 'rb:UTF-32')).unpack("C100").map(&:chr)

=> ["\x01", "\xA1", "F", "\x00", "i", "\x00", "n", "\x00", "n", "\x00", "i", "\x00", "g", "\x00", "a", "\x00", "n", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x02", "\x00", "@", "\x00", "\x00", "\x00", "\xF0", "\x81", "\xE2", "\xBC", "\x13", "\x9E", "\xCC", "\x01", "X", "\x00", "c", "\x00", "a", "\x00", "l", "\x00", "i", "\x00", "b", "\x00", "u", "\x00", "r", "\x00", "_", "\x00", "S", "\x00", "y", "\x00", "s", "\x00", "t", "\x00", "e", "\x00", "m", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "\x00", "X", "\x00"]
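Two observations on the dump above, offered as a hedged sketch rather than a confirmed diagnosis. First, `unpack("C100")` reads raw bytes, so the encoding label given to `File.open` cannot change what it returns. Second, an ASCII character followed by a single 0x00 byte is the pattern UTF-16LE produces for ASCII text (UTF-32 would pad each character to four bytes). The snippet below decodes two byte runs copied from the dump as UTF-16LE. The helper name `decode_utf16le` is made up for illustration, and whether the whole .sld file is UTF-16LE is an assumption; the header bytes and the binary fields between the strings clearly are not text.

```ruby
# Byte runs copied from the dump above; the constant names are illustrative.
FINNIGAN = "F\x00i\x00n\x00n\x00i\x00g\x00a\x00n\x00"
XCALIBUR = "X\x00c\x00a\x00l\x00i\x00b\x00u\x00r\x00_\x00S\x00y\x00s\x00t\x00e\x00m\x00"

# Hypothetical helper: reinterpret the bytes as UTF-16LE, then transcode to UTF-8.
# Assumes the field is UTF-16LE text, which the file format does not confirm.
def decode_utf16le(bytes)
  bytes.dup.force_encoding("UTF-16LE").encode("UTF-8")
end

# unpack works on bytes, so relabeling the string's encoding changes nothing.
plain     = XCALIBUR.unpack("C4")
relabeled = XCALIBUR.dup.force_encoding("UTF-32LE").unpack("C4")
puts plain == relabeled            # true

puts decode_utf16le(FINNIGAN)      # Finnigan
puts decode_utf16le(XCALIBUR)      # Xcalibur_System
```

If this holds for the real files, the runs of 0x00 after each string would be padding in a fixed-width field and could be stripped after decoding.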

This leaves me with no encoding ideas right now, and these files still break my parser.

ryanmt commented 12 years ago

I've written some catches into the system. Errors are no longer thrown, and each sld row now carries a 'parsed' flag that determines whether it is used later. This is not ideal, but it seems to work for the time being.
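For readers following along, the catch-and-flag approach might look roughly like this. It is a minimal sketch, not the code in the repository: `parse_rows`, `usable_rows`, and the hash keys are hypothetical names, and the row parser is passed in as a block.

```ruby
# Hypothetical sketch: these names are not taken from metriculator.
# Each row is parsed by the given block; a failure is recorded on the row
# instead of propagating, so one bad file does not stop the run.
def parse_rows(raw_rows)
  raw_rows.map do |raw|
    begin
      { data: yield(raw), parsed: true }
    rescue StandardError => e
      { data: nil, parsed: false, error: e.message }
    end
  end
end

# Later stages only look at rows that parsed cleanly.
def usable_rows(rows)
  rows.select { |row| row[:parsed] }
end

rows = parse_rows(["1", "x", "3"]) { |s| Integer(s) }
puts usable_rows(rows).map { |row| row[:data] }.inspect   # [1, 3]
```

Keeping the error message on the failed row makes it possible to report which files were skipped and why, which softens the "not ideal" part of swallowing exceptions.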