Open ross-spencer opened 1 year ago
@ross-spencer, thanks for reporting this!
It looks like in this specific example, the caveat is with the fact that Ruby UTF-8 parsing is quite "forgiving" in terms of getting byte sequences which are not valid UTF-8 characters as strings, e.g.
[.] magic = "\xBB\r\neyeglass\u001A\n\xAB" [.] eof = "\xBBeof"
This backfires when trying to use standard YAML or JSON libraries trying to dump these, though, as they give up.
To be honest, I'm not sure what behavior we'd want in this case. Probably reporting a more precise error should be a good idea (e.g. down to a specific attribute path). Showing that string is not a valid UTF-8 in ksv might be a good idea too, but then ksv is supposed to be "all forgiving", and, it actually is right now.
The simplest fix on .ksy side would be removing type: str
and encoding: utf-8
. This yields raw bytes rendition:
axis_right_left:
- 130
- 80
base_right_left:
- 0.0
- 0.0
cylinder_right_left:
- -0.25
- -1.0
datetime: '2012-11-08T12:37:50'
distance_acuity_right_left:
- 0.6600000262260437
- 0.5
endianness: 1
eof: BB 65 6F 66
format_expansion_room: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00
magic: BB 0D 0A 65 79 65 67 6C 61 73 73 1A 0A AB
near_acuity_right_left:
- 12
- 12
next_checkup_years: 1.0
observation9: "Patient's eyesight needs correction. History of diabetes in family
but indicators found. Standard checkup interval recommended.\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
prism_right_left:
- 0.0
- 0.0
purpose: "Distance and Close Work.\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
sphere_right_left:
- -3.3499999046325684
- 0.5
version: '01'
From ksdump's side, I can propose several solutions. While doing recursive traversal, check for string validity (e.g. value.valid_encoding?
). Then, if we detect the problem:
Thoughts?
@GreyCat thanks for that. I see where my assumptions were wrong now. I took your approach to remove the offending str
and encoding, but then decided I could get a lot of value out of turning the magic into a type which gives me some more granular output. I'll need to see what that does for me if using kaitai to auto-generate parsing code.
ksdump has been useful for debugging here, I'll reflect on that in my writeup. From my perspective, out of the potential changes, then:
Note where this is in the tree and report the error, pointing to that location
Would be a useful starting place, as for string substitution, or dumping raw bytes, then they seem nice options, but providing they didn't obscure the user error in any way, i.e. it was still clear that the ksy
file contains a potentially invalid instruction/the input file contains some unexpected data.
I an seeing ksdump crash with the following:
Example file.
Example ksy.
Via: https://github.com/ross-spencer/eyeglass/tree/main
I'm not clear on if this is a user error, or if there's a newer version of ksdump I should be looking at?