armijnhemel opened this issue 1 year ago
@armijnhemel:
> and the default encoding is UTF-8, so this will obviously not work. I don't know how I could fix this.
Well, if there isn't a single character encoding we could specify in the `.ksy`, the "next best thing" is to downgrade to a byte array:
```diff
 record_type_string_array:
   params:
     - id: num_values
       type: u4
   seq:
     - id: values
-      type: strz
+      terminator: 0
       repeat: expr
       repeat-expr: num_values
```
A byte array is the implicit type in `.ksy` specs when no `type` is given but the field size is delimited by `size`, `size-eos: true` or `terminator`.
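To make the semantics of the modified `values` field concrete, here is a plain-Python sketch (not Kaitai-generated code; the function name `read_terminated_array` is hypothetical). It assumes Kaitai's default `terminator` behaviour, where the terminator byte is consumed but not included in the value, so each entry comes back as raw bytes with no decoding attempted.

```python
from typing import List


def read_terminated_array(buf: bytes, num_values: int, terminator: int = 0) -> List[bytes]:
    """Read `num_values` byte strings from `buf`, each ending at `terminator`.

    Hypothetical helper mirroring the assumed Kaitai defaults for `terminator`:
    the terminator is consumed (skipped) but not included in the result.
    """
    values = []
    pos = 0
    for _ in range(num_values):
        # bytes.index raises ValueError if no terminator is found.
        end = buf.index(terminator, pos)
        values.append(buf[pos:end])  # raw bytes, no decoding attempted
        pos = end + 1                # move past the terminator
    return values


# A UTF-8-safe entry followed by a Latin-1 entry (0xE9 is "é" in Latin-1).
print(read_terminated_array(b"foo\x00caf\xe9\x00", 2))
```

Because nothing is decoded at this stage, a Latin-1 entry like `b"caf\xe9"` no longer causes a parse failure; interpreting the bytes is left to whatever consumes the parsed result.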
I had actually been thinking about that and looked at the docs, but they seem to indicate that `terminator` is only for strings. Using a byte array and then processing the strings in an external script would work for me.
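A minimal sketch of such an external post-processing step, assuming the parsed `values` are available as a list of `bytes` and that any entry that is not valid UTF-8 is Latin-1 (as in the Fedora Core 3 example). The helper names `decode_with_fallback` and `decode_values` are hypothetical. This is a heuristic: a Latin-1 string that happens to form valid UTF-8 would be decoded as UTF-8.

```python
from typing import List


def decode_with_fallback(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode `raw` as UTF-8, falling back to a single-byte encoding on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # With the default latin-1 fallback this cannot fail:
        # Latin-1 maps every byte 0x00-0xFF to a code point.
        return raw.decode(fallback)


def decode_values(values: List[bytes]) -> List[str]:
    """Decode every entry of a parsed string array independently."""
    return [decode_with_fallback(v) for v in values]


# One UTF-8 entry and one Latin-1 entry, both meaning "café".
print(decode_values([b"caf\xc3\xa9", b"caf\xe9"]))
```

Decoding per entry rather than per array means one mis-encoded ChangeLog line does not force a fallback for the whole tag.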
Thinking a bit more about this, it probably isn't a good idea, as `\x00` can be part of a valid UTF-8 string.
I found it easier to just work around it like this:
This is cleaner than trying to fix it here.
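For reference on the `\x00` concern: in UTF-8, U+0000 is encoded as the single byte 0x00, and every byte of a multi-byte sequence is 0x80 or above, so a 0x00 byte can only ever stand for U+0000 itself. A brute-force Python check over all code points (the helper name `code_points_with_nul_byte` is hypothetical) confirms this:

```python
from typing import List


def code_points_with_nul_byte(limit: int = 0x110000) -> List[int]:
    """Return every code point below `limit` whose UTF-8 encoding contains 0x00."""
    found = []
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded in UTF-8
        if 0 in chr(cp).encode("utf-8"):
            found.append(cp)
    return found


print(code_points_with_nul_byte())  # [0]
```

So splitting on `terminator: 0` only differs from the intended text for strings that deliberately embed U+0000, and `type: strz` would stop at the same byte.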
In the current `rpm.ksy`, the `encoding` for strings is set to UTF-8. Some RPM files fail to parse because, as it turns out, not everyone has been playing nice with encodings. An example is this file from Fedora Core 3:
https://archives.fedoraproject.org/pub/archive/fedora/linux/core/3/x86_64/os/Fedora/RPMS/bash-3.0-17.x86_64.rpm
One of the tags is a `record_type_string_array` related to ChangeLogs, and some people seem to have used Latin-1 characters in it instead of UTF-8. Currently `record_type_string_array` reads its values with `type: strz`, and the default encoding is UTF-8, so this will obviously not work. I don't know how I could fix this.