Closed zsau closed 10 years ago
Sure, sounds useful. Do you know of a second example where this kind of feature might be useful? I can't think of anything other than null-terminated strings.
Otherwise I would lean toward adding a specific codec for this case, like:
(defn c-string
  "Zero-terminated string (like in C). String is a sequence of bytes, terminated by a 0 byte."
  [^String encoding]
  (reify BinaryIO
    (read-data [_ big-in _]
      ;; collect bytes until the first 0 byte, then decode them
      (loop [bytes (transient [])]
        (let [b (.readByte ^DataInput big-in)]
          (if (zero? b)
            (String. (byte-array (persistent! bytes)) encoding)
            (recur (conj! bytes b))))))
    (write-data [_ big-out _ s]
      ;; encode with the same charset that read-data decodes with
      (.write ^DataOutput big-out (.getBytes ^String s encoding))
      (.write ^DataOutput big-out (byte 0)))))
Some codecs use null characters as terminators rather than null bytes (ID3v2, for example). Unfortunately, the null character isn't always one byte; in UTF-16, for example, it is two bytes.
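To illustrate the point about terminator width, here is a minimal sketch using only Java interop, not this library's API. The names terminator-width and read-terminated are hypothetical, and the sketch assumes a charset with fixed-width code units and no byte-order mark (so "UTF-16LE" or "UTF-16BE" rather than plain "UTF-16").

```clojure
(import '(java.io ByteArrayInputStream DataInputStream DataInput))

(defn terminator-width
  "Number of bytes the NUL character occupies in `charset` (hypothetical helper)."
  [^String charset]
  (count (.getBytes "\u0000" charset)))

(defn read-terminated
  "Reads `width`-byte code units from `in` until one is all zeros, then
  decodes the collected bytes. Assumes fixed-width code units and no BOM."
  [^DataInput in ^String charset]
  (let [width (terminator-width charset)]
    (loop [acc []]
      (let [unit (byte-array width)]
        (.readFully in unit)
        (if (every? zero? unit)
          (String. (byte-array acc) charset)
          (recur (into acc unit)))))))

(terminator-width "UTF-8")    ;; => 1
(terminator-width "UTF-16LE") ;; => 2

;; "hi", a two-byte terminator, then trailing data that stays unread
(let [data (byte-array (concat (.getBytes "hi" "UTF-16LE")
                               [0 0]
                               (.getBytes "rest" "UTF-16LE")))
      in   (DataInputStream. (ByteArrayInputStream. data))]
  (read-terminated in "UTF-16LE")) ;; => "hi"
```

Reading whole code units matters here: scanning UTF-16 data byte by byte for a single 0 would stop inside the first ASCII-range character, since its high byte is zero.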
Thanks, this is very helpful. I notice that when decoding with something like (repeated (string "UTF-8" :separator 0)), any bytes that come after the last null in the stream are read and ignored by the parser. Is there any way to access those bytes?
So you are saying your codec keeps reading bytes after encountering the separator? Maybe you keep calling decode? I added a test that reads bytes separated by a null byte and verifies that, after each read, the rest of the bytes in the stream are untouched. If you can give an example of the buggy behaviour, please file a new bug report.
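The kind of check described above could look roughly like the following. This is a sketch with plain Java streams, not the test that was actually added to the library, and read-until-zero is a hypothetical helper.

```clojure
(import '(java.io ByteArrayInputStream DataInputStream))

(defn read-until-zero
  "Consumes bytes from `in` up to and including the first 0 byte and
  returns the bytes before it (hypothetical helper)."
  [^DataInputStream in]
  (loop [acc []]
    (let [b (.readByte in)]
      (if (zero? b)
        acc
        (recur (conj acc b))))))

;; Two separated chunks followed by one trailing byte with no separator.
(let [in (DataInputStream.
          (ByteArrayInputStream. (byte-array [1 2 0 3 4 0 5])))]
  [(read-until-zero in) (.available in)
   (read-until-zero in) (.available in)])
;; => [[1 2] 4 [3 4] 1]
```

After the second read, one byte is still available on the stream. Whether trailing bytes are consumed therefore depends on whether the caller attempts another read, not on the separator handling itself.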
Some codecs use null-terminated strings whose length isn't known in advance, which is very awkward to parse at the moment. An optional :suffix or :terminator argument to string and/or repeated would be very useful.