I wrote a serde implementation of the X12 file format, which is a enterprise-y text file format somewhat similar to CSVs. A document has a header, footer, and a set of transactions. Transactions contain segments, segments contain elements, and elements can be single-valued or be a composite of multiple values.
Deserializing into a generic Document (which is a vec of segments, which themselves are vecs of elements, which is an enum of values or a composite vec of values) works great. Each segment begins with a "header" element, so a DMG segment has different semantic meaning from a REF segment and also contains different quantities and types of fields, and so on.
Within a document, some segments are required, some can be repeated, so a Vec<DmgSegment>, and some are optional, Option<DmgSegment>.
The Problem
To deserialize into transactions with semantic meaning, ones that contain fields like Vec<DmgSegment>, the deserializer needs to be able to backtrack and stop visiting if it's expecting a different segment (e.g. expects a DmgSegment but encounters a REF header.) For a transaction with an Option<DmgSegment> segment, the visitor don't know if the transaction contains that record until we see that DMG header element.
To implement this, I have deserialize_struct implemented, which calls visit_seq. Because visit_seq can return actual values like Ok(None) for the next element, I can catch an UnexpectedSegment error, and in doing so, implement backtracking:
pub struct BufferDeserializer<'de> {
pub(crate) buffer: Option<&'de str>,
pub(crate) formatter: X12Formatter,
pub(crate) delimiter: u8,
}
impl<'de, 'a> Deserializer<'de> for &'a mut BufferDeserializer<'de> {
fn deserialize_struct<V>(mut self, name: &'static str, fields: &'static [&'static str], visitor: V) -> Result<V::Value, Self::Error> where V: Visitor<'de> {
let Some(buf) = self.buffer else {
return Err(X12DeserializerError::EarlyEndOfDocument);
};
let (el, rest) = split_string(buf, self.delimiter as char);
self.buffer = rest;
visitor.visit_seq(SizedBufferDeserializer {
de: &mut BufferDeserializer {
buffer: Some(el),
delimiter: self.formatter.next_delimiter(self.delimiter),
formatter: self.formatter,
},
remaining: fields.len(),
})
}
}
impl<'de> SeqAccess<'de> for BufferDeserializer<'de> {
type Error = X12DeserializerError;
fn next_element_seed<T>(&mut self, seed: T) -> Result<Option<T::Value>, Self::Error> where T: DeserializeSeed<'de> {
if self.buffer.is_none() {
return Ok(None);
}
match seed.deserialize(&mut *self).map(Some) {
Ok(value) => Ok(value),
Err(X12DeserializerError::EarlyEndOfDocument) => Ok(None),
// this is effectively where backtracking happens
Err(X12DeserializerError::UnexpectedSegment { .. }) => Ok(None),
Err(e) => Err(e),
}
}
}
The above code works great.
However, in the deserialize_option method, the only way to return the equivalent of Ok(None) is by calling visit_none. But we don't know whether it's Some or None valued until we do the visiting. So this code is broken:
// This code fails to compile
fn deserialize_option<V>(self, visitor: V) -> Result<V::Value, Self::Error> where V: Visitor<'de> {
let mut s = visitor.visit_some(self);
if s.is_ok() {
s
} else {
// EarlyEndOfDocumentError or a UnexpectedSegment error
// Error is the next line: we can't call `visit_none` because both visit_none and visit_some take ownership of visitor.
visitor.visit_none() // Compiler error
}
}
I tried using visit_seq from deserialize_option, but that fails with an unexpected type error - apparently serde doesn't treat Option as a seq of length zero or one.
Hacky Solution
I hacked a solution using unsafe, which only works because every visitor I've seen is a ZST. Unfortunately, I don't think there's a way to declare to the compiler that a generic V is ZST.
// this code actually works
fn deserialize_option<V>(self, visitor: V) -> Result<V::Value, Self::Error> where V: Visitor<'de> {
// this only works because all the visitors ive seen are ZSTs
// if they did have state, this would be a disaster
let insanely_unsafe_copy_of_visitor = unsafe {
std::ptr::read(&visitor as *const V)
};
let mut s = visitor.visit_some(self);
if s.is_ok() {
s
} else {
insanely_unsafe_copy_of_visitor.visit_none()
}
}
The Question
Is there a better way to solve this problem, not using unsafe? Basically, I need a way to return None from deserialize_option, similar to how I'm able to in SeqAccess::next_element_seed.
The most straight forward & backwards compatible way that I see would be if deserialize_option could call visit_seq - perhaps guaranteeing that it's a seq of max one length. But that feels like more indirection than if deserialize_option could return None somehow.
I wrote a
serde
implementation of the X12 file format, which is a enterprise-y text file format somewhat similar to CSVs. A document has a header, footer, and a set of transactions. Transactions contain segments, segments contain elements, and elements can be single-valued or be a composite of multiple values.Deserializing into a generic
Document
(which is a vec of segments, which themselves are vecs of elements, which is an enum of values or a composite vec of values) works great. Each segment begins with a "header" element, so aDMG
segment has different semantic meaning from aREF
segment and also contains different quantities and types of fields, and so on.Within a document, some segments are required, some can be repeated, so a
Vec<DmgSegment>
, and some are optional,Option<DmgSegment>
.The Problem
To deserialize into transactions with semantic meaning, ones that contain fields like
Vec<DmgSegment>
, the deserializer needs to be able to backtrack and stop visiting if it's expecting a different segment (e.g. expects a DmgSegment but encounters aREF
header.) For a transaction with anOption<DmgSegment>
segment, the visitor don't know if the transaction contains that record until we see thatDMG
header element.To implement this, I have
deserialize_struct
implemented, which callsvisit_seq
. Becausevisit_seq
can return actual values likeOk(None)
for the next element, I can catch anUnexpectedSegment
error, and in doing so, implement backtracking:The above code works great.
However, in the
deserialize_option
method, the only way to return the equivalent ofOk(None)
is by callingvisit_none
. But we don't know whether it'sSome
orNone
valued until we do the visiting. So this code is broken:I tried using
visit_seq
fromdeserialize_option
, but that fails with an unexpected type error - apparently serde doesn't treat Option as a seq of length zero or one.Hacky Solution
I hacked a solution using unsafe, which only works because every visitor I've seen is a ZST. Unfortunately, I don't think there's a way to declare to the compiler that a generic
V
is ZST.The Question
Is there a better way to solve this problem, not using unsafe? Basically, I need a way to return
None
from deserialize_option, similar to how I'm able to inSeqAccess::next_element_seed
.The most straight forward & backwards compatible way that I see would be if
deserialize_option
could callvisit_seq
- perhaps guaranteeing that it's a seq of max one length. But that feels like more indirection than ifdeserialize_option
could returnNone
somehow.This issue supercedes #2705.