Closed vmx closed 2 years ago
I think that's a bit beyond the scope of this crate.
Can this be done outside of this crate? we could have a serde-cbor4ii-canonical
crate that wraps the Serializer and overrides the collect_map
method.
like
struct CanonicalSerializer<W>(Cbor4iiSerializer<W>);
impl<W: Write> Serializer for &mut CanonicalSerializer<W> {
type SerializeSeq = <Cbor4iiSerializer<W> as Serializer>::SerializeSeq;
// forward to Cbor4iiSerializer
fn collect_map<K, V, I>(self, iter: I) -> Result<(), Self::Error> {
// your code
}
}
I think that's a bit beyond the scope of this crate.
That's fine. But it brings me to the next point (let me know if I should open a separate issue). As you can see on that implementation, I need access to some things (like TypeNum
) which isn't public yet. What I think would be great if cbor4ii could be split into two crates, one core one and one for Serde (similar to what ciborium is doing. This way I could depend on the core crate, but create my own custom Serde implementation. That would help me a lot.
cbor4ii
exposes MapStartBounded, you don't have to access TypeNum
, You can notice that TypeNum
is also not accessed in the serde
mod.
also if you don't turn on the serde1
feature then it's a core
crate and it doesn't need to split out another crate.
The thing I can think of that prevents CanonicalSerializer
from being impl externally is that Serializer
might need to expose an as_mut
to allow CanonicalSerializer
to bypass the serde api for writes.
cbor4ii
exposes MapStartBounded, you don't have to accessTypeNum
Thanks! I missed that one, that is exactly what I needed.
also if you don't turn on the
serde1
feature then it's acore
crate and it doesn't need to split out another crate.
Oh right, I should've thought of this, I'll just do that.
The thing I can think of that prevents
CanonicalSerializer
from being impl externally is thatSerializer
might need to expose anas_mut
to allowCanonicalSerializer
to bypass the serde api for writes.
As I mentioned, I need a few other changes that are not really general purpose (like this sort order), hence don't make sense to be upstreamed/published in a generic way. So I thought I just fork the whole Serde part and copy it over. Though the idea of wrapping the serializer sounds interesting. I'll have a look and come back to you in case I need additional things exposed.
One more thing in regards to the original issue. I find it quite useful if you're able to serialize things into memory. In the above code I've implemented impl enc::Write for Vec<u8>
. I've also seen that some tests of this library are using a BufWriter
. It would be cool to have this kind of functionality (in whichever way) in the public API of the core library, so that it can be re-used.
I didn't initially think about how to expose them, so I chose not to. Currently you can implement enc::Write
for your own types.
I would consider adding a mod that provides these helper type.
I would consider adding a mod that provides these helper type.
I find the impl enc::Write for Vec<u8>
quite convenient to use. Should I create a PR implementing this? Or should I create a PR exposing BufWriter
or should I just wait?
I would like to expose a BufWriter
type, it would be nice if you were willing to open a PR.
Thanks a lot for all the help. I think I can now solely rely on the core features (without serde1
) and do my own Serde implementation (I still want to experiment with wrapping yours, but that's not related to this issue.
For those who need this kind of collation, this is the version updated for cbor4ii 0.2.20:
fn collect_map<K, V, I>(self, iter: I) -> Result<(), Self::Error>
where
K: ser::Serialize,
V: ser::Serialize,
I: IntoIterator<Item = (K, V)>,
{
use serde::ser::SerializeMap;
#[cfg(not(feature = "use_std"))]
use crate::alloc::vec::Vec;
use crate::core::utils::BufWriter;
// CBOR RFC-7049 specifies a canonical sort order, where keys are sorted by length first.
// This was later revised with RFC-8949, but we need to stick to the original order to stay
// compatible with existing data.
// We first serialize each map entry into a buffer and then sort those buffers. Byte-wise
// comparison gives us the right order as keys in DAG-CBOR are always strings and prefixed
// with the length. Once sorted they are written to the actual output.
let mut buffer = BufWriter::new(Vec::new());
let mut mem_serializer = Serializer::new(&mut buffer);
let mut serializer = Collect {
bounded: true,
ser: &mut mem_serializer,
};
let mut entries = Vec::new();
for (key, value) in iter {
serializer.serialize_entry(&key, &value)
.map_err(|_| enc::Error::Msg("Map entry cannot be serialized.".into()))?;
entries.push(serializer.ser.writer.buffer().to_vec());
serializer.ser.writer.clear();
}
enc::MapStartBounded(entries.len()).encode(&mut self.writer)?;
entries.sort_unstable();
for entry in entries {
self.writer.push(&entry)?;
}
Ok(())
}
I re-open this issue as I'm coming back to your proposal from https://github.com/quininer/cbor4ii/issues/13#issuecomment-1088934557, where I try to wrap your serializer.
It almost works. The problem is the serializer to serialize the map. In the code from the comment above, I use:
let mut serializer = Collect {
bounded: true,
ser: &mut mem_serializer.0,
};
But currently Collect
isn't public (with good reason).
I then tried it with:
let mut serializer = Self::SerializeMap {
bounded: true,
ser: &mut mem_serializer.0,
};
But that errors with:
error[E0308]: mismatched types
--> src/ser.rs:290:16
|
78 | impl<'a, W: enc::Write> ser::Serializer for &'a mut Serializer<W> {
| - this type parameter
...
290 | ser: &mut mem_serializer.0,
| ^^^^^^^^^^^^^^^^^^^^^ expected type parameter `W`, found `&mut cbor4ii::core::utils::BufWriter`
|
= note: expected mutable reference `&mut cbor4ii::serde::Serializer<W>`
found mutable reference `&mut cbor4ii::serde::Serializer<&mut cbor4ii::core::utils::BufWriter>`
Next try (thanks to the help from a colleague) was:
let mut serializer = <&mut Serializer<_> as serde::Serializer>::SerializeMap {
bounded: true,
ser: &mut mem_serializer.0,
};
That would work, but sadly only on nightly due to https://github.com/rust-lang/rust/issues/86935.
So what to do now? Is making Collect
public an option (it would also need to have constructor then)?
We should be able to implement a separate Collect
for this with just a few lines of code.
or we just need to serialize these two objects into the same buffer, it doesn't necessarily need SerializeMap
.
like
let mut buffer = BufWriter::new(Vec::new());
let mut entries = Vec::new();
for (key, value) in iter {
let mut mem_serializer = Serializer::new(&mut buffer);
key.serialize(&mut mem_serializer)?;
value.serialize(&mut mem_serializer)?;
entries.push(buffer.buffer().to_vec());
buffer.clear();
}
or we just need to serialize these two objects into the same buffer, it doesn't necessarily need
SerializeMap
.
Wow, that's even better. I've been spending so much time on Serde, but I'm still not good at it. Thanks a lot this works! I just need to look into some error handling issues, but that should be easy.
Here comes the full source of the updated version:
fn collect_map<K, V, I>(self, iter: I) -> Result<(), Self::Error>
where
K: ser::Serialize,
V: ser::Serialize,
I: IntoIterator<Item = (K, V)>,
{
// CBOR RFC-7049 specifies a canonical sort order, where keys are sorted by length first.
// This was later revised with RFC-8949, but we need to stick to the original order to stay
// compatible with existing data.
// We first serialize each map entry into a buffer and then sort those buffers. Byte-wise
// comparison gives us the right order as keys in DAG-CBOR are always strings and prefixed
// with the length. Once sorted they are written to the actual output.
let mut buffer = BufWriter::new(Vec::new());
let mut entries = Vec::new();
for (key, value) in iter {
let mut mem_serializer = Serializer::new(&mut buffer);
key.serialize(&mut mem_serializer)
.map_err(|_| enc::Error::Msg("Map key cannot be serialized.".into()))?;
value
.serialize(&mut mem_serializer)
.map_err(|_| enc::Error::Msg("Map key cannot be serialized.".into()))?;
entries.push(buffer.buffer().to_vec());
buffer.clear();
}
enc::MapStartBounded(entries.len()).encode(&mut self.0.writer())?;
entries.sort_unstable();
for entry in entries {
self.0.writer().push(&entry)?;
}
Ok(())
}
My Serializer
looks like this:
pub struct Serializer<W>(cbor4ii::serde::Serializer<W>);
impl<W> Serializer<W> {
pub fn new(writer: W) -> Serializer<W> {
Serializer(Cbor4iiSerializer::new(writer))
}
}
This library explicitly specifies RF-8949, so this request might be out of scope.
In the project I want to use cbor4ii for I'm stuck with RFC-7049 Canonical CBOR key ordering. This means that keys are sorted by their length first. I wonder if that could perhaps be added behind a feature flag. Here is an implementation the seems to work. I didn't create a PR as this clearly needs more discussion first.
I'd also like to note that I need even more changes for my use case (it's a subset of CBOR), for which I will need to fork this library. Nonetheless I think it would be a useful addition and I'd also prefer if the fork would be as minimal as possible. I thought I bring it up, to make clear that it won't be a showstopper if this change wouldn't be accepted.