Open calops opened 2 years ago
Yes, the reason is because XML is a text-based format which could use many encodings. You should manually encode the string result to bytes for now. If you want an UTF-8 encoded data in Vec<u8>
, it is easily and zero-cost to get it:
let ser = Serializer::new(String::new());
let string: String = data.serialize(ser).unwrap();
let bytes: Vec<u8> = string.into_bytes();
That's a fair point. I have a question though:
I create a Writer
, which requires a Write
argument that implements std::io::Write
. But if I want to pass that writer around and append to what it's doing, I can only call to_writer(writer.inner())
, which won't work, because to_writer
requires std::fmt::Write. Right now I'm having to use an intermediate type that implements the
fmtversion and forwards it to
io` (or vice versa).
I feel like the actual solution to my use-case would be the ability to call the Writer
directly so that it can serialize a random argument I throw at it, without having to go through create_element
, which requires defining a root tag (which I don't want in this context).
You want to mix serde serializer with event-based Writer API? Could you describe your case better? Also, I suppose you use master, right?
Basically, I write a bunch of nodes myself through the event-based API. But then the inner part is serialized through serde
, so I end up calling to_writer(ToFmtWrite(writer.inner()))
in the middle of my events, ToFmtWrite
being a dumb newtype with the correct fmt::Write
implementation for anything that implements io::Write
. I'm using a Vec<u8>
as the buffer for the writer (but I'd use a String
just as well if it meant I don't have to do this Write
trickery).
Being able to replace something like that
writer.write_event(Event::Start(BytesStart::new("foo")))?;
quick_xml::se::to_writer(ToWriteFmt(writer.inner()), &Bar { baz: 42 })?;
writer.write_event(Event::End(BytesEnd::new("foo")))?;
with something like this
writer
.create_element("foo")
.write_serialized_content(Bar { baz: 42 })?;
would be the ideal implementation for my use-case.
And yes, I am using master
.
with something like this
writer .create_element("foo") .write_serialized_content(Bar { baz: 42 })?;
would be the ideal implementation for my use-case.
Yes, I agree that we should have it, but it just not yet implemented. Did you mind to make a PR?
Ideally, we should have a reader, that would be able to deserialize objects using serde
and a writer, that should be able to serialize them, but implementing a deserializer part a much more hard task then the serializer one.
Yeah I can work on a PR for this, I'll get back to you once it's ready.
I just tried to upgrade to 0.27 in zbus and was surprised by this change breaking my code. I know it's a different semver release so breaking changes are to be expected but it would have been nice to find any mention of this change and the rationale in either the release notes, or the relevant commit log (4f376b1e).
In any case, isn't it change making to_writer
inconsistent with both other similar serde APIs and also with the to_reader
, which expects a io::Read
?
In any case, isn't it change making
to_writer
inconsistent with both other similar serde APIs and also with theto_reader
, which expects aio::Read
?
Oops, give the incorrect link to the JSON lib. I meant serde_json of course: https://docs.rs/serde_json/latest/serde_json/fn.to_writer.html
Yes, the reason is because XML is a text-based format
I would think that's true for JSON as well.
which could use many encodings.
Does fmt::Write
help there? AFAIK it's UTF-8 specific. In fact the docs say:
This trait only accepts UTF-8–encoded data ... If you only want to accept Unicode and you don’t need flushing, you should implement this trait; otherwise you should implement std::io::Write.
So that seems like a counter-argument to using fmt::Write
unless I'm missing something? :thinking:
I would think that's true for JSON as well.
Yes, and I think this is oversight in the serde_json. It writes UTF-8 only, which is a task for fmt::Write
.
Our serde writer doesn't write the XML declaration (<?xml ... ?>
) for composability (actually, it writes XML fragments instead of XML documents) and produces only UTF-8 encoded output. This is by intention: if you need to get the result in another encoding, you should encode the result yourself and append a correct XML declaration. Both tasks can be added as helper functions to quick-xml.
Adapter from fmt::Write
to io::Write
performs infallible conversion, but the opposite is not true. So we decided to use more specific trait to write data.
Thanks for explaining.
Yes, and I think this is oversight in the serde_json
Are you sure? Is there an issue on serde_json about this? I'm sorry for being skeptical here but this is @dtolnay (the main person behind a lot of Rust's fundamental API, including serde itself) we're talking about.
Our serde writer doesn't write the XML declaration (
<?xml ... ?>
) for composability (actually, it writes XML fragments instead of XML documents) and produces only UTF-8 encoded output. This is by intention: if you need to get the result in another encoding, you should encode the result yourself and append a correct XML declaration. Both tasks can be added as helper functions to quick-xml.
Yeah but AFAIK Write
is an abstraction for producing the output and should be independent of the format. E.g if I want to write to a file or socket, I'd use the io::Write
implementing types for that (just like I'd use io::Read
for reading from them). If only UTF-8 is supported, then I don't see any problem. If other encodings are planned or expected in the future, then I'd suggest making it configurable via some parameters to to_writer
and from_reader
. I actually implemented such an API (see the ctxt
param).
Also what about the internal inconsistency with from_reader
?
Is there an issue on serde_json about this?
I don't known. While @dtolnay can be great in other projects, serde project is not in him focus nowadays and it is effectively frozen as you can see from numerous problems in serde bugtracker those does not get any attention from maintainers.
Yeah but AFAIK
Write
is an abstraction for producing the output and should be independent of the format.
Yes, it is, but in our case, it is more convenient for us to work internally with a type that protects us from the accidental use of incorrect encoding when writing XML parts (or worse, the use of different encodings when writing different parts). That type is everything that impl fmt::Write
. Since there is no reason to hide this implementation from an external user, it is part of the public API. The presence of an additional API with io::Write
is not prohibited and I will be happy to accept PR, which will add it. I am unlikely to work in this direction, since I do not need to write XML.
Also what about the internal inconsistency with
from_reader
?
Difference from reader side in two points:
fmt::Write
for reader part (but probably we'll switch to an internal reader that will follow that concept in #158)I think io::Write is the better API for to_writer. Pretty near 100% of users of a to_writer API are going to be using it with types that implement io::Write and do not implement fmt::Write. Things like File, BufWriter, BytesMut, Stdout, etc. Forcing near 100% of callers to use a fmt/io adapter would be a worse API regardless of the adapter being zero cost.
Is there an issue on serde_json about this?
I don't known. While @dtolnay can be great in other projects, serde project is not in him focus nowadays and it is effectively frozen as you can see from numerous problems in serde bugtracker those does not get any attention from maintainers.
Yeah but both these traits existed when serde was developed so I'm not sure the current state of serde projects is very relevant here.
The presence of an additional API with
io::Write
is not prohibited and I will be happy to accept PR, which will add it.
If this API could be zero-cost, I wouldn't mind at all and can look into contributing it. However, looking at the implementations in std, I don't see how that would work w/o some external crate or new fmt::Write
impls. :thinking:
I am unlikely to work in this direction, since I do not need to write XML.
That's fair and hope you can agree that it'd be nice not to ditch functionality that could be essential for users.
XML encoding can be detected during reading
Not sure if this difference is relevant? In case of writing, there is no need for detection.
Forcing near 100% of callers to use a fmt/io adapter would be a worse API regardless of the adapter being zero cost.
That adapter could represent essential entity of XML world -- an XML document, which is a container for XML declaration, DTD, used namespaces. Serde part writes only XML fragments which is not a documents, so writing they to file / network directly could be error-prone for reader (if reader expects standalone XML). This is the case where the explicit is better, than implicit.
Yeah but both these traits existed when serde was developed so I'm not sure the current state of serde projects is very relevant here.
It was the answer to why serde_json
shouldn't be seen as the truth of last resort, and why the missing issue in their bug tracker means nothing.
I don't see how that would work w/o some external crate or new fmt::Write impls.
The simplest adapter is only 10 lines: https://github.com/tafia/quick-xml/pull/508/files#diff-7d5c45da93a71cc9d454396255360338e2476d783dcd17e9b57a09c4b36aa182R368-R379
struct ToFmtWrite<T>(pub T);
impl<T> std::fmt::Write for ToFmtWrite<T>
where
T: std::io::Write,
{
fn write_str(&mut self, s: &str) -> std::fmt::Result {
self.0.write_all(s.as_bytes()).map_err(|_| std::fmt::Error)
}
}
Not sure if this difference is relevant? In case of writing, there is no need for detection.
When you write, you himself command which encoding to use (= which bytes to write when you write text string). When you read, XML could contain used encoding inside XML itself (so you need to read XML in some encoding to determine which encoding is used...)
It was the answer to why serde_json shouldn't be seen as the truth of last resort, and why the missing issue in their bug tracker means nothing.
Nonetheless it would be a good idea to file an issue with serde
Serde part writes only XML fragments which is not a documents, so writing they to file / network directly could be error-prone for reader (is reader expects standalone XML). This is the case where the explicit is better, than implicit.
Right, that's a good point.
The simplest adapter is only 10 lines:
Ah right, thanks. I was thinking the other way around.
Serde part writes only XML fragments which is not a documents
BTW, is this a new behavior? I'm asking cause my testcase now fails on reading a sample xml when i upgrade with
Error: QuickXml(Custom("missing field `type`"))
This is because you need to name attributes in your structs starting with a @
character. The code under provided link is probably already was changed, because I do not see the field type
in your Node
struct, but I see that field name
is not renamed to @name
.
After reading all the comments above, I still haven't figure how to write XML to std::fs::File
with to_writer
(without holding the complete XML output in memory first) :(
For UTF-8 case you can use:
use quick_xml::se::to_writer;
use serde::Serialize;
struct ToFmtWrite<T>(pub T);
impl<T> std::fmt::Write for ToFmtWrite<T>
where
T: std::io::Write,
{
fn write_str(&mut self, s: &str) -> std::fmt::Result {
self.0.write_all(s.as_bytes()).map_err(|_| std::fmt::Error)
}
}
#[derive(Serialize)]
struct Xml {
#[serde(rename = "@attribute")]
attribute: &'static str,
element: &'static str,
}
let data = Xml {
attribute: "attribute",
element: "element",
};
let mut io = ToFmtWrite(std::io::stdout());
to_writer(&mut io, &data).unwrap();
Is there a reason for this change? It makes it impossible to use this function with a
Vec<u8>
(which does implementstd::io::Write
), while it is still possible in the current released version.