tafia / quick-xml

Rust high performance xml reader and writer
MIT License
1.21k stars 235 forks source link

The `se::to_writer` function now requires `std::fmt::Write` instead of `std::io::Write` #499

Open calops opened 2 years ago

calops commented 2 years ago

Is there a reason for this change? It makes it impossible to use this function with a Vec<u8> (which does implement std::io::Write), while it is still possible in the current released version.

Mingun commented 2 years ago

Yes, the reason is because XML is a text-based format which could use many encodings. You should manually encode the string result to bytes for now. If you want an UTF-8 encoded data in Vec<u8>, it is easily and zero-cost to get it:

let ser = Serializer::new(String::new());
let string: String = data.serialize(ser).unwrap();
let bytes: Vec<u8> = string.into_bytes();
calops commented 2 years ago

That's a fair point. I have a question though:

I create a Writer, which requires a Write argument that implements std::io::Write. But if I want to pass that writer around and append to what it's doing, I can only call to_writer(writer.inner()), which won't work, because to_writer requires std::fmt::Write. Right now I'm having to use an intermediate type that implements thefmtversion and forwards it toio` (or vice versa).

I feel like the actual solution to my use-case would be the ability to call the Writer directly so that it can serialize a random argument I throw at it, without having to go through create_element, which requires defining a root tag (which I don't want in this context).

Mingun commented 2 years ago

You want to mix serde serializer with event-based Writer API? Could you describe your case better? Also, I suppose you use master, right?

calops commented 2 years ago

Basically, I write a bunch of nodes myself through the event-based API. But then the inner part is serialized through serde, so I end up calling to_writer(ToFmtWrite(writer.inner())) in the middle of my events, ToFmtWrite being a dumb newtype with the correct fmt::Write implementation for anything that implements io::Write. I'm using a Vec<u8> as the buffer for the writer (but I'd use a String just as well if it meant I don't have to do this Write trickery).

Being able to replace something like that

writer.write_event(Event::Start(BytesStart::new("foo")))?;
quick_xml::se::to_writer(ToWriteFmt(writer.inner()), &Bar { baz: 42 })?;
writer.write_event(Event::End(BytesEnd::new("foo")))?;

with something like this

writer
    .create_element("foo")
    .write_serialized_content(Bar { baz: 42 })?;

would be the ideal implementation for my use-case.

And yes, I am using master.

Mingun commented 2 years ago

with something like this

writer
    .create_element("foo")
    .write_serialized_content(Bar { baz: 42 })?;

would be the ideal implementation for my use-case.

Yes, I agree that we should have it, but it just not yet implemented. Did you mind to make a PR?

Ideally, we should have a reader, that would be able to deserialize objects using serde and a writer, that should be able to serialize them, but implementing a deserializer part a much more hard task then the serializer one.

calops commented 2 years ago

Yeah I can work on a PR for this, I'll get back to you once it's ready.

zeenix commented 1 year ago

I just tried to upgrade to 0.27 in zbus and was surprised by this change breaking my code. I know it's a different semver release so breaking changes are to be expected but it would have been nice to find any mention of this change and the rationale in either the release notes, or the relevant commit log (4f376b1e).

In any case, isn't it change making to_writer inconsistent with both other similar serde APIs and also with the to_reader, which expects a io::Read?

zeenix commented 1 year ago

In any case, isn't it change making to_writer inconsistent with both other similar serde APIs and also with the to_reader, which expects a io::Read?

Oops, give the incorrect link to the JSON lib. I meant serde_json of course: https://docs.rs/serde_json/latest/serde_json/fn.to_writer.html

zeenix commented 1 year ago

Yes, the reason is because XML is a text-based format

I would think that's true for JSON as well.

which could use many encodings.

Does fmt::Write help there? AFAIK it's UTF-8 specific. In fact the docs say:

This trait only accepts UTF-8–encoded data ... If you only want to accept Unicode and you don’t need flushing, you should implement this trait; otherwise you should implement std::io::Write.

So that seems like a counter-argument to using fmt::Write unless I'm missing something? :thinking:

Mingun commented 1 year ago

I would think that's true for JSON as well.

Yes, and I think this is oversight in the serde_json. It writes UTF-8 only, which is a task for fmt::Write.

Our serde writer doesn't write the XML declaration (<?xml ... ?>) for composability (actually, it writes XML fragments instead of XML documents) and produces only UTF-8 encoded output. This is by intention: if you need to get the result in another encoding, you should encode the result yourself and append a correct XML declaration. Both tasks can be added as helper functions to quick-xml.

Adapter from fmt::Write to io::Write performs infallible conversion, but the opposite is not true. So we decided to use more specific trait to write data.

zeenix commented 1 year ago

Thanks for explaining.

Yes, and I think this is oversight in the serde_json

Are you sure? Is there an issue on serde_json about this? I'm sorry for being skeptical here but this is @dtolnay (the main person behind a lot of Rust's fundamental API, including serde itself) we're talking about.

Our serde writer doesn't write the XML declaration (<?xml ... ?>) for composability (actually, it writes XML fragments instead of XML documents) and produces only UTF-8 encoded output. This is by intention: if you need to get the result in another encoding, you should encode the result yourself and append a correct XML declaration. Both tasks can be added as helper functions to quick-xml.

Yeah but AFAIK Write is an abstraction for producing the output and should be independent of the format. E.g if I want to write to a file or socket, I'd use the io::Write implementing types for that (just like I'd use io::Read for reading from them). If only UTF-8 is supported, then I don't see any problem. If other encodings are planned or expected in the future, then I'd suggest making it configurable via some parameters to to_writer and from_reader. I actually implemented such an API (see the ctxt param).

Also what about the internal inconsistency with from_reader?

Mingun commented 1 year ago

Is there an issue on serde_json about this?

I don't known. While @dtolnay can be great in other projects, serde project is not in him focus nowadays and it is effectively frozen as you can see from numerous problems in serde bugtracker those does not get any attention from maintainers.

Yeah but AFAIK Write is an abstraction for producing the output and should be independent of the format.

Yes, it is, but in our case, it is more convenient for us to work internally with a type that protects us from the accidental use of incorrect encoding when writing XML parts (or worse, the use of different encodings when writing different parts). That type is everything that impl fmt::Write. Since there is no reason to hide this implementation from an external user, it is part of the public API. The presence of an additional API with io::Write is not prohibited and I will be happy to accept PR, which will add it. I am unlikely to work in this direction, since I do not need to write XML.

Also what about the internal inconsistency with from_reader?

Difference from reader side in two points:

dtolnay commented 1 year ago

I think io::Write is the better API for to_writer. Pretty near 100% of users of a to_writer API are going to be using it with types that implement io::Write and do not implement fmt::Write. Things like File, BufWriter, BytesMut, Stdout, etc. Forcing near 100% of callers to use a fmt/io adapter would be a worse API regardless of the adapter being zero cost.

zeenix commented 1 year ago

Is there an issue on serde_json about this?

I don't known. While @dtolnay can be great in other projects, serde project is not in him focus nowadays and it is effectively frozen as you can see from numerous problems in serde bugtracker those does not get any attention from maintainers.

Yeah but both these traits existed when serde was developed so I'm not sure the current state of serde projects is very relevant here.

The presence of an additional API with io::Write is not prohibited and I will be happy to accept PR, which will add it.

If this API could be zero-cost, I wouldn't mind at all and can look into contributing it. However, looking at the implementations in std, I don't see how that would work w/o some external crate or new fmt::Write impls. :thinking:

I am unlikely to work in this direction, since I do not need to write XML.

That's fair and hope you can agree that it'd be nice not to ditch functionality that could be essential for users.

XML encoding can be detected during reading

Not sure if this difference is relevant? In case of writing, there is no need for detection.

Mingun commented 1 year ago

Forcing near 100% of callers to use a fmt/io adapter would be a worse API regardless of the adapter being zero cost.

That adapter could represent essential entity of XML world -- an XML document, which is a container for XML declaration, DTD, used namespaces. Serde part writes only XML fragments which is not a documents, so writing they to file / network directly could be error-prone for reader (if reader expects standalone XML). This is the case where the explicit is better, than implicit.

Yeah but both these traits existed when serde was developed so I'm not sure the current state of serde projects is very relevant here.

It was the answer to why serde_json shouldn't be seen as the truth of last resort, and why the missing issue in their bug tracker means nothing.

I don't see how that would work w/o some external crate or new fmt::Write impls.

The simplest adapter is only 10 lines: https://github.com/tafia/quick-xml/pull/508/files#diff-7d5c45da93a71cc9d454396255360338e2476d783dcd17e9b57a09c4b36aa182R368-R379

struct ToFmtWrite<T>(pub T);

impl<T> std::fmt::Write for ToFmtWrite<T>
where
    T: std::io::Write,
{
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        self.0.write_all(s.as_bytes()).map_err(|_| std::fmt::Error)
    }
}

Not sure if this difference is relevant? In case of writing, there is no need for detection.

When you write, you himself command which encoding to use (= which bytes to write when you write text string). When you read, XML could contain used encoding inside XML itself (so you need to read XML in some encoding to determine which encoding is used...)

dralley commented 1 year ago

It was the answer to why serde_json shouldn't be seen as the truth of last resort, and why the missing issue in their bug tracker means nothing.

Nonetheless it would be a good idea to file an issue with serde

zeenix commented 1 year ago

Serde part writes only XML fragments which is not a documents, so writing they to file / network directly could be error-prone for reader (is reader expects standalone XML). This is the case where the explicit is better, than implicit.

Right, that's a good point.

The simplest adapter is only 10 lines:

Ah right, thanks. I was thinking the other way around.

zeenix commented 1 year ago

Serde part writes only XML fragments which is not a documents

BTW, is this a new behavior? I'm asking cause my testcase now fails on reading a sample xml when i upgrade with

Error: QuickXml(Custom("missing field `type`"))
Mingun commented 1 year ago

This is because you need to name attributes in your structs starting with a @ character. The code under provided link is probably already was changed, because I do not see the field type in your Node struct, but I see that field name is not renamed to @name.

iwinux commented 2 months ago

After reading all the comments above, I still haven't figure how to write XML to std::fs::File with to_writer (without holding the complete XML output in memory first) :(

Mingun commented 2 months ago

For UTF-8 case you can use:

use quick_xml::se::to_writer;
use serde::Serialize;

struct ToFmtWrite<T>(pub T);

impl<T> std::fmt::Write for ToFmtWrite<T>
where
    T: std::io::Write,
{
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        self.0.write_all(s.as_bytes()).map_err(|_| std::fmt::Error)
    }
}
#[derive(Serialize)]
struct Xml {
    #[serde(rename = "@attribute")]
    attribute: &'static str,
    element: &'static str,
}
let data = Xml {
    attribute: "attribute",
    element: "element",
};
let mut io = ToFmtWrite(std::io::stdout());
to_writer(&mut io, &data).unwrap();