Open Rudo2204 opened 3 years ago
Whoever picks this up, consider starting from https://github.com/tafia/quick-xml/pull/511
Has anybody found a workaround for this? I am having the same issue.
You can close this. Don't know when it was fixed but the original example works now with minor edits:
#[derive(Debug, Deserialize, PartialEq)]
struct DivDefinition {
    #[serde(rename = "@style")]
    style: String,
    #[serde(rename = "$value")]
    definition: Vec<MyEnum>,
}

#[derive(Debug, Deserialize, PartialEq)]
enum MyEnum {
    b(String),
    #[serde(rename = "$text")]
    String,
    i(String),
}
Thoughts on this idea? https://github.com/enricozb/quick-xml/commit/7b4b3f851a50ae9dbb45d54edfdc7c2374ec59d0
Specifically, I'm adding a new special field name $raw that can only deserialize into a String, and just writes all events, until the expected end event, into a string.
It lets you do stuff like this:
const xml: &str = r#"
    <who-cares>
        <foo property="value">
            test
            <bar><bii/><int>1</int></bar>
            test
            <baz/>
        </foo>
    </who-cares>
"#;

#[derive(Deserialize, Debug)]
struct Root {
    #[serde(rename = "$raw")]
    value: String,
}

let root = quick_xml::de::from_str::<Root>(&xml).unwrap();
println!("parsed: {root:?}");
This prints
parsed: Root { value: "<foo property=\"value\">test<bar><bii></bii><int>1</int></bar>test<baz></baz></foo>" }
One of the problems with this approach is that it doesn't save exactly what was in the XML file. Saving the input verbatim would be ideal, because we could likely avoid any allocations (like serde_json::value::RawValue does), preserve formatting, and not trim spaces.
Another issue is that empty tags such as <bii/> get converted to <bii></bii>, as that is how the events come in.
It's possible my initial idea could be fixed up to temporarily disable trimming in the reader during raw_string use.
Deserialization of RawValue in serde_json is implemented as deserialization of a newtype with a special name:
https://github.com/serde-rs/json/blob/0131ac68212e8094bd14ee618587d731b4f9a68b/src/de.rs#L1711-L1724
The deserializer then returns data either from its own buffer or directly from the input string, depending on which type is being deserialized (Box<RawValue> or &RawValue). We can do the same because we have read_text, but right now only for the borrowing reader. We need to implement #483 in order to implement read_text_into, which is needed for the owned reader.
Got it. I saw that private newtype name, but wasn't sure why it mattered. I see now that the json deserializer looks for this tag. I'll take a stab at this.
Additionally, I'm not sure if we should capture the surrounding tags or not. What should this print:
struct AnyName {
    root: RawValue,
}

const xml: &str = "
<root>
<some/><inner/><tags/>
</root>
";

let x: AnyName = from_str(xml)?;
println!("{}", x.root);
Should this print
<root>
<some/><inner/><tags/>
</root>
or
<some/><inner/><tags/>
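To make the two options concrete, here is a minimal std-only sketch that does not use quick-xml at all. `raw_spans` is a hypothetical helper, not an existing API. It assumes well-formed input with no comments or CDATA, no `>` inside attribute values, and no nested element with the same name as the one being captured. It returns both candidates as slices borrowed from the input, so formatting and self-closing tags survive exactly as written:

```rust
/// Hypothetical helper (not part of quick-xml): finds the first `<name ...>`
/// element and returns `(outer, inner)`, both borrowed from `xml`.
/// `outer` includes the surrounding tags, `inner` is only the content.
fn raw_spans<'a>(xml: &'a str, name: &str) -> Option<(&'a str, &'a str)> {
    let open = format!("<{name}");
    let close = format!("</{name}>");

    // Locate the start tag, skipping longer names that merely share the
    // prefix (for example `<rootless>` when looking for `<root>`).
    let mut search = 0;
    let start = loop {
        let pos = search + xml[search..].find(&open)?;
        let next = xml[pos + open.len()..].chars().next()?;
        if next == '>' || next == '/' || next.is_whitespace() {
            break pos;
        }
        search = pos + open.len();
    };

    // End of the start tag. Assumption: no `>` inside attribute values.
    let tag_end = start + xml[start..].find('>')?;
    if xml[..tag_end].ends_with('/') {
        // Self-closing element: there is no inner content.
        return Some((&xml[start..=tag_end], ""));
    }

    // Assumption: no nested element with the same name, so the first
    // matching end tag closes this element.
    let inner_start = tag_end + 1;
    let inner_end = inner_start + xml[inner_start..].find(&close)?;
    Some((
        &xml[start..inner_end + close.len()],
        &xml[inner_start..inner_end],
    ))
}
```

Called as `raw_spans(xml, "root")` on the example above, the first slice is the whole element including the `<root>` tags, and the second is only the line `<some/><inner/><tags/>` together with the newlines around it. Nothing is trimmed or re-serialized in either case.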
Hi, I'm trying to track down a way to de-serialize unknown/arbitrary data under a specific tag and found my way here. Is this currently possible in any form?
I have something like this:
<root>
    <someTag> <!-- I am only aware of this tag -->
        <arbitraryTag1>
            <arbitraryTag2>...stuff...</arbitraryTag2>
            <anotherArbitraryTag>foo</anotherArbitraryTag>
        </arbitraryTag1>
    </someTag>
</root>
I simply need everything under someTag as a HashMap<String,String>, ideally.
If ...stuff... and foo contain only textual data, CDATA sections, comments (which would be skipped) and processing instructions (also skipped), then I think it should be possible today. If they can contain markup (i.e. nested tags), then you cannot read them into a String.
@Mingun Thanks for the quick reply! I updated my example, it was missing some data.
Basically, under someTag, there is a nested structure starting with arbitraryTag1, but always key-value tags from there. I'd like to capture the name of arbitraryTag1 in some way, and a HashMap<String, String> for the key-values.
So in your example you expect a HashMap with the key arbitraryTag1 and the value:
<arbitraryTag2>...stuff...</arbitraryTag2>
<anotherArbitraryTag>foo</anotherArbitraryTag>
? Or you need something like
// type of `someTag` field
struct SomeTagType {
    // filled with "arbitraryTag1"
    name: String,
    // filled with
    // - ("arbitraryTag2", "...stuff...")
    // - ("anotherArbitraryTag", "foo")
    // - ...
    fields: HashMap<String, String>,
}
?
Both are impossible right now. The first because we cannot capture markup into a String, the second because we (probably) cannot capture a tag name as a value (there is a separate issue for that: #778).
@Mingun thanks, the 2nd example is what I'm after.
Can you think of any workarounds?
@Mingun Apologies for the "bump", I'm trying to determine where this stands exactly. #778 mentions something works, but I can't find it.
Ideally, I'm after the ability to capture arbitrary nested XML, similar to what a HashMap<String, serde_json::Value> can achieve with JSON (in fact, I need to turn them into JSON afterwards).
I'm not 100% clear if this is the correct ticket, #778, or something else.
Thanks again!
In #383, @alex-semov gave some code in the initial post that looks like what you need. Try experimenting with it. If you don't have to extract the attributes from <arbitraryTag1>, then it looks like it works.
Unfortunately we need to extract/convert arbitrary XML into a JSON representation in our case. Something like:
<xml>
    <foo><bar>123</bar></foo>
    <foobar someattr="thing"/>
    <bazfoo anotherattr="stuff">bazzle</bazfoo>
</xml>
to
{
    "foo": {
        "bar": 123
    },
    "foobar": {
        "@someattr": "thing"
    },
    "bazfoo": {
        "@anotherattr": "stuff",
        "@value": "bazzle"
    }
}
The JSON structure is just an example; we just need some way to do the conversion.
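As a rough illustration of that kind of conversion, here is a std-only sketch that sidesteps serde and quick-xml entirely. Everything in it (`Node`, `parse_element`, `xml_to_json`, and the other helpers) is hypothetical and hand-rolled, not an existing API. It assumes a single well-formed root element with no XML prolog, comments, CDATA, or entities, double-quoted attributes, and no `>` inside attribute values. It follows the `@attr` / `@value` convention from the example, but emits all text as JSON strings (so `"bar": "123"`, not `123`), and repeated child names would produce duplicate keys:

```rust
/// Simplified element tree. All names in this sketch are hypothetical.
#[derive(Debug)]
struct Node {
    name: String,
    attrs: Vec<(String, String)>,
    text: String,
    children: Vec<Node>,
}

/// Parses `name="value"` pairs (double quotes only).
fn parse_attrs(mut src: &str) -> Vec<(String, String)> {
    let mut attrs = Vec::new();
    while let Some((key, rest)) = src.split_once("=\"") {
        let Some((value, tail)) = rest.split_once('"') else { break };
        attrs.push((key.trim().to_string(), value.to_string()));
        src = tail;
    }
    attrs
}

/// Parses the element starting at `s[*pos]` and advances `pos` past it.
fn parse_element(s: &str, pos: &mut usize) -> Option<Node> {
    let rest = &s[*pos..];
    if !rest.starts_with('<') {
        return None;
    }
    let tag_end = rest.find('>')?;
    let self_closing = rest[..tag_end].ends_with('/');
    let header = rest[1..tag_end].trim_end_matches('/').trim();
    let (name, attr_src) = header
        .split_once(char::is_whitespace)
        .unwrap_or((header, ""));
    let mut node = Node {
        name: name.to_string(),
        attrs: parse_attrs(attr_src),
        text: String::new(),
        children: Vec::new(),
    };
    *pos += tag_end + 1;
    if self_closing {
        return Some(node);
    }
    loop {
        // Collect (trimmed) text up to the next tag.
        let rest = &s[*pos..];
        let lt = rest.find('<')?;
        node.text.push_str(rest[..lt].trim());
        *pos += lt;
        if s[*pos..].starts_with("</") {
            // End tag: skip it without checking that the name matches.
            *pos += s[*pos..].find('>')? + 1;
            return Some(node);
        }
        node.children.push(parse_element(s, pos)?);
    }
}

/// Minimal JSON string escaping (quotes, backslashes, newlines only).
fn quote(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Text-only elements become strings; everything else becomes an object.
fn to_json(node: &Node) -> String {
    if node.attrs.is_empty() && node.children.is_empty() {
        return quote(&node.text);
    }
    let mut fields = Vec::new();
    for (k, v) in &node.attrs {
        fields.push(format!("{}:{}", quote(&format!("@{k}")), quote(v)));
    }
    if !node.text.is_empty() {
        fields.push(format!("\"@value\":{}", quote(&node.text)));
    }
    for child in &node.children {
        fields.push(format!("{}:{}", quote(&child.name), to_json(child)));
    }
    format!("{{{}}}", fields.join(","))
}

/// Converts the content of the root element into a compact JSON string.
fn xml_to_json(xml: &str) -> Option<String> {
    let mut pos = xml.find('<')?;
    let root = parse_element(xml, &mut pos)?;
    Some(to_json(&root))
}
```

For real documents, the same tree-building loop would more sensibly be driven by a proper XML event reader rather than string searching; the sketch only shows the shape of the mapping.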
I'm trying to deserialize some dictionary definitions and came across this one, which contains multiple tags mixed with plain strings (HTML text formatting).
I looked around in serde-xml-rs tests and tried this solution which seems to be close but it doesn't quite work
The error I'm getting is:
I can make it work for now by not using MyEnum and just using definition: Vec<String>, but then I wouldn't know which text is bold and which is italic. How can I properly deserialize this?