Closed Licenser closed 1 year ago
Yay! You found my not-so-subtle reason for adding coverage 😂
To be fair the coverage of the critical path is quite good, >90% even with the buggy coverage. The tests missing are more for 'boring/unimporant' functions and will just exist to push the number and make it look nicer.
👍 , my 100% coverage club membership expired a couple years ago... 😂
I have lots of opinions on the topic 😂
Mostly the serde bits are left
getting there! the big things that jump out at me here are:
Wondering if there's a way to get the macros out of the function definition in stage2.rs
...
Ja serde is the big thing left.
I looked into some coverage and made some small steps as I'm real new to serde. Now I found what may be a bug with this main which panics on the last unwrap
use serde::{Deserialize, Serialize};
use serde_json::Result;
use std::collections::HashMap;
#[derive(Serialize, Deserialize, PartialEq, Eq, Hash, Debug)]
struct Bar1(u8);
fn main() -> Result<()> {
let b = Bar1(3);
println!("serde: Bar1:{}", serde_json::to_string(&b)?);
println!("simd : Bar1:{}", simd_json::to_string(&b).unwrap());
let mut h = HashMap::new();
h.insert(Bar1(3), 3i8);
let h_serde_str = serde_json::to_string(&h)?;
println!("serde HashMap:{}", &h_serde_str);
let mut h_simd_str = simd_json::to_string(&h).unwrap();
println!("simd HashMap:{}", &h_simd_str);
let h_serde_de = serde_json::from_str::<HashMap<Bar1, i8>>(&h_serde_str)?;
println!("h_de{:?}", h_serde_de);
let h_simd_de = simd_json::from_str::<HashMap<Bar1, i8>>(&mut h_simd_str).unwrap();
println!("h_de{:?}", h_simd_de);
Ok(())
}
The output is
serde: Bar1:3
simd : Bar1:3
serde HashMap:{"3":3}
simd HashMap:{"3":3}
h_de{Bar1(3): 3}
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { index: 0, character: '?', error: ExpectedUnsigned }', src/main.rs:23:79
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[Finished running. Exit status: 101]
I think the problem is when parsing the key of an JSON object: for example {"3":3}
to a HashMap<i8, i8>
.
In serde-json, a MapKey struct is used to skip the '"'
char if needed base on the destination type to be able to parse the key with its right type and also skip the other '"'
when done.
simd-json just calls deserialize on the seed (PhantomData<i8>
) which is implemented here by parsing the str
to a i8
and it fails with an ExpectedSigned
Error because the str
still contains the '"'
.
I can confirm that because if I add
self.len -= 1;
self.de.skip();
before the seed.deserialize(&mut *self.de).map(Some)
, this test cases passes, though this is not the right fix and break many other tests.
Sorry for the late reply, this got buried in a bunch of other notification :)
yes that's quite possible, the big difference between serde
and simd-json
in this regard is that simd-json
uses a intermediate format to parse out datatypes already so by the time it hits the serde
interop layer simd-json
already decided (and validated) that the key of an object is a string.
so the skipping part won't work, we would have to do the string -> i8 parsing after the fact. On the other hand it'll be more correct (i.e. would even allow escaped numbers) but I'm also fine with saying "Due to differences in approach this kind of mapping isn't possible automatically, please implement a custom deserializer for Bar1
"
You're totally right, skipping won't work and string -> i8 should be done after the fact.
I think this would be the right thing to do because when a user adds #[derive(Deserialize)]
to its struct or use a struct (like HashMap) that is known to be implementing the Deserialize
trait, it expects the struct to be Deserializable.
Implement a custom serializer for a user struct means digging in the internals of simd-json and it then making the string -> i8 conversion anyway.
This kind of changes will make JSON -> Tape -> user type
#[cfg_attr(not(feature = "no-inline"), inline(always))]
#[allow(clippy::cast_sign_loss)]
fn parse_i8(&mut self) -> Result<i8> {
match unsafe { self.next_() } {
Node::Static(s) => s
.as_i8()
.ok_or_else(|| Self::error(ErrorType::ExpectedSigned)),
+ Node::String(s) => s
+ .parse::<i8>()
+ .map_err(|_| Self::error(ErrorType::ExpectedSigned)),
_ => Err(Self::error(ErrorType::ExpectedSigned)),
}
}
Now, there's still the owned/borrowed Value
-> user type to address
#[cfg_attr(not(feature = "no-inline"), inline(always))] #[allow(clippy::cast_sign_loss)] fn parse_i8(&mut self) -> Result<i8> { match unsafe { self.next_() } { Node::Static(s) => s .as_i8() .ok_or_else(|| Self::error(ErrorType::ExpectedSigned)), + Node::String(s) => s + .parse::<i8>() + .map_err(|_| Self::error(ErrorType::ExpectedSigned)), _ => Err(Self::error(ErrorType::ExpectedSigned)), } }
That won't work either because it'll allow to parse {"3": 3}
as HashMap<i8, i8>
but will also parse {"3": "3"}
to this same type which is wrong.
One way to do it might be to create a MapKey type somewhat equivalent to the serde one to allow for string -> i8 parsing only for keys.
Ja making parse_i8 generically parsing strings comes with some problems, one alternative would be looking at the Map deserializer, perhaps it's possible to hook into that bit?
https://github.com/simd-lite/simd-json/blob/main/src/serde/de.rs#L295
A cutsoim map visitor could work?
Would you consider adding --ignore-tests
argument to the tarpaulin CI ?
yes that makes perfect sense :+1: I didn't knew that existed :) good call!
I think there's some dead code in the serde part of this crate. Can you help identify if this is correct ?
serde/value/owned/se.rs
, I didn't manage to trigger any function of MapKeySerializer
other than serialize_str
. I think this is because, internally, the Object type of a Value is a HashMap<String, Value>
and thus, only String keys will be serializedserde/value/owned/de.rs
, the ValueVisitor::visit_i/u 8/16/32
should be called from the parse_integer
from serde_json but it only ever returns a i64, a u64 or a f64 as of https://github.com/serde-rs/json/blob/master/src/de.rs#L380.We have a lot of tests now
We still have a rather low test coverage on parts of the code (my fault -.-) now that we can measure it we should aim to improve it. 90%+ sounds like a good target