simd-lite / simd-json

Rust port of simdjson
Apache License 2.0
1.14k stars 86 forks source link

Improve testing #57

Closed Licenser closed 1 year ago

Licenser commented 5 years ago

We still have a rather low test coverage on parts of the code (my fault -.-) now that we can measure it we should aim to improve it. 90%+ sounds like a good target

sunnygleason commented 5 years ago

Yay! You found my not-so-subtle reason for adding coverage 😂

Licenser commented 5 years ago

To be fair the coverage of the critical path is quite good, >90% even with the buggy coverage. The tests missing are more for 'boring/unimporant' functions and will just exist to push the number and make it look nicer.

sunnygleason commented 5 years ago

👍 , my 100% coverage club membership expired a couple years ago... 😂

Licenser commented 5 years ago

I have lots of opinions on the topic 😂

Licenser commented 5 years ago

Mostly the serde bits are left

sunnygleason commented 4 years ago

getting there! the big things that jump out at me here are:

Wondering if there's a way to get the macros out of the function definition in stage2.rs ...

Licenser commented 4 years ago

Ja serde is the big thing left.

dvermd commented 2 years ago

I looked into some coverage and made some small steps as I'm real new to serde. Now I found what may be a bug with this main which panics on the last unwrap

use serde::{Deserialize, Serialize};
use serde_json::Result;
use std::collections::HashMap;

#[derive(Serialize, Deserialize, PartialEq, Eq, Hash, Debug)]
struct Bar1(u8);

fn main() -> Result<()> {
    let b = Bar1(3);
    println!("serde: Bar1:{}", serde_json::to_string(&b)?);
    println!("simd : Bar1:{}", simd_json::to_string(&b).unwrap());

    let mut h = HashMap::new();
    h.insert(Bar1(3), 3i8);
    let h_serde_str = serde_json::to_string(&h)?;
    println!("serde HashMap:{}", &h_serde_str);
    let mut h_simd_str = simd_json::to_string(&h).unwrap();
    println!("simd  HashMap:{}", &h_simd_str);

    let h_serde_de = serde_json::from_str::<HashMap<Bar1, i8>>(&h_serde_str)?;
    println!("h_de{:?}", h_serde_de);

    let h_simd_de = simd_json::from_str::<HashMap<Bar1, i8>>(&mut h_simd_str).unwrap();
    println!("h_de{:?}", h_simd_de);

    Ok(())
}

The output is

serde: Bar1:3
simd : Bar1:3
serde HashMap:{"3":3}
simd  HashMap:{"3":3}
h_de{Bar1(3): 3}
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { index: 0, character: '?', error: ExpectedUnsigned }', src/main.rs:23:79
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[Finished running. Exit status: 101]
dvermd commented 2 years ago

I think the problem is when parsing the key of an JSON object: for example {"3":3} to a HashMap<i8, i8>.

In serde-json, a MapKey struct is used to skip the '"' char if needed base on the destination type to be able to parse the key with its right type and also skip the other '"' when done.

simd-json just calls deserialize on the seed (PhantomData<i8>) which is implemented here by parsing the str to a i8 and it fails with an ExpectedSigned Error because the str still contains the '"'.

I can confirm that because if I add

            self.len -= 1;
            self.de.skip();

before the seed.deserialize(&mut *self.de).map(Some), this test cases passes, though this is not the right fix and break many other tests.

Licenser commented 2 years ago

Sorry for the late reply, this got buried in a bunch of other notification :)

yes that's quite possible, the big difference between serde and simd-json in this regard is that simd-json uses a intermediate format to parse out datatypes already so by the time it hits the serde interop layer simd-json already decided (and validated) that the key of an object is a string.

so the skipping part won't work, we would have to do the string -> i8 parsing after the fact. On the other hand it'll be more correct (i.e. would even allow escaped numbers) but I'm also fine with saying "Due to differences in approach this kind of mapping isn't possible automatically, please implement a custom deserializer for Bar1"

dvermd commented 2 years ago

You're totally right, skipping won't work and string -> i8 should be done after the fact.

I think this would be the right thing to do because when a user adds #[derive(Deserialize)] to its struct or use a struct (like HashMap) that is known to be implementing the Deserialize trait, it expects the struct to be Deserializable.

Implement a custom serializer for a user struct means digging in the internals of simd-json and it then making the string -> i8 conversion anyway.

This kind of changes will make JSON -> Tape -> user type

     #[cfg_attr(not(feature = "no-inline"), inline(always))]
     #[allow(clippy::cast_sign_loss)]
     fn parse_i8(&mut self) -> Result<i8> {
         match unsafe { self.next_() } {
             Node::Static(s) => s
                 .as_i8()
                 .ok_or_else(|| Self::error(ErrorType::ExpectedSigned)),
+            Node::String(s) => s
+                .parse::<i8>()
+                .map_err(|_| Self::error(ErrorType::ExpectedSigned)),
             _ => Err(Self::error(ErrorType::ExpectedSigned)),
         }
     }

Now, there's still the owned/borrowed Value -> user type to address

dvermd commented 2 years ago
     #[cfg_attr(not(feature = "no-inline"), inline(always))]
     #[allow(clippy::cast_sign_loss)]
     fn parse_i8(&mut self) -> Result<i8> {
         match unsafe { self.next_() } {
             Node::Static(s) => s
                 .as_i8()
                 .ok_or_else(|| Self::error(ErrorType::ExpectedSigned)),
+            Node::String(s) => s
+                .parse::<i8>()
+                .map_err(|_| Self::error(ErrorType::ExpectedSigned)),
             _ => Err(Self::error(ErrorType::ExpectedSigned)),
         }
     }

That won't work either because it'll allow to parse {"3": 3} as HashMap<i8, i8> but will also parse {"3": "3"} to this same type which is wrong.

One way to do it might be to create a MapKey type somewhat equivalent to the serde one to allow for string -> i8 parsing only for keys.

Licenser commented 2 years ago

Ja making parse_i8 generically parsing strings comes with some problems, one alternative would be looking at the Map deserializer, perhaps it's possible to hook into that bit?

https://github.com/simd-lite/simd-json/blob/main/src/serde/de.rs#L295

A cutsoim map visitor could work?

dvermd commented 2 years ago

Would you consider adding --ignore-tests argument to the tarpaulin CI ?

Licenser commented 2 years ago

yes that makes perfect sense :+1: I didn't knew that existed :) good call!

dvermd commented 2 years ago

I think there's some dead code in the serde part of this crate. Can you help identify if this is correct ?

  1. In serde/value/owned/se.rs, I didn't manage to trigger any function of MapKeySerializer other than serialize_str. I think this is because, internally, the Object type of a Value is a HashMap<String, Value> and thus, only String keys will be serialized
  2. In serde/value/owned/de.rs, the ValueVisitor::visit_i/u 8/16/32 should be called from the parse_integer from serde_json but it only ever returns a i64, a u64 or a f64 as of https://github.com/serde-rs/json/blob/master/src/de.rs#L380.
  3. Same for the borrowed values
Licenser commented 1 year ago

We have a lot of tests now