voltrondata-labs / arrowbench

R package for benchmarking
Other
13 stars 9 forks source link

[ENG-3630] Use `waldo::compare()` for validating TPC-H queries #82

Closed alistaire47 closed 2 years ago

alistaire47 commented 2 years ago

Changes from all.equal() to waldo::compare() in TPC-H benchmark for nicer diffs and adjusts printing and control flow (compare() returns an empty vector instead of TRUE when there are no differences) accordingly. Leaves programmatic usages of all.equal() (those not printed) alone.

Minor changes:

jonkeane commented 2 years ago

This is really great! Did you have a chance to try also swapping out https://github.com/ursacomputing/arrowbench/blob/7882f37c39bf7a9918f537257fd0db41ba121c88/R/bm-tpc-h.R#L129 as well? it looks like it worked for line 106 up above.

alistaire47 commented 2 years ago

This is really great! Did you have a chance to try also swapping out

https://github.com/ursacomputing/arrowbench/blob/7882f37c39bf7a9918f537257fd0db41ba121c88/R/bm-tpc-h.R#L129 as well? it looks like it worked for line 106 up above.

I did, but it gets grepped below

https://github.com/ursacomputing/arrowbench/blob/7882f37c39bf7a9918f537257fd0db41ba121c88/R/bm-tpc-h.R#L133-L139

and waldo changes those messages, specifically to

> dput(waldo::compare(Sys.Date(), as.character(Sys.Date()), ignore_attr = FALSE, tolerance = 0.01))
structure("`old` is \033[32man S3 object of class <Date>, a double vector\033[39m\n`new` is \033[32ma character vector\033[39m ('2022-04-18')", class = "waldo_compare", max_diffs = 10)

Since that one doesn't get printed (the variable gets overwritten on 139), I figured there was less benefit, and a little risk that a more complicated regex might catch something inadvertently. If you're ok with a more complicated regex we can swap it, though.

jonkeane commented 2 years ago

Ah, yup yup you're right on that! Thanks

jonkeane commented 2 years ago

/document