ropensci / visdat

Preliminary Exploratory Visualisation of Data
https://docs.ropensci.org/visdat/

Enabling visdat to ignore minor differences #96

Open BobMuenchen opened 6 years ago

BobMuenchen commented 6 years ago

It would be helpful if vis_compare had an argument telling it to ignore minor differences such as a shifted column position, a different storage type, or a different sort order. The functions dplyr::setequal and compare::compare can detect those changes and tell you that the data frames are otherwise the same:

Ways to compare data frames

names(mtcars)
mtcars2 <- mtcars

Change cyl to character vector

mtcars2$cyl <- as.character(mtcars2$cyl)

Visualize the column differences (This will show any differences)

library("visdat")
vis_compare(mtcars, mtcars2)

Change variable order

mtcars2 <- dplyr::select(mtcars2, wt:carb, mpg:drat)

vis_compare doesn't know how to ignore this type of difference

vis_compare(mtcars, mtcars2)

Change row order by sorting

library("tidyverse")
mtcars2 <- arrange(mtcars2, mpg)

Three ways to compare

identical(mtcars, mtcars2)

all.equal(mtcars, mtcars2)
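
As a small base R illustration of how these two checks differ in what they report, the sketch below recreates only the storage-type change from this thread. The variable names `same`, `report`, and `same_within_tolerance` are introduced here for illustration and are not part of the original script.

```r
# Recreate the storage-type change from this thread using base R only.
mtcars2 <- mtcars
mtcars2$cyl <- as.character(mtcars2$cyl)

# identical() answers with a single TRUE/FALSE and gives no explanation.
same <- identical(mtcars, mtcars2)

# all.equal() returns TRUE, or a character vector describing each mismatch,
# so wrap it in isTRUE() before using it in a condition.
report <- all.equal(mtcars, mtcars2)
same_within_tolerance <- isTRUE(report)

print(same)
print(report)
```

Neither function has an option to treat a changed column type, column order, or row order as acceptable, which is why the script goes on to setequal and compare.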

setequal figures out what happened, but doesn't report different sort order.

library("tidyverse")
setequal(mtcars, mtcars2)

The compare package reports even the sort difference

install.packages("compare")
library("compare")

This tests one column at a time

compare(mtcars, mtcars2)

This figures out all changes:

compare(mtcars, mtcars2, allowAll = TRUE)

visdat can't ignore those relatively minor changes

library("visdat")
vis_compare(mtcars, mtcars2)
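
Until vis_compare has such an argument, one possible workaround is to normalise the second data frame before plotting. The sketch below is only an illustration in base R: `align_to_reference` is a hypothetical helper, not a visdat or compare function, and it assumes both data frames have the same column names, the same number of rows, and only plain atomic columns (no factors or dates).

```r
# Hypothetical helper (not part of visdat): make `other` look like `ref`
# when the only differences are column order, storage type, or row order.
align_to_reference <- function(ref, other) {
  stopifnot(setequal(names(ref), names(other)), nrow(ref) == nrow(other))

  # 1. Put the columns back into the reference order.
  other <- other[names(ref)]

  # 2. Coerce each column to the storage mode of the reference column.
  #    Assumption: plain atomic columns only; as.vector() drops attributes.
  for (nm in names(ref)) {
    other[[nm]] <- as.vector(other[[nm]], mode = storage.mode(ref[[nm]]))
  }

  # 3. Sort both data frames by all columns so row order no longer matters.
  sort_rows <- function(df) {
    df <- df[do.call(order, unname(as.list(df))), , drop = FALSE]
    rownames(df) <- NULL
    df
  }
  list(ref = sort_rows(ref), other = sort_rows(other))
}

# Rebuild the three "minor" changes from this thread with base R.
mtcars2 <- mtcars
mtcars2$cyl <- as.character(mtcars2$cyl)   # storage type
mtcars2 <- mtcars2[c(6:11, 1:5)]           # column order (wt:carb, mpg:drat)
mtcars2 <- mtcars2[order(mtcars2$mpg), ]   # row order

aligned <- align_to_reference(mtcars, mtcars2)
aligned_equal <- isTRUE(all.equal(aligned$ref, aligned$other))

# Plot only if visdat is installed; any remaining differences are real ones.
if (requireNamespace("visdat", quietly = TRUE)) {
  print(visdat::vis_compare(aligned$ref, aligned$other))
}
```

Sorting both inputs discards the original row names and row order, so this suits a check for whether the contents match more than a cell-by-cell audit of the original layout.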

njtierney commented 5 years ago

Thank you for taking the time to write this!

The compare package looks like a great approach to this. Hopefully this can make it into the next release of visdat.