Closed rctatman closed 6 years ago
I've run into a strange bug where, when specifying tolerance for dedup(), the number of rows returned is greater than the number of rows in the original dataset:
```r
dim(iris)                              # 150 rows
dim(iris %>% dedup())                  # 149 rows
dim(iris %>% dedup(tolerance = 0))    # 11067 rows
dim(iris %>% dedup(tolerance = 0.2))  # 9156 rows
dim(iris %>% dedup(tolerance = 0.4))  # 4627 rows
dim(iris %>% dedup(tolerance = 0.6))  # 2640 rows
dim(iris %>% dedup(tolerance = 0.8))  # 431 rows
dim(iris %>% dedup(tolerance = 1))    # 150 rows
```
These additional rows are exact duplicates and can be removed with distinct(), but this seems to be unintended behavior.
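As an interim workaround, the spurious exact duplicates can be stripped with dplyr's distinct(). A minimal sketch (using only iris, since iris itself happens to contain one exact duplicate row):

```r
library(dplyr)

# iris contains one pair of identical rows, so distinct() drops one:
nrow(iris)            # 150
nrow(distinct(iris))  # 149

# The same distinct() call, piped after dedup(tolerance = ...),
# would remove the extra exact-duplicate rows reported above.
```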
👋 @rctatman - sorry about the delay; I was on vacation, and then the email notification got buried.
Can you reinstall and try again?
Looks like it's fixed in version 0.1.3.9321! :+1: