Open gwarnes-mdsol opened 7 years ago
Hi @gwarnes-mdsol, could you do me a favor and attach the .csv files, or forward them by email? (my email address is attached to my github profile). Thanks!
Sorry about that. BTW, github doesn't like the extension .csv so I added .txt to make it happy.
Thanks for the files. From the command line, with daff iris.csv.txt iris2.csv.txt, I'm not seeing the same diff unfortunately; it gives -> updates everywhere. There was an extra column that looked like a row number, but removing it also wasn't sufficient to replicate. How hard would it be to talk me through how to replicate using R?
Hi Paul, it is pretty simple to replicate in R. I'll try to take some time tomorrow to write brief instructions. In the mean time, installing R would be the first step, :-) http://r-project.org
Here's the R code to replicate:
install.packages("devtools")
devtools::install_github("edwindj/daff")
library(daff)
iris2 <- iris
levels(iris2$Species)[3] <- "XXX"
df <- diff_data(iris, iris2)
df
render_diff(df)
(Note that the last command, render_diff(df), generates and displays an HTML page with additional features that might be worth moving into your codebase.)
And the output on my system:
gwarnes@F5KSH06HF9VN:/tmp$ R
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> devtools::install_github("edwindj/daff")
Downloading GitHub repo edwindj/daff@master
from URL https://api.github.com/repos/edwindj/daff/zipball/master
Installing daff
trying URL 'https://cran.rstudio.com/bin/macosx/mavericks/contrib/3.3/jsonlite_1.4.tgz'
Content type 'application/x-gzip' length 1077372 bytes (1.0 MB)
==================================================
downloaded 1.0 MB
Installing jsonlite
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ \
--no-save --no-restore --quiet CMD INSTALL \
'/private/var/folders/gc/c3c2p5_d4td159rblkqbp4s1xjhdh_/T/RtmpPi3Rqh/devtoolsc4de4a4865b/jsonlite' \
--library='/Users/gwarnes/Library/R/3.3/library' --install-tests
* installing *binary* package ‘jsonlite’ ...
* DONE (jsonlite)
trying URL 'https://cran.rstudio.com/bin/macosx/mavericks/contrib/3.3/V8_1.4.tgz'
Content type 'application/x-gzip' length 2304654 bytes (2.2 MB)
==================================================
downloaded 2.2 MB
Installing V8
trying URL 'https://cran.rstudio.com/bin/macosx/mavericks/contrib/3.3/Rcpp_0.12.10.tgz'
Content type 'application/x-gzip' length 3020988 bytes (2.9 MB)
==================================================
downloaded 2.9 MB
Installing Rcpp
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ \
--no-save --no-restore --quiet CMD INSTALL \
'/private/var/folders/gc/c3c2p5_d4td159rblkqbp4s1xjhdh_/T/RtmpPi3Rqh/devtoolsc4de5618b221/Rcpp' \
--library='/Users/gwarnes/Library/R/3.3/library' --install-tests
* installing *binary* package ‘Rcpp’ ...
* DONE (Rcpp)
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ \
--no-save --no-restore --quiet CMD INSTALL \
'/private/var/folders/gc/c3c2p5_d4td159rblkqbp4s1xjhdh_/T/RtmpPi3Rqh/devtoolsc4de284a1564/V8' \
--library='/Users/gwarnes/Library/R/3.3/library' --install-tests
* installing *binary* package ‘V8’ ...
* DONE (V8)
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ \
--no-save --no-restore --quiet CMD INSTALL \
'/private/var/folders/gc/c3c2p5_d4td159rblkqbp4s1xjhdh_/T/RtmpPi3Rqh/devtoolsc4de22b005a4/edwindj-daff-a5a97e1' \
--library='/Users/gwarnes/Library/R/3.3/library' --install-tests
* installing *source* package ‘daff’ ...
** R
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (daff)
> library(daff)
> iris2 <- iris
> levels(iris2$Species)
[1] "setosa" "versicolor" "virginica"
> levels(iris2$Species)[3] <- "XXX"
> df <- diff_data(iris, iris2)
> df
Daff Comparison: ‘iris’ vs. ‘iris2’
First 6 and last 6 patch lines:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
... ... ... ... ... ...
5.7 2.8 4.1 1.3 versicolor
-> 6.3 3.3 6 2.5 virginica->XXX
+++ 5.8 2.7 5.1 1.9 XXX
--- 5.8 2.7 5.1 1.9 virginica
-> 7.1 3 5.9 2.1 virginica->XXX
... ... ... ... ... ...
-> 6.7 3.3 5.7 2.5 virginica->XXX
-> 6.7 3 5.2 2.3 virginica->XXX
-> 6.3 2.5 5 1.9 virginica->XXX
-> 6.5 3 5.2 2 virginica->XXX
-> 6.2 3.4 5.4 2.3 virginica->XXX
-> 5.9 3 5.1 1.8 virginica->XXX
> render_diff(df)
>
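As a sanity check on what the diff should contain, here is a minimal base-R sketch (no daff required; `changed`, `numeric_cols` and `dup_rows` are illustrative names, not from the thread). It counts the rows touched by the relabelling and lists any rows of iris that are exact duplicates of another row. If the row shown as +++/--- above is among the duplicates, that would be consistent with the two-row example later in this thread, but it is only a hypothesis about the cause, not a diagnosis.

```r
# Base R only; `iris` ships with R's datasets package.
iris2 <- iris
levels(iris2$Species)[3] <- "XXX"

# Rows whose Species value differs between the two tables.
changed <- which(as.character(iris$Species) != as.character(iris2$Species))
length(changed)

# The four numeric columns are untouched by the relabelling.
numeric_cols <- setdiff(names(iris), "Species")
identical(iris[numeric_cols], iris2[numeric_cols])

# Rows of iris that are exact duplicates of another row (both copies shown).
dup_rows <- iris[duplicated(iris) | duplicated(iris, fromLast = TRUE), ]
dup_rows
```

Since only the Species column changes, a row-matched diff would be expected to show modifications only, with no added or removed rows.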
Hi @paulfitz,
I think I'm facing the same issue here. The update detection does not seem to work for the same use case.
Example:
I've tried playing with the --id flag, but didn't manage to find a way to make it work every time.
Any ideas? Thanks
FYI, I'm using daff cli 1.3.25 (JS)
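For anyone who wants to experiment with the --id flag on the iris data, one possible setup is sketched below. It assumes an explicit key column is acceptable; the column name `row_id` and the file names are hypothetical, and nothing in this thread establishes that keying on such a column avoids the problem.

```r
# Add an explicit, unique key column so every row can be identified
# even when all of its data values match another row.
iris_a <- cbind(row_id = seq_len(nrow(iris)), iris)
iris_b <- iris_a
levels(iris_b$Species)[3] <- "XXX"

# Write both tables without R's row names, which would otherwise
# appear as an extra unnamed first column in the CSV.
out_dir <- tempdir()
path_a <- file.path(out_dir, "iris_with_id.csv")
path_b <- file.path(out_dir, "iris2_with_id.csv")
write.csv(iris_a, path_a, row.names = FALSE)
write.csv(iris_b, path_b, row.names = FALSE)

# The files can then be compared from a shell, for example:
#   daff --id row_id iris_with_id.csv iris2_with_id.csv
```

Writing with row.names = FALSE also avoids the extra row-number column mentioned earlier in the thread.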
I dropped a line in the R code above. I've fixed it above, but I'm also posting it here for clarity:
install.packages("devtools")
devtools::install_github("edwindj/daff")
library(daff)
iris2 <- iris
levels(iris2$Species)[3] <- "XXX"
df <- diff_data(iris, iris2)
df
render_diff(df)
Hi @paulfitz, do you think you'll have time to look at this issue? Thanks
Simple Example:
ir table:
"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
5.8,2.7,5.1,1.9,"virginica"
5.8,2.7,5.1,1.9,"virginica"
ir2 table:
"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
5.8,2.7,5.1,1.9,"XXX"
5.8,2.7,5.1,1.9,"XXX"
Comparison:
> diff_data(ir, ir2)
Daff Comparison: 'ir' vs. 'ir2'
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
+++ 5.8 2.7 5.1 1.9 XXX
+++ 5.8 2.7 5.1 1.9 XXX
--- 5.8 2.7 5.1 1.9 virginica
--- 5.8 2.7 5.1 1.9 virginica
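The post does not show how ir and ir2 were created. One way to build them, as a sketch, is to read the CSV text above directly. The diff_data() call is guarded so the rest of the snippet runs even if the daff package is not installed.

```r
# Rebuild the two tables from the CSV text shown above.
csv_ir <- paste(
  '"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"',
  '5.8,2.7,5.1,1.9,"virginica"',
  '5.8,2.7,5.1,1.9,"virginica"',
  sep = "\n"
)
ir <- read.csv(text = csv_ir, stringsAsFactors = FALSE)

# ir2 is the same table with every Species value replaced.
ir2 <- ir
ir2$Species <- "XXX"

# Both rows of ir are identical, so row matching is ambiguous.
anyDuplicated(ir) > 0

# Only run the comparison when daff is available.
if (requireNamespace("daff", quietly = TRUE)) {
  print(daff::diff_data(ir, ir2))
}
```

Note that the two rows are identical to each other in each table, which may be what makes the row matching ambiguous here.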
I'm getting a similar problem with columns. In a table where some columns duplicate the data of other columns, changing a column header shows up as an added and a deleted column, even when the renamed column itself has no duplicated data. To reproduce with the bridge example on the demo page: change the Designer column so that it's identical to the Bridge column in both the original and the modified version. Then, in the modified version, rename Length to something like Span. The Length/Span column appears as added/removed.
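The column case can be sketched in R with a small stand-in table. The values below are hypothetical placeholders, not the data from the demo page; only the shape matters (Designer duplicates Bridge, and Length is renamed to Span).

```r
# Hypothetical stand-in for the demo table: Designer duplicates Bridge.
original <- data.frame(
  Bridge   = c("A", "B", "C"),
  Designer = c("A", "B", "C"),
  Length   = c(100, 200, 300),
  stringsAsFactors = FALSE
)

# The modified version only renames one header; no cell values change.
modified <- original
names(modified)[names(modified) == "Length"] <- "Span"

# Column contents are identical position by position.
all(mapply(identical, original, modified))

# The only difference is the header name.
setdiff(names(original), names(modified))
setdiff(names(modified), names(original))

# Only run the comparison when daff is available.
if (requireNamespace("daff", quietly = TRUE)) {
  print(daff::diff_data(original, modified))
}
```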
The daff comparison algorithm improperly marks a row with changed data as an added/removed pair.
For instance, comparing the CSV files 'iris.csv' and 'iris2.csv' (via the edwindj/daff R wrapper), I get the following diff:
As you can see, the pair of lines
are shown as an addition + deletion, when they are actually a change in a single column.
For some large files (though not this one), I see trios or more complex patterns of added/deleted/modified lines, where changes to the values in two or more rows are displayed as a mix of modifications to unmatched rows combined with additions + deletions. Something like: