ropensci / git2rdata

An R package for storing and retrieving data.frames in git repositories.
https://ropensci.github.io/git2rdata/
GNU General Public License v3.0

[Probably a flawed observation:] upgrade_data() in version 0.2.1 writes version 0.2.0 + states 'already up to date' #59

florisvdh closed this issue 4 years ago

florisvdh commented 4 years ago

Having an older file saved as version 0.1 and having git2rdata 0.2.1 installed, I wanted to upgrade the (yml) file to version 0.2.1 with upgrade_data().

However:

@@ -1,12 +1,12 @@
 ..generic:
-  git2rdata: '0.1'
+  git2rdata: 0.2.0
   optimize: yes
   NA string: NA
   sorting:
   - Druk
   - Vegcode
   hash: 6c5c8ee2dd45485c07c02b33dc139909a6c7d7d6
-  data_hash: d57db8b4c721a5ea777e02342059400afd22716e
+  data_hash: ffedb846392fa85124cac3adc518dd5ef8510d1c

Actively reading and rewriting (read_vc() followed by write_vc()) does upgrade the metadata to version 0.2.1.

Session Info

```r
─ Session info ─────────────────────────────────────────────────────────────────
 setting  value
 version  R version 3.6.3 (2020-02-29)
 os       Linux Mint 18.1
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  nl_BE.UTF-8
 ctype    nl_BE.UTF-8
 tz       Europe/Brussels
 date     2020-03-09

─ Packages ─────────────────────────────────────────────────────────────────────
 package      * version    date       lib source
 assertthat     0.2.1      2019-03-21 [1] CRAN (R 3.6.0)
 cellranger     1.1.0      2016-07-27 [1] CRAN (R 3.6.0)
 cli            2.0.2      2020-02-28 [1] CRAN (R 3.6.3)
 crayon         1.3.4      2017-09-16 [1] CRAN (R 3.6.0)
 dplyr        * 0.8.4      2020-01-31 [1] CRAN (R 3.6.2)
 drat           0.1.5      2019-03-28 [1] CRAN (R 3.6.0)
 fansi          0.4.1      2020-01-08 [1] CRAN (R 3.6.2)
 foreign        0.8-76     2020-03-03 [4] CRAN (R 3.6.3)
 git2r          0.26.1     2019-06-29 [1] CRAN (R 3.6.0)
 git2rdata    * 0.2.1      2020-03-02 [1] CRAN (R 3.6.3)
 glue           1.3.1      2019-03-12 [1] CRAN (R 3.6.0)
 googlesheets * 0.3.0.9000 2019-05-08 [1] Github (jennybc/googlesheets@12abb02)
 magrittr       1.5        2014-11-22 [1] CRAN (R 3.6.0)
 packrat        0.5.0      2018-11-14 [1] CRAN (R 3.6.0)
 pillar         1.4.3      2019-12-20 [1] CRAN (R 3.6.2)
 pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 3.6.1)
 purrr          0.3.3      2019-10-18 [1] CRAN (R 3.6.1)
 R6             2.4.1      2019-11-12 [1] CRAN (R 3.6.2)
 Rcpp           1.0.3      2019-11-08 [1] CRAN (R 3.6.2)
 rlang          0.4.5      2020-03-01 [1] CRAN (R 3.6.3)
 rsconnect      0.8.16     2019-12-13 [1] CRAN (R 3.6.2)
 rstudioapi     0.11       2020-02-07 [1] CRAN (R 3.6.3)
 sessioninfo    1.1.1      2018-11-05 [1] CRAN (R 3.6.0)
 tibble         2.1.3      2019-06-06 [1] CRAN (R 3.6.0)
 tidyselect     1.0.0      2020-01-27 [1] CRAN (R 3.6.2)
 vctrs          0.2.3      2020-02-20 [1] CRAN (R 3.6.3)
 withr          2.1.2      2018-03-15 [1] CRAN (R 3.6.0)
 yaml           2.2.1      2020-02-01 [1] CRAN (R 3.6.2)

[1] /home/floris/lib/R/library
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
```

ThierryO commented 4 years ago

Can you send me the old file (tsv + yml) in private? The version number isn't hard-coded so this shouldn't happen.

florisvdh commented 4 years ago

Something odd seems to have happened, as I can no longer reproduce the problem: upgrade_data() now works correctly. The RStudio project I'm in may have influenced this, since it happened only the first time (yet I'm sure git2rdata was at 0.2.1). That first time I ran git2rdata::upgrade_data(path = ".") and got a list reporting all files as 'already up to date', while the version in all yml files had in fact been rewritten to 0.2.0.

In an attempt to reproduce the problem, I checked out the git commit with the version 0.1 yml files again (in the same RStudio project). This time git2rdata::upgrade_data(path = ".") did report updating the yml files, and it upgraded them to 0.2.1. I don't understand how this is possible, but it probably has nothing to do with git2rdata, then...

ThierryO commented 4 years ago

Maybe you didn't restart the session after updating to git2rdata 0.2.0? I'll close this for now.

florisvdh commented 4 years ago

And now I see it again, in a fresh R session in another RStudio project, this time with existing 0.2.0 files. The files are the ones in this folder, in the n2khab package.

> git2rdata::upgrade_data(path = "misc/generate_textdata/rawraw_data")
misc/generate_textdata/rawraw_data/10_compmeetnet_types_milieudrukken already up to date
misc/generate_textdata/rawraw_data/vegwhh already up to date
misc/generate_textdata/rawraw_data/10_compmeetnet_types_milieudrukken.yml 
  "misc/generate_textdata/rawraw_data/10_compmeetnet_types_milieudrukken" 
                            misc/generate_textdata/rawraw_data/vegwhh.yml 
                              "misc/generate_textdata/rawraw_data/vegwhh" 

The versions in the files were indeed kept at 0.2.0 (nothing changed), but:

> packageVersion("git2rdata")
[1] ‘0.2.1’

All that I can think of is that git2rdata was still at 0.2.0 the last time I used this RStudio project and this is somehow 'remembered'.

For this case I have not yet managed to get it into a state where it works. I tried:

Restarting R session...

> git2rdata::upgrade_data(path = "misc/generate_textdata/rawraw_data")
misc/generate_textdata/rawraw_data/10_compmeetnet_types_milieudrukken already up to date
misc/generate_textdata/rawraw_data/vegwhh already up to date
misc/generate_textdata/rawraw_data/10_compmeetnet_types_milieudrukken.yml 
  "misc/generate_textdata/rawraw_data/10_compmeetnet_types_milieudrukken" 
                            misc/generate_textdata/rawraw_data/vegwhh.yml 
                              "misc/generate_textdata/rawraw_data/vegwhh" 
> library(dplyr)
Attaching package: ‘dplyr’
The following object is masked from ‘package:git2rdata’:
    pull
The following objects are masked from ‘package:stats’:
    filter, lag
The following objects are masked from ‘package:base’:
    intersect, setdiff, setequal, union

# The following does work (gives version 0.2.1 in yml):

> read_vc("misc/generate_textdata/rawraw_data/vegwhh") %>% 
    write_vc("misc/generate_textdata/rawraw_data/vegwhh")
       5de758b2a46c15bc9b4ae2cd9e35305a48928c4d        2b6452f3ac66d8525c48f7e0bfafb683d4c07150 
"misc/generate_textdata/rawraw_data/vegwhh.tsv" "misc/generate_textdata/rawraw_data/vegwhh.yml" 

# Then, after having done a hard-reset in git, still nothing changed with:
> upgrade_data(path = "misc/generate_textdata/rawraw_data")
misc/generate_textdata/rawraw_data/10_compmeetnet_types_milieudrukken already up to date
misc/generate_textdata/rawraw_data/vegwhh already up to date
misc/generate_textdata/rawraw_data/10_compmeetnet_types_milieudrukken.yml 
  "misc/generate_textdata/rawraw_data/10_compmeetnet_types_milieudrukken" 
                            misc/generate_textdata/rawraw_data/vegwhh.yml 
                              "misc/generate_textdata/rawraw_data/vegwhh"

ThierryO commented 4 years ago

What you see is by design. To minimise the diffs, we only change the files if the structure of the metadata changed between the old and the current version (breaking changes). There were no such breaking changes since version 0.1.0.9001. So there is no need to upgrade files written with version 0.2.0.

If you insist on updating the version number, you'll need to rewrite the data using write_vc().
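The rule described above can be sketched in base R. This is only an illustration under stated assumptions, not git2rdata's actual implementation: `needs_upgrade()` is a hypothetical helper, and the `last_breaking` default is simply the version quoted in this comment as the last one that changed the metadata structure.

```r
# Hypothetical helper, NOT part of git2rdata: decides whether a metadata file
# written by `file_version` predates the last breaking change to the metadata
# structure. The default for `last_breaking` is taken from this thread.
needs_upgrade <- function(file_version, last_breaking = "0.1.0.9001") {
  package_version(file_version) < package_version(last_breaking)
}

needs_upgrade("0.1")    # TRUE: older than the last breaking change
needs_upgrade("0.2.0")  # FALSE: same structure as 0.2.1, so left untouched
```

Under this sketch, a file written with 0.2.0 is reported as already up to date even when 0.2.1 is installed, which matches the output shown earlier in the thread.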

florisvdh commented 4 years ago

I think this may solve the riddles in my first two posts: the git diff (yml: from 0.1 to 0.2.0) may already have existed, left over from an upgrade done long ago with git2rdata 0.2.0. In other words, I did not start from a clean git working directory when running the upgrade with git2rdata 0.2.1. This, combined with the fact that version 0.2.1 does not upgrade 0.2.0 files, would explain all observations. :roll_eyes:

Sorry for bothering you, and thanks for the clarifications.