Closed gregleleu closed 6 years ago
I will try to reproduce what you report with CRAN readxl then ggmosaic.
As for this:
Also, I've tried using the latest GitHub version of readxl (instead of the CRAN one), but that breaks my code, as duplicate names are no longer numbered colname_2, colname_3 etc. but by their position in the original file, e.g. colname_18, colname_34 etc. Is that intentional? Is there a way to choose between both methods?
It is very intentional. I explain somewhat in the blog post announcing v1.1.0 (look at the "Future outlook" section near the end) and also in the NEWS items that will be part of the next release: https://github.com/tidyverse/readxl/blame/master/NEWS.md#L3-L9. That contains a link to the issue where the tidyverse team discusses name repair.
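To make the difference between the two schemes concrete, here is a small base R sketch. Both helper functions are hypothetical and written only for illustration; neither is readxl's or tibble's actual name-repair code, and the `_` separator and the choice to suffix every member of a duplicated group in the position-based scheme are assumptions taken from the description above.

```r
# Hypothetical helpers that illustrate the two numbering schemes.
# They are NOT the implementation used by readxl or tibble.

# Scheme 1: suffix repeats by order of occurrence (colname, colname_2, colname_3).
suffix_by_occurrence <- function(x, sep = "_") {
  # ave() numbers each name within its group of identical names: 1, 2, 3, ...
  n <- ave(seq_along(x), x, FUN = seq_along)
  ifelse(n > 1, paste0(x, sep, n), x)
}

# Scheme 2: suffix every duplicated name with its column position in the sheet.
suffix_by_position <- function(x, sep = "_") {
  dup <- duplicated(x) | duplicated(x, fromLast = TRUE)
  ifelse(dup, paste0(x, sep, seq_along(x)), x)
}

nms <- c("id", "colname", "value", "colname", "colname")

suffix_by_occurrence(nms)
#> [1] "id"        "colname"   "value"     "colname_2" "colname_3"

suffix_by_position(nms)
#> [1] "id"        "colname_2" "value"     "colname_4" "colname_5"
```

With position-based suffixes, a repaired name points back to the column it came from in the spreadsheet, which is why the numbers can jump (for example to 18 or 34) in a wide sheet.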
I'm not able to reproduce the issue with the dev version of readxl and the CRAN version of ggmosaic.
head(readxl::read_excel("nba_teams.xlsx"))
#> # A tibble: 6 x 7
#> id location name nickname slug team_lab logo_lab
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 20901970-53a0-417c-b… Atlanta Atlanta Hawks nba-… atl ATL
#> 2 1c65bbb6-bd10-4ef6-8… Boston Boston Celtics nba-… bos BOS
#> 3 84eb19ca-1e66-416f-9… Brooklyn Brookl… Nets nba-… bk BKN
#> 4 68b04d26-12c3-4e06-8… Charlotte Charlo… Hornets nba-… cha CHA
#> 5 7e670063-ef8d-4356-9… Chicago Chicago Bulls nba-… chi CHI
#> 6 a9abb922-3a47-4d37-9… Cleveland Clevel… Cavalie… nba-… cle CLE
library(ggmosaic)
#> Loading required package: ggplot2
#> Loading required package: productplots
#>
#> Attaching package: 'ggmosaic'
#> The following objects are masked from 'package:productplots':
#>
#> ddecker, hspine, mosaic, prodcalc, spine, vspine
head(readxl::read_excel("nba_teams.xlsx"))
#> # A tibble: 6 x 7
#> id location name nickname slug team_lab logo_lab
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 20901970-53a0-417c-b… Atlanta Atlanta Hawks nba-… atl ATL
#> 2 1c65bbb6-bd10-4ef6-8… Boston Boston Celtics nba-… bos BOS
#> 3 84eb19ca-1e66-416f-9… Brooklyn Brookl… Nets nba-… bk BKN
#> 4 68b04d26-12c3-4e06-8… Charlotte Charlo… Hornets nba-… cha CHA
#> 5 7e670063-ef8d-4356-9… Chicago Chicago Bulls nba-… chi CHI
#> 6 a9abb922-3a47-4d37-9… Cleveland Clevel… Cavalie… nba-… cle CLE
Created on 2018-05-17 by the reprex package (v0.2.0).
Here's a link to the spreadsheet I used: https://www.dropbox.com/s/2h3o3xvq0rqzfnv/nba_teams.xlsx?dl=0
I believe the report is re: the CRAN version of readxl + ggmosaic and, presumably, a spreadsheet with duplicate names (though sheet not provided by OP).
I cannot reproduce the problem with v1.1.0 of readxl. I have a test sheet that requires lots of name repair, so I've used it as my subject.
packageVersion("readxl")
#> [1] '1.1.0'
readxl::read_excel("~/rrr/readxl/tests/testthat/sheets/unnamed-duplicated-columns.xlsx")
#> # A tibble: 2 x 4
#> X__1 var2 X__2 var2__1
#> <dbl> <chr> <dbl> <chr>
#> 1 1 a 1.1 aa
#> 2 2 b 2.1 bb
library(ggmosaic)
#> Loading required package: ggplot2
#> Loading required package: productplots
#>
#> Attaching package: 'ggmosaic'
#> The following objects are masked from 'package:productplots':
#>
#> ddecker, hspine, mosaic, prodcalc, spine, vspine
readxl::read_excel("~/rrr/readxl/tests/testthat/sheets/unnamed-duplicated-columns.xlsx")
#> # A tibble: 2 x 4
#> X__1 var2 X__2 var2__1
#> <dbl> <chr> <dbl> <chr>
#> 1 1 a 1.1 aa
#> 2 2 b 2.1 bb
Created on 2018-05-17 by the reprex package (v0.2.0).
@gregleleu I note that several of the packages potentially involved are at dev versions for you.
Specifically, ggmosaic and ggplot2 catch my eye. Can you reproduce the problem with CRAN versions of one or both of those?
I think it's very likely this is a problem in ggmosaic, which is exporting its own method for as_tibble.list().
cc @haleyjeppson
Issue filed in ggmosaic, which is where I believe this problem originates: https://github.com/haleyjeppson/ggmosaic/issues/21
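For readers unfamiliar with how loading one package can change the behaviour of another, here is a toy base R sketch of S3 method replacement. The generic `repair` and both "packages" are hypothetical, and the sketch only assumes that the mechanism is a second method for the same generic and class taking over dispatch; it does not reproduce what ggmosaic or tibble actually do.

```r
# Toy generic; every name here is hypothetical.
repair <- function(x, ...) UseMethod("repair")

# "Package A" registers its method for lists.
registerS3method("repair", "list",
                 function(x, ...) "method from package A",
                 envir = environment(repair))
before <- repair(list(1))

# "Package B" is loaded later and registers a method for the same
# generic and class, replacing A's entry in the S3 registry.
registerS3method("repair", "list",
                 function(x, ...) "method from package B",
                 envir = environment(repair))
after <- repair(list(1))

before  # "method from package A"
after   # "method from package B"
```

The call `repair(list(1))` is identical both times; only the set of registered methods changed in between, which mirrors how the same `read_excel()` call could behave differently before and after `library(ggmosaic)`.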
Hi,
I'm having recurrent issues reading an Excel file with duplicate column names after certain packages are loaded, e.g. ggmosaic (see below). ggmosaic is one package I have identified, but it also happens with others that I haven't been able to pinpoint.
Also, I've tried using the latest GitHub version of readxl (instead of the CRAN one), but that breaks my code, as duplicate names are no longer numbered colname_2, colname_3 etc. but by their position in the original file, e.g. colname_18, colname_34 etc. Is that intentional? Is there a way to choose between both methods?
Thanks
sessionInfo before loading ggmosaic (but after reading the excel file)
... and after