Open zhizhongpu opened 5 months ago
@zhizhongpu, thank you for reporting this! I have confirmed it below. Would you like to propose a solution in a pull request?
library(tidytab)
library(tidyverse)
enframe(1:1010) %>% tab(value)
#> value │ Freq. Percent Cum.
#> ────────────┼───────────────────────────────────
#> 1 │ 1 0.1 0.1
#> 2 │ 1 0.1 0.2
#> 3 │ 1 0.1 0.3
#> (redacted)
#> 997 │ 1 0.1 99.7
#> 998 │ 1 0.1 99.8
#> 999 │ 1 0.1 99.9
#> 1000 │ 1 0.1 100.0
#> 1001 │ 1 0.1 100.1
#> 1002 │ 1 0.1 100.2
#> 1003 │ 1 0.1 100.3
#> 1004 │ 1 0.1 100.4
#> 1005 │ 1 0.1 100.5
#> 1006 │ 1 0.1 100.6
#> 1007 │ 1 0.1 100.7
#> 1008 │ 1 0.1 100.8
#> 1009 │ 1 0.1 100.9
#> 1010 │ 1 0.1 100.0
#> ────────────┼───────────────────────────────────
#> Total │ 1010 100.0
Created on 2024-04-16 with reprex v2.1.0
For reference, this is how Stata outputs the same:
. tab value
value | Freq. Percent Cum.
------------+-----------------------------------
1 | 1 0.10 0.10
2 | 1 0.10 0.20
3 | 1 0.10 0.30
* (redacted)
997 | 1 0.10 98.71
998 | 1 0.10 98.81
999 | 1 0.10 98.91
1000 | 1 0.10 99.01
1001 | 1 0.10 99.11
1002 | 1 0.10 99.21
1003 | 1 0.10 99.31
1004 | 1 0.10 99.41
1005 | 1 0.10 99.50
1006 | 1 0.10 99.60
1007 | 1 0.10 99.70
1008 | 1 0.10 99.80
1009 | 1 0.10 99.90
1010 | 1 0.10 100.00
------------+-----------------------------------
Total | 1,010 100.00
My preference would be to keep single decimals for the Percent
and Cum.
columns.
I think the problem comes from the function ftab in tab.R
# tab.R line 139
x <- dplyr::mutate(x, Percent = formatC(.data[["Freq."]] / sum(.data[["Freq."]]) * 100, digits = 1L,, format = "f"), Cum. = formatC(cumsum(.data[["Percent"]]), digits = 1L, format = "f"))
Created on 2024-04-17 with reprex v2.1.0
The problem comes from the digits = 1L because it rounds the percentages computed. This becomes an issue when the table in input is large as the percentages computed become either imprecise or close to zero. I would suggest to remove all the digits = 1L at this line and to round the final numbers at the end of the computation.
Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on https://stackoverflow.com/ or https://community.rstudio.com/.
──────────────────┼─────────────────────────────────── ancud │ 3 0.2 0.2 antofagasta │ 32 1.8 2.0 arauco │ 10 0.6 2.6 arica │ 14 0.8 3.4 aysen │ 2 0.1 3.5 buin │ 1 0.1 3.6 calbuco │ 2 0.1 3.7 carahue │ 3 0.2 3.9 casablanca │ 1 0.1 4.0 castro │ 1 0.1 4.1 cerro navia │ 2 0.1 4.2 chimbarongo │ 3 0.2 4.4 cobquecura │ 1 0.1 4.5 colbun │ 1 0.1 4.6 colchane │ 1 0.1 4.7 collipulli │ 9 0.5 5.2 concon │ 1 0.1 5.3 coyhaique │ 9 0.5 5.8 cunco │ 5 0.3 6.1 curacautin │ 1 0.1 6.2 curanilahue │ 4 0.2 6.4 curarrehue │ 1 0.1 6.5 dalcahue │ 2 0.1 6.6 el bosque │ 9 0.5 7.1 el quisco │ 1 0.1 7.2 ercilla │ 4 0.2 7.4 florida │ 15 0.8 8.2 futrono │ 4 0.2 8.4 galvarino │ 9 0.5 8.9 higgins │ 23 1.3 10.2 huasco │ 1 0.1 10.3 huechuraba │ 4 0.2 10.5 la ligua │ 1 0.1 10.6 la union │ 2 0.1 10.7 las condes │ 2 0.1 10.8 lautaro │ 10 0.6 11.4 lebu │ 6 0.3 11.7 limache │ 3 0.2 11.9 llanquihue │ 1 0.1 12.0 lo espejo │ 1 0.1 12.1 loncoche │ 1 0.1 12.2 los andes │ 1 0.1 12.3 los lagos │ 5 0.3 12.6 los vilos │ 1 0.1 12.7 lota │ 2 0.1 12.8 macul │ 2 0.1 12.9 marchihue │ 1 0.1 13.0 maule │ 3 0.2 13.2 mejillones │ 1 0.1 13.3 melipilla │ 1 0.1 13.4 mostazal │ 1 0.1 13.5 nancagua │ 1 0.1 13.6 nueva imperial │ 1 0.1 13.7 nunoa │ 5 0.3 14.0 paillaco │ 1 0.1 14.1 panguipulli │ 1 0.1 14.2 penalolen │ 1 0.1 14.3 petorca │ 3 0.2 14.5 pichidegua │ 1 0.1 14.6 pichilemu │ 3 0.2 14.8 pozo almonte │ 2 0.1 14.9 providencia │ 7 0.4 15.3 pudahuel │ 5 0.3 15.6 puente alto │ 7 0.4 16.0 puerto octay │ 1 0.1 16.1 puerto varas │ 1 0.1 16.2 purranque │ 2 0.1 16.3 quellon │ 1 0.1 16.4 quilicura │ 5 0.3 16.7 renca │ 1 0.1 16.8 rio bueno │ 3 0.2 17.0 rio negro │ 6 0.3 17.3 san antonio │ 3 0.2 17.5 san carlos │ 2 0.1 17.6 san gregorio │ 1 0.1 17.7 san javier │ 1 0.1 17.8 san joaquin │ 2 0.1 17.9 santa cruz │ 1 0.1 18.0 santiago │ 1488 82.0 100.0 sierra gorda │ 1 0.1 100.1 taltal │ 1 0.1 100.2 teodoro schmidt │ 1 0.1 100.3 tiltil │ 2 0.1 100.4 tirua │ 1 0.1 100.5 tocopilla │ 2 0.1 100.6 tome │ 1 0.1 100.7 tucapel │ 3 0.2 100.9 valdivia │ 6 0.3 101.2 vallenar │ 1 0.1 101.3 victoria │ 2 0.1 101.4 vilcun │ 1 0.1 101.5 villarrica │ 3 0.2 101.7 vitacura │ 4 0.2 101.9 yumbel │ 1 0.1 102.0 yungay │ 4 0.2 100.0 ──────────────────┼─────────────────────────────────── Total │ 1814 100.0
Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex. i'm using a proprietary dataset, but i think the general problem holds whenever there're a large number of categories with relatively small percentage of the total For more advice on how to write a great issue, see https://code-review.tidyverse.org/issues/.
Brief description of the problem