Open dominicroye opened 5 years ago
This appears to be because the month names are regular expressions; so we also need to store canonical names for output.
This is surely a Windows-only issue. Probably a regex bug indeed.
@dominicroye could you please provide the output of the following (with local replaced by your French and Spanish locale names)?
Sys.setlocale("LC_TIME", "es_ES.utf8")
format <- "%a@%A@%b@%B@%p@"
enc2utf8(unique(format(lubridate:::.date_template, format = format)))
## [1] "jue@jueves@ene@enero@@" "lun@lunes@feb@febrero@@" "mar@martes@mar@marzo@@"
## [4] "dom@domingo@abr@abril@@" "vie@viernes@may@mayo@@" "mar@martes@jun@junio@@"
## [7] "vie@viernes@jul@julio@@" "mié@miércoles@ago@agosto@@" "mar@martes@sep@septiembre@@"
## [10] "vie@viernes@oct@octubre@@" "mar@martes@nov@noviembre@@" "sáb@sábado@dic@diciembre@@"
Also the value of
str(.get_locale_regs("...your_locales..."))
es_ES.utf8 doesn't exist in Windows.
Here is my output from your code:
SPANISH
> Sys.setlocale("LC_TIME", "Spanish_Spain.1252")
> format <- "%a@%A@%b@%B@%p@"
> enc2utf8(unique(format(lubridate:::.date_template, format = format)))
[1] "ju.@jueves@ene.@enero@@" "lu.@lunes@feb.@febrero@@" "ma.@martes@mar.@marzo@@"
[4] "do.@domingo@abr.@abril@@" "vi.@viernes@may.@mayo@@" "ma.@martes@jun.@junio@@"
[7] "vi.@viernes@jul.@julio@@" "mi.@miércoles@ago.@agosto@@" "ma.@martes@sep.@septiembre@@"
[10] "vi.@viernes@oct.@octubre@@" "ma.@martes@nov.@noviembre@@" "sá.@sábado@dic.@diciembre@@"
> str(lubridate:::.get_locale_regs("Spanish_Spain.1252"))
List of 6
$ alpha_flex : Named chr [1:6] "((?<b_b>ene\\.|feb\\.|mar\\.|abr\\.|may\\.|jun\\.|jul\\.|ago\\.|sep\\.|oct\\.|nov\\.|dic\\.)|(?<B_b>enero|febre"| __truncated__ "(?<B_B>enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(?![[:alpha:]])" "((?<a_a>ju\\.|lu\\.|ma\\.|do\\.|vi\\.|mi\\.|sá\\.)|(?<A_a>jueves|lunes|martes|domingo|viernes|miércoles|sábado)"| __truncated__ "(?<A_A>jueves|lunes|martes|domingo|viernes|miércoles|sábado)(?![[:alpha:]])" ...
..- attr(*, "names")= chr [1:6] "b" "B" "a" "A" ...
$ num_flex : Named chr [1:24] "(?<d>[012]?[1-9]|3[01]|[12]0)(?!\\d)" "(?<q>[0]?[1-4])(?!\\d)" "(?<H>2[0-4]|[01]?\\d)(?!\\d)" "(?<H>2[0-4]|[01]?\\d)(?!\\d)" ...
..- attr(*, "names")= chr [1:24] "d" "q" "H" "h" ...
$ alpha_exact: Named chr [1:6] "((?<b_b_e>ene\\.|feb\\.|mar\\.|abr\\.|may\\.|jun\\.|jul\\.|ago\\.|sep\\.|oct\\.|nov\\.|dic\\.)|(?<B_b_e>enero|f"| __truncated__ "(?<B_B_e>enero|febrero|marzo|abril|mayo|junio|julio|agosto|septiembre|octubre|noviembre|diciembre)(?![[:alpha:]])" "((?<a_a_e>ju\\.|lu\\.|ma\\.|do\\.|vi\\.|mi\\.|sá\\.)|(?<A_a_e>jueves|lunes|martes|domingo|viernes|miércoles|sáb"| __truncated__ "(?<A_A_e>jueves|lunes|martes|domingo|viernes|miércoles|sábado)(?![[:alpha:]])" ...
..- attr(*, "names")= chr [1:6] "b" "B" "a" "A" ...
$ num_exact : Named chr [1:24] "(?<d_e>[012][1-9]|3[01]|[12]0)" "(?<q_e>[0][1-4])" "(?<H_e>2[0-4]|[01]\\d)" "(?<H_e>2[0-4]|[01]\\d)" ...
..- attr(*, "names")= chr [1:24] "d" "q" "H" "h" ...
$ wday_names :List of 2
..$ abr : chr [1:7] "do\\." "lu\\." "ma\\." "mi\\." ...
..$ full: chr [1:7] "domingo" "lunes" "martes" "miércoles" ...
$ month_names:List of 2
..$ abr : chr [1:12] "ene\\." "feb\\." "mar\\." "abr\\." ...
..$ full: chr [1:12] "enero" "febrero" "marzo" "abril" ...
FRENCH
> Sys.setlocale("LC_TIME", "French_France.1252")
> format <- "%a@%A@%b@%B@%p@"
> enc2utf8(unique(format(lubridate:::.date_template, format = format)))
[1] "jeu.@jeudi@janv.@janvier@@" "lun.@lundi@févr.@février@@" "mar.@mardi@mars@mars@@"
[4] "dim.@dimanche@avr.@avril@@" "ven.@vendredi@mai@mai@@" "mar.@mardi@juin@juin@@"
[7] "ven.@vendredi@juil.@juillet@@" "mer.@mercredi@août@août@@" "mar.@mardi@sept.@septembre@@"
[10] "ven.@vendredi@oct.@octobre@@" "mar.@mardi@nov.@novembre@@" "sam.@samedi@déc.@décembre@@"
> str(lubridate:::.get_locale_regs("French_France.1252"))
List of 6
$ alpha_flex : Named chr [1:6] "((?<b_b>janv\\.|févr\\.|mars|avr\\.|mai|juin|juil\\.|août|sept\\.|oct\\.|nov\\.|déc\\.)|(?<B_b>janvier|février|"| __truncated__ "(?<B_B>janvier|février|mars|avril|mai|juin|juillet|août|septembre|octobre|novembre|décembre)(?![[:alpha:]])" "((?<a_a>jeu\\.|lun\\.|mar\\.|dim\\.|ven\\.|mer\\.|sam\\.)|(?<A_a>jeudi|lundi|mardi|dimanche|vendredi|mercredi|s"| __truncated__ "(?<A_A>jeudi|lundi|mardi|dimanche|vendredi|mercredi|samedi)(?![[:alpha:]])" ...
..- attr(*, "names")= chr [1:6] "b" "B" "a" "A" ...
$ num_flex : Named chr [1:24] "(?<d>[012]?[1-9]|3[01]|[12]0)(?!\\d)" "(?<q>[0]?[1-4])(?!\\d)" "(?<H>2[0-4]|[01]?\\d)(?!\\d)" "(?<H>2[0-4]|[01]?\\d)(?!\\d)" ...
..- attr(*, "names")= chr [1:24] "d" "q" "H" "h" ...
$ alpha_exact: Named chr [1:6] "((?<b_b_e>janv\\.|févr\\.|mars|avr\\.|mai|juin|juil\\.|août|sept\\.|oct\\.|nov\\.|déc\\.)|(?<B_b_e>janvier|févr"| __truncated__ "(?<B_B_e>janvier|février|mars|avril|mai|juin|juillet|août|septembre|octobre|novembre|décembre)(?![[:alpha:]])" "((?<a_a_e>jeu\\.|lun\\.|mar\\.|dim\\.|ven\\.|mer\\.|sam\\.)|(?<A_a_e>jeudi|lundi|mardi|dimanche|vendredi|mercre"| __truncated__ "(?<A_A_e>jeudi|lundi|mardi|dimanche|vendredi|mercredi|samedi)(?![[:alpha:]])" ...
..- attr(*, "names")= chr [1:6] "b" "B" "a" "A" ...
$ num_exact : Named chr [1:24] "(?<d_e>[012][1-9]|3[01]|[12]0)" "(?<q_e>[0][1-4])" "(?<H_e>2[0-4]|[01]\\d)" "(?<H_e>2[0-4]|[01]\\d)" ...
..- attr(*, "names")= chr [1:24] "d" "q" "H" "h" ...
$ wday_names :List of 2
..$ abr : chr [1:7] "dim\\." "lun\\." "mar\\." "mer\\." ...
..$ full: chr [1:7] "dimanche" "lundi" "mardi" "mercredi" ...
$ month_names:List of 2
..$ abr : chr [1:12] "janv\\." "févr\\." "mars" "avr\\." ...
..$ full: chr [1:12] "janvier" "février" "mars" "avril" ...
Ok, so on Windows all the abbreviations come with dots at the end. Let me see what I can do.
Should have been fixed. Would really appreciate if you guys could try the dev version and let me know if it works correctly now.
It is working correctly. Thank you!
I have to reopen this issue because it still happens with weekdays. Sorry I only noticed it now!
> library(lubridate)
> Sys.getlocale("LC_TIME")
[1] "Spanish_Spain.1252"
> dt <- seq(ymd("2018-01-01"), ymd("2018-12-31"), "day")
> head(wday(dt, label = TRUE))
[1] lu\\. ma\\. mi\\. ju\\. vi\\. sá\\.
Levels: do\\. < lu\\. < ma\\. < mi\\. < ju\\. < vi\\. < sá\\.
I confirm this bug. However, the solution used for the guess_formats function (https://github.com/tidyverse/lubridate/commit/cc5f1a6de86863f983fd3f69ac842c31997a03a0) works and can easily be applied in .get_locale_regs, which is what the wday function uses.
It is necessary to replace this line (https://github.com/tidyverse/lubridate/blob/6f26b02de432cd9373ad4ce7766c36eacfc29918/R/guess.r#L311) with this:
mat[] <- gsub("\\.$", "", mat) # remove abbrev trailing dot in some locales (#781)
mat[] <- gsub("([].|(){^$*+?[])", "\\\\\\1", mat) # escaping meta chars
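To see what the two replacements do, here is a minimal sketch on a plain character vector standing in for mat. The vector abbr and its values are illustrative, taken from the Windows output earlier in this thread rather than from the lubridate internals.

```r
# Illustrative stand-in for `mat`: abbreviations as Windows reports them
abbr <- c("ene.", "feb.", "mars")

# Escaping alone turns the trailing dot into backslash-dot,
# which is the text that leaks into the factor labels
escaped_only <- gsub("([].|(){^$*+?[])", "\\\\\\1", abbr)
print(escaped_only)
#> [1] "ene\\." "feb\\." "mars"

# Dropping the trailing dot first leaves nothing to escape
no_dot <- gsub("\\.$", "", abbr)
fixed <- gsub("([].|(){^$*+?[])", "\\\\\\1", no_dot)
print(fixed)
#> [1] "ene"  "feb"  "mars"
```

With the dot removed before escaping, the same strings can serve both as regex alternatives and as display labels.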
I imagine this works correctly with clock, since we don't do anything with regular expressions:
library(clock)
dt <- seq(date_parse("2018-01-01"), date_parse("2018-12-31"), "day")
head(date_month_factor(dt, labels = "es", abbreviate = TRUE))
#> [1] ene. ene. ene. ene. ene. ene.
#> 12 Levels: ene. < feb. < mar. < abr. < may. < jun. < jul. < ago. < ... < dic.
head(date_weekday_factor(dt, labels = "es", abbreviate = TRUE))
#> [1] lun. mar. mié. jue. vie. sáb.
#> Levels: dom. < lun. < mar. < mié. < jue. < vie. < sáb.
If the labels aren't exactly what you expect, you can always create a custom clock_labels() object to use as the labels argument.
Hi, I found that this bug is still alive for month abbreviations that differ from their English equivalents (jan != ene, apr != abr, aug != ago, dec != dic). The bug isn't OS-specific: I reproduced it on both Linux and Windows.
Windows:
library(tidyverse)
library(lubridate)
test_dates <- tibble(abr_dates = c("ene-22",
"feb-22",
"mar-22",
"abr-22",
"may-22",
"jun-22",
"jul-22",
"ago-22",
"sep-22",
"oct-22",
"nov-22",
"dic-22"))
test_dates %>%
mutate(dates_date = my(abr_dates))
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `dates_date = my(abr_dates)`.
#> Caused by warning:
#> ! 4 failed to parse.
#> # A tibble: 12 × 2
#> abr_dates dates_date
#> <chr> <date>
#> 1 ene-22 NA
#> 2 feb-22 2022-02-01
#> 3 mar-22 2022-03-01
#> 4 abr-22 NA
#> 5 may-22 2022-05-01
#> 6 jun-22 2022-06-01
#> 7 jul-22 2022-07-01
#> 8 ago-22 NA
#> 9 sep-22 2022-09-01
#> 10 oct-22 2022-10-01
#> 11 nov-22 2022-11-01
#> 12 dic-22 NA
Created on 2023-06-11 with reprex v2.0.2
Linux:
library(reprex)
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
test_dates <- tibble(abr_dates = c("ene-22",
"feb-22",
"mar-22",
"abr-22",
"may-22",
"jun-22",
"jul-22",
"ago-22",
"sep-22",
"oct-22",
"nov-22",
"dic-22"))
test_dates %>%
mutate(dates_date = my(abr_dates))
#> Warning: 4 failed to parse.
#> # A tibble: 12 × 2
#> abr_dates dates_date
#> <chr> <date>
#> 1 ene-22 NA
#> 2 feb-22 2022-02-01
#> 3 mar-22 2022-03-01
#> 4 abr-22 NA
#> 5 may-22 2022-05-01
#> 6 jun-22 2022-06-01
#> 7 jul-22 2022-07-01
#> 8 ago-22 NA
#> 9 sep-22 2022-09-01
#> 10 oct-22 2022-10-01
#> 11 nov-22 2022-11-01
#> 12 dic-22 NA
Created on 2023-06-11 with reprex v2.0.2
If you are on Linux, you need to install the locale.
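If installing the locale is not an option, a locale-independent fallback is to map the Spanish abbreviations to month numbers by hand. The sketch below is base R only and is not how lubridate parses dates. The names es_months and parse_es_my are hypothetical, and it assumes inputs of the form "mmm-yy" with two-digit years in the 2000s.

```r
# Hypothetical lookup table: Spanish month abbreviations to month numbers
es_months <- c(ene = 1L, feb = 2L, mar = 3L, abr = 4L, may = 5L, jun = 6L,
               jul = 7L, ago = 8L, sep = 9L, oct = 10L, nov = 11L, dic = 12L)

# Hypothetical helper: parse "mmm-yy" strings without relying on LC_TIME
parse_es_my <- function(x) {
  parts <- strsplit(tolower(x), "-", fixed = TRUE)
  mon_abbr <- vapply(parts, `[`, character(1), 1L)
  yy <- vapply(parts, `[`, character(1), 2L)
  mon <- unname(es_months[mon_abbr])        # NA for unknown abbreviations
  yr <- 2000L + as.integer(yy)              # assumption: years are 20yy
  # An explicit format makes unparseable rows NA instead of raising an error
  as.Date(sprintf("%04d-%02d-01", yr, mon), format = "%Y-%m-%d")
}

parse_es_my(c("ene-22", "abr-22", "ago-22", "dic-22"))
#> [1] "2022-01-01" "2022-04-01" "2022-08-01" "2022-12-01"
```

This handles the four rows that fail above, at the cost of hard-coding one language.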
Hi, after updating R, RStudio, and lubridate, I'm having trouble with a script that previously worked fine, and I'm not sure whether it is related to this issue.
The following is a minimal example. In English the output is correct:
> library(lubridate)
Attaching package: ‘lubridate’
The following objects are masked from ‘package:base’:
date, intersect, setdiff, union
> Sys.setlocale("LC_TIME", "English")
[1] "English_United States.1252"
> sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_World.utf8 LC_CTYPE=English_World.utf8 LC_MONETARY=English_World.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
time zone: America/Buenos_Aires
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.3
loaded via a namespace (and not attached):
[1] compiler_4.4.1 generics_0.1.3 tools_4.4.1 timechange_0.3.0
> month(ymd(080101),label = T)
[1] Jan
Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < Oct < Nov < Dec
But in Spanish:
> Sys.setlocale("LC_TIME", "spanish")
[1] "Spanish_Spain.1252"
> sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_World.utf8 LC_CTYPE=English_World.utf8 LC_MONETARY=English_World.utf8
[4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252
time zone: America/Buenos_Aires
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.3
loaded via a namespace (and not attached):
[1] compiler_4.4.1 generics_0.1.3 tools_4.4.1 timechange_0.3.0
> month(ymd(080101),label = T)
Error in factor(x, ..., ordered = TRUE) :
invalid 'labels'; length 11 should be 1 or 12
In addition: Warning messages:
1: In strsplit(L, "@", fixed = TRUE) : input string 8 is invalid UTF-8
2: In strsplit(L, "@", fixed = TRUE) : input string 12 is invalid UTF-8
So I checked the labels for both locales (English and Spanish):
> names_EN <- lubridate:::.get_locale_regs("English_United States.1252")
> names_EN[["month_names"]][["full"]]
[1] "January" "February" "March" "April" "May" "June" "July" "August" "September" "October"
[11] "November" "December"
> names_EN[["month_names"]][["abr"]]
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
> names_ESP <- lubridate:::.get_locale_regs("Spanish_Spain.1252")
> names_ESP[["month_names"]][["full"]]
[1] "enero" "febrero" "marzo" "abril" "mayo" "junio" "julio" NA "septiembre"
[10] "octubre" "noviembre"
> names_ESP[["month_names"]][["abr"]]
[1] "ene" "feb" "mar" "abr" "may" "jun" "jul" NA "sep" "oct" "nov"
Indeed, the Spanish labels have one month fewer. If I understand correctly, diciembre is missing, and so is agosto (which should be in the NA position), in both the full and the abbreviated labels.
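The two warnings may point at the cause. In the template output earlier in this thread, strings 8 and 12 are the ones containing "miércoles" and "sábado", so the accented weekday names in those strings, rather than agosto and diciembre themselves, could be what fails to decode when a 1252 locale is used in a UTF-8 session. That is an assumption, not something confirmed from the lubridate source. A minimal base R sketch of why a single-byte accented character is invalid as UTF-8:

```r
# "mié" as single-byte Latin-1/CP1252 bytes (0xE9 is "é"); bytes chosen for illustration
x <- rawToChar(as.raw(c(0x6d, 0x69, 0xe9)))
validUTF8(x)
#> [1] FALSE

# Re-encoding from Latin-1 yields a valid UTF-8 string
y <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(y)
#> [1] TRUE
utf8ToInt(y)
#> [1] 109 105 233
```

A string that fails this check cannot be split cleanly in a UTF-8 session, which is consistent with the NA entry and the missing last label.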
Spanish_Spain.1252
Look for a UTF-8 locale.
For example, try: Sys.setlocale("LC_TIME", "Spanish_Peru.utf8").
The problem is the locale you are using for Spanish ("Spanish_Spain.1252").
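Since locale names differ between Windows and Linux, one option is to try several candidates and keep the first one the system accepts. This is only a sketch: set_time_locale is a hypothetical helper, and it relies on Sys.setlocale() returning an empty string (with a warning) when a request cannot be honored.

```r
# Hypothetical helper: set LC_TIME to the first available candidate locale
set_time_locale <- function(candidates) {
  for (loc in candidates) {
    res <- suppressWarnings(Sys.setlocale("LC_TIME", loc))
    if (nzchar(res)) {
      return(res)  # name of the locale that was actually set
    }
  }
  stop("none of the candidate locales is available on this system")
}

# "C" is listed last only so that the sketch runs on any machine
set_time_locale(c("Spanish_Peru.utf8", "es_ES.utf8", "C"))
```

Note that the call changes the LC_TIME setting for the rest of the session.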
Thank you! It worked :)
If I use a Spanish or French locale, the abbreviated month labels are displayed as "ene\.", but they should be just "ene.". I'm using Windows 10.