rstudio / rsconnect

Publish Shiny Applications, RMarkdown Documents, Jupyter Notebooks, Plumber APIs, and more
http://rstudio.github.io/rsconnect/

Error detecting locale - incomplete final line #233

Closed cderv closed 1 year ago

cderv commented 6 years ago

Hi,

When deploying to an RStudio Connect server with rsconnect, I get this warning:

Warning message:
Error detecting locale: Error in read.table(file = file, header = header, sep = sep, quote = quote, : incomplete final line found by readTableHeader on 'raw'
 (Using default: en_US) 

I believe something goes wrong when detecting the locale on Windows.

I found that rsconnect:::systemInfo is used to detect the locale on Windows. It calls the command systeminfo /FO csv with system(), which returns its result in CSV format. Currently, systemInfo parses that result with read.csv, which causes the error. If I use readr::read_csv instead, the error goes away.


raw <- system("systeminfo /FO csv", intern = TRUE, wait = TRUE)
# read.csv on a textConnection fails
info.csv <- read.csv(textConnection(raw))
#> Error in read.table(file = file, header = header, sep = sep, quote = quote, : incomplete final line found by readTableHeader on 'raw'
# I get no warning
info_csv <- readr::read_csv(raw)
#  and locale can be detected
locale <- as.character(info_csv[[20]])
locale
#> [1] "fr;Fran<U+0087>ais (France)"
# but I get an encoding issue that causes an error afterwards
strsplit(unlist(strsplit(locale, ";", fixed=TRUE)), "-", fixed=TRUE)
#> Warning in strsplit(locale, ";", fixed = TRUE): la chaîne de caractères
#> entrée 1 est incorrecte comme encodage UTF-8
#> [[1]]
#> [1] NA

# If I try to provide raw as text input, no warning
info.csv.txt <- read.csv(text = raw)
# I get a "strange" character
locale <- as.character(info.csv.txt[[20]])
locale
#> [1] "fr;Fran‡ais (France)"
# but it works
strsplit(unlist(strsplit(locale, ";", fixed=TRUE)), "-", fixed=TRUE)
#> [[1]]
#> [1] "fr"
#> 
#> [[2]]
#> [1] "Fran‡ais (France)"

I am not sure what encoding the string returned by systeminfo /FO csv uses, so I tested with the tools I know:

readr::guess_encoding(raw)
#> # A tibble: 2 x 2
#>       encoding confidence
#>          <chr>      <dbl>
#> 1 windows-1252       0.38
#> 2 windows-1250       0.22
stringi::stri_enc_detect(raw)
#> [[1]]
#> [[1]]$Encoding
#> [1] "windows-1252" "windows-1250" "windows-1254" "UTF-16BE"    
#> [5] "UTF-16LE"     "GB18030"      "EUC-JP"       "EUC-KR"      
#> [9] "Big5"        
#> 
#> [[1]]$Language
#> [1] "fr" "ro" "tr" ""   ""   "zh" "ja" "ko" "zh"
#> 
#> [[1]]$Confidence
#> [1] 0.72 0.37 0.14 0.10 0.10 0.10 0.10 0.10 0.10
#> 
#> 
#> [[2]]
#> [[2]]$Encoding
#> [1] "windows-1252" "windows-1250" "UTF-16BE"     "UTF-16LE"    
#> [5] "EUC-JP"       "EUC-KR"       "windows-1254"
#> 
#> [[2]]$Language
#> [1] "fr" "ro" ""   ""   "ja" "ko" "tr"
#> 
#> [[2]]$Confidence
#> [1] 0.22 0.15 0.10 0.10 0.10 0.10 0.06
stringi::stri_enc_detect2(raw, locale = "fr")
#> [[1]]
#> [[1]]$Encoding
#>  [1] "macintosh"          "x-mac-turkish"      "ISO-8859-15"       
#>  [4] "windows-1258"       "ibm-1258_P100-1997" "ibm-1129_P100-1997"
#>  [7] "windows-1252"       "windows-1254"       "ibm-1252_P100-2000"
#> [10] "ibm-1254_P100-1995"
#> 
#> [[1]]$Language
#>  [1] NA NA NA NA NA NA NA NA NA NA
#> 
#> [[1]]$Confidence
#>  [1] 0.7500000 0.7500000 0.5833333 0.5833333 0.5833333 0.5833333 0.4027778
#>  [8] 0.4027778 0.4027778 0.4027778
#> 
#> 
#> [[2]]
#> [[2]]$Encoding
#>  [1] "ISO-8859-15"        "windows-1252"       "windows-1254"      
#>  [4] "windows-1258"       "ibm-1252_P100-2000" "ibm-1254_P100-1995"
#>  [7] "ibm-1258_P100-1997" "ibm-1129_P100-1997" "macintosh"         
#> [10] "x-mac-turkish"     
#> 
#> [[2]]$Language
#>  [1] NA NA NA NA NA NA NA NA NA NA
#> 
#> [[2]]$Confidence
#>  [1] 0.9655172 0.9655172 0.9655172 0.9655172 0.9655172 0.9655172 0.9655172
#>  [8] 0.9655172 0.7413793 0.7413793

If we try to provide an encoding in some way:

# If I specify the encoding in textConnection, no more warning
info.csv <- read.csv(textConnection(raw, encoding = "UTF-8"))
# I get also another "strange" character
locale <- as.character(info.csv[[20]])
locale
#> [1] "fr;Fran‡ais (France)"
# but it works
strsplit(unlist(strsplit(locale, ";", fixed=TRUE)), "-", fixed=TRUE)
#> [[1]]
#> [1] "fr"
#> 
#> [[2]]
#> [1] "Fran‡ais (France)"

Do you have any ideas about this? What fix could be made?

Created on 2018-01-10 by the reprex package (v0.1.1.9000).

bvprod2 commented 5 years ago

Same issue; it seems related to using a non-English language.

hfberg commented 4 years ago

I had the same issue with read.xlsx. It was solved with xlsx::read.xlsx. Thank you!

Sade154 commented 3 years ago

Exactly the same issue with the French language when deploying to shinyapps.

kevinushey commented 3 years ago

I wonder why we use systeminfo here instead of Sys.getlocale(). It looks like this code is now quite old so any memory of why is probably long gone ...

kippandrew commented 3 years ago

My memory is very hazy, but I believe the issue is that Sys.getlocale() is platform specific. In other words, on Windows, you'll get a value that isn't meaningful to Linux, which shinyapps.io runs on under the hood. systeminfo does provide a locale that is meaningful to Linux.
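
To see the platform dependence described here, one can print what R reports on the current machine. The example values in the comments are typical forms, not output captured in this thread, and `looksPosix` is a hypothetical name for a rough check.

```r
# Query the locale R itself reports for character handling.
loc <- Sys.getlocale("LC_CTYPE")
print(loc)

# On Windows this is typically a name such as "French_France.1252";
# on Linux it is typically a POSIX-style name such as "fr_FR.UTF-8".
# Rough check for the POSIX-style "ll_CC" prefix (a heuristic only).
looksPosix <- grepl("^[a-z]{2,3}_[A-Z]{2}", loc)
print(looksPosix)
```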

Sade154 commented 3 years ago

The issue would be solved, I guess, if we could handle the weird encoding returned by system("systeminfo /FO csv"). On three different Windows laptops (from France), I got the same output:

system("systeminfo /FO csv", intern = TRUE, wait = TRUE)

[1] "\"Nom de l'h“te\",\"Nom du systŠme d'exploitation\",\"Version du systŠme\",\"Fabricant du systŠme d'exploitation\",\"Configuration du systŠme d'exploitation\ [TRUNCATED]"

kippandrew commented 3 years ago

That makes sense. One option you could try as a workaround: setting the rsconnect.locale option, which should bypass the automatic detection code.

szmsu2011 commented 3 years ago

> That makes sense. One option you could try as a workaround: setting the rsconnect.locale option, which should bypass the automatic detection code.

Could you please give an example of how to set rsconnect.locale to English?

I have tried all sorts of ways, such as options(rsconnect.locale = "en_US") and even changing my Windows locale to English (from Chinese), but the default locale detected by shinyapps.io is still CN.

I found a hacky workaround: setting Sys.setlocale("LC_ALL", "C") before the app is called. But that broke all my date formatting from lubridate.

Thank you in advance.

kevinushey commented 3 years ago

You might need to set it within a .rsconnect_profile, especially if you're trying to deploy from RStudio. See ?rsconnect::options for more details.
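
A minimal sketch of what such a profile might contain, assuming the option name from this thread and the "en_US" value already tried above. Where the file has to live and when it is read are not spelled out here, so treat the placement as something to confirm in ?rsconnect::options.

```r
# Contents of a .rsconnect_profile file (placement: see ?rsconnect::options).
# Sets the locale explicitly so that automatic detection is bypassed,
# as described earlier in the thread.
options(rsconnect.locale = "en_US")

# Quick check that the option is visible in the current session.
getOption("rsconnect.locale")
```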

cderv commented 1 year ago

FWIW, systeminfo /FO csv returns its output in Latin1 encoding. This is the default for CMD output.

This works if we can assume that it always returns latin1:

systemInfo <- function() {
  raw <- system("systeminfo /FO csv", intern = TRUE, wait = TRUE)
  Encoding(raw) <- rep_len("latin1", length(raw))
  info <- read.csv(textConnection(raw))
  return(info)
}
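
A variant of the same idea that converts explicitly with iconv instead of marking the encoding, split so that the parsing step can be tried without Windows. `parseSystemInfo` is a hypothetical helper, not part of rsconnect, and its CP850 default is an assumption based on the garbled characters shown in this thread; the input below is synthetic, not real systeminfo output.

```r
# Hypothetical helper, not part of rsconnect: decode CSV lines from an
# assumed code page, then parse them.
parseSystemInfo <- function(lines, from = "CP850") {
  utf8 <- iconv(lines, from = from, to = "UTF-8")
  read.csv(text = utf8, stringsAsFactors = FALSE, check.names = FALSE)
}

# Synthetic two-line input standing in for systeminfo output; the data row
# is built from raw bytes so that 0x87 (CP850 "ç") is passed through as-is.
header <- "\"Host Name\",\"System Locale\""
row <- rawToChar(c(
  charToRaw("\"PC\",\"fr;Fran"), as.raw(0x87), charToRaw("ais (France)\"")
))
info <- parseSystemInfo(c(header, row))

# Same split as in the original report: language code before the ";".
locale <- info[["System Locale"]]
language <- strsplit(locale, ";", fixed = TRUE)[[1]][1]
print(language)
```

Keeping the code page as a parameter avoids hard-coding one assumption; on a real machine the value would need to match whatever code page the console actually uses.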

With CMD on Windows, we can also force the output to be UTF-8 by changing the active code page.

For example:

systemInfo <- function() {
  commands <- c(
    "@ECHO OFF",
    "CHCP 65001 > nul",
    "systeminfo /FO csv"
  )
  bat <- tempfile(fileext = ".bat")
  on.exit(unlink(bat), add = TRUE)
  writeLines(commands, bat, useBytes = TRUE)
  raw <- system(bat, intern = TRUE, wait = TRUE)
  info <- read.csv(textConnection(raw))
  return(info)
}

Git patch

diff --git a/R/locale.R b/R/locale.R
index f2d8b6b..74c7d37 100644
--- a/R/locale.R
+++ b/R/locale.R
@@ -78,7 +78,16 @@ systemLocale <- function() {
 }
 systemInfo <- function() {
-  raw <- system("systeminfo /FO csv", intern = TRUE, wait = TRUE)
+  commands <- c(
+    "@ECHO OFF",
+    "CHCP 65001 > nul",
+    "systeminfo /FO csv"
+  )
+  bat <- tempfile(fileext = ".bat")
+  on.exit(unlink(bat), add = TRUE)
+  writeLines(commands, bat, useBytes = TRUE)
+  raw <- system(bat, intern = TRUE, wait = TRUE)
   info <- read.csv(textConnection(raw))
   return(info)
 }

Hope it helps

hadley commented 1 year ago

Possible alternative approach from @gaborcsardi:

utils::readRegistry("Control Panel\\International\\User Profile", hive = "HCU")$Languages
#> [1] "en-US" "de-DE" "es-ES" "hu"

We're just exploring how far back in time this key exists.

cderv commented 1 year ago

Regarding alternatives: when targeting Windows only, PowerShell can be an option.

shell("(Get-WinSystemLocale).Name", "powershell", intern = TRUE)
#> [1] "fr-FR"
system2("powershell", c("-Command", "(Get-WinSystemLocale).Name"), stdout = TRUE)
#> [1] "fr-FR"

Available since Windows 8/Server 2012, I think. PowerShell is not used much with R, but I think it has been available by default on Windows for some time. I use it in a ps1 script, but reading from the registry is probably better from R. Just sharing in case it helps.

gaborcsardi commented 1 year ago

It needs Windows 8.1 it seems.

But this works on Windows 10 and back to Vista, everywhere:

utils::readRegistry(hive = "HCU", "Control Panel\\International")$LocaleName
#> en-US
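
The registry values above use a hyphenated form ("en-US"), while the default quoted in the original warning is underscore-separated ("en_US"). A minimal sketch of bridging the two, assuming a Windows-only registry read and a fallback everywhere else. `normalizeLocaleTag` and `detectWindowsLocale` are hypothetical names, not rsconnect functions, and taking the last hyphen-separated part as the region is a simplification that ignores more complex tags.

```r
# Hypothetical helper: turn a tag such as "fr-FR" into "fr_FR".
# Falls back to `default` when the tag is missing or has no region part.
normalizeLocaleTag <- function(tag, default = "en_US") {
  if (!is.character(tag) || length(tag) != 1 || is.na(tag)) {
    return(default)
  }
  parts <- strsplit(tag, "-", fixed = TRUE)[[1]]
  if (length(parts) < 2) {
    return(default)  # e.g. "hu" carries no region
  }
  paste(tolower(parts[1]), toupper(parts[length(parts)]), sep = "_")
}

# Hypothetical wrapper: read LocaleName from the registry on Windows only.
detectWindowsLocale <- function(default = "en_US") {
  if (.Platform$OS.type != "windows") {
    return(default)
  }
  tag <- tryCatch(
    utils::readRegistry("Control Panel\\International", hive = "HCU")$LocaleName,
    error = function(e) NULL
  )
  normalizeLocaleTag(tag, default)
}

print(normalizeLocaleTag("fr-FR"))
print(detectWindowsLocale())
```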