gibberish in omnils - GitHub Issues

ShuguangSun commented 6 years ago

Environment:
Windows 7, Chinese (GBK/cp936 locale)
Vim 8 with the latest patches
R 3.4.3, with language=none in .Renviron and options(encoding = "UTF-8") in .Rprofile

When I start R from Vim, R runs in English with UTF-8 encoding. The help on objects is displayed correctly, with no gibberish. The issue has only a minor effect on Nvim-R completion (C-x C-o). However, it does affect ncm-R, which cannot find out the encoding of the omnils files.

What is the issue: For example, in omnils_methods_3.4.3:

.TSummary:baseenvironmentenvmethodsNot a function
.classEnvfunctionfunctionmethodsClass default.requirePackage("methods") mustFindTRUEUtilities for Managing Class DefinitionsThese are various functions to support the definition and use of\Nformal classes. Most of them are rarely suitable to be called\Ndirectly.\NOthers are somewhat experimental and/or partially implemented only. Do\Nrefer to 鈥榮etClass鈥 .debugMethodfunctionfunctionmethodsfun text"" conditionNULL signature onceFALSE

What is expected:

.TSummary:baseenvironmentenvmethodsNot a function
.classEnvfunctionfunctionmethodsClass default.requirePackage("methods") mustFindTRUEUtilities for Managing Class DefinitionsThese are various functions to support the definition and use of\Nformal classes. Most of them are rarely suitable to be called\Ndirectly.\NOthers are somewhat experimental and/or partially implemented only. Do refer to 'setClass' for normal code development
.debugMethodfunctionfunctionmethodsfun text"" conditionNULL signature onceFALSE

.debugMethod is another function and needs a new line.
'setClass' is not correctly decoded.

I have looked into RClassUtils.html:

<p>Others are somewhat experimental and/or partially implemented only. Do
refer to <code><a href="setClass.html">setClass</a></code> for normal code development.
</p>

and the RClassUtils.Rd

Others are somewhat experimental and/or partially implemented only. Do
refer to \code{\link{setClass}} for normal code development.

There is other gibberish as well. It seems that setClass, or \code{\link{setClass}}, is not converted correctly when the omnils file is built.
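The garbled 鈥榮etClass鈥 has the shape of UTF-8 bytes being decoded as GBK. Below is a minimal Python sketch of that mechanism (Python because ncm-R reads the omnils files from Python, as mentioned later in this thread). The helper name misread_as_gbk is made up for illustration, and which program actually does the misreading is an assumption here.

```python
# Minimal sketch: UTF-8 bytes misread as GBK (cp936).
# Assumption: the file holds UTF-8 bytes and the reader decodes them as GBK.

def misread_as_gbk(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as GBK."""
    raw = text.encode("utf-8")
    # errors="replace" keeps decoding when a byte sequence is not valid GBK
    return raw.decode("gbk", errors="replace")

original = "refer to \u2018setClass\u2019 for normal code development"
garbled = misread_as_gbk(original)

# ascii() keeps the output printable whatever the console encoding is
print(ascii(original))
print(ascii(garbled))

# The third byte of the opening quote pairs with the following "s",
# so the "s" of setClass disappears into a CJK character.
print("setClass" in garbled)
```

The ASCII part of the line passes through unchanged because GBK and UTF-8 agree on ASCII bytes; only the multi-byte quote characters are damaged.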

jalvesaq commented 6 years ago

Thanks for reporting the issue! I rebooted the computer into Windows and changed line 229 of R/nvimcom/R/nvim.bol.R from

writeLines(x, f)

to

writeLines(x, f, useBytes = TRUE)

However, the issue was not fixed. Anyway, I cannot replicate all the issues that you reported because my Windows locale is Latin-1.

If you want to try something yourself, the procedure is:

Edit the file R/nvimcom/R/nvim.bol.R.
Delete all files from ~/AppData/Roaming/Nvim-R directory.
Start R, and try to complete function names such as as.matrix.noquote and other problematic ones.

Note: If you find a solution it should work with options(encoding = "UTF-8") as well as without setting this option because certainly most users do not set it.

ShuguangSun commented 6 years ago

The cause is the function "CleanOmniLine", in which the Unicode single quotes \u2018 and \u2019 and double quotes \u201c and \u201d are hard-coded as substitutes for quotes etc. If I remove them, the omnils files are fine.

Could we add an option to choose whether to use the fancy quotes?

I'm not an expert on R encoding or on how R communicates with the system. I tried writeLines("\u2018, \u2019") and it produced "\241\256, \241\257" (the quotes as GBK-encoded bytes), which is not displayed even when R is set to options(encoding = "UTF-8"). It seems that the communication between Vim and R uses the system locale (GBK).

R:
options(encoding = "UTF-8")

Vim:
set encoding=utf-8
set termencoding=utf-8
set fencs=utf-8,gbk
set scriptencoding utf-8

writeLines("\u2018, \u2019") prints correctly in Rgui and RStudio.
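The octal escapes reported above can be checked independently of R. This is a small Python sketch, with octal_escapes as a made-up helper name, that prints the GBK and UTF-8 byte sequences of the two single quotes in the same backslash-octal notation:

```python
def octal_escapes(data: bytes) -> str:
    """Render each byte as a backslash followed by its octal value."""
    return "".join("\\%o" % b for b in data)

for ch in ("\u2018", "\u2019"):
    gbk_bytes = ch.encode("gbk")
    utf8_bytes = ch.encode("utf-8")
    print("U+%04X  gbk=%s  utf8=%s"
          % (ord(ch), octal_escapes(gbk_bytes), octal_escapes(utf8_bytes)))

# U+2018  gbk=\241\256  utf8=\342\200\230
# U+2019  gbk=\241\257  utf8=\342\200\231
```

The GBK column matches the "\241\256, \241\257" seen in R, which is consistent with R emitting the quotes in the native GBK encoding rather than in UTF-8.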

jalvesaq commented 6 years ago

Most users experiencing the problem would hardly find the option buried in the Nvim-R documentation. So, I think it is better if we find a single solution for everyone. Perhaps this:

CleanOmnils <- function(f)
{
    x <- readLines(f)
    x <- CleanOmniLine(x)
    if(.Platform$OS.type == "windows"){
        x <- gsub("\u2018", "'", x)
        x <- gsub("\u2019", "'", x)
        x <- gsub("\u201c", '"', x)
        x <- gsub("\u201d", '"', x)
    }
    writeLines(x, f)
}

What do you think?

ShuguangSun commented 6 years ago

Thanks. It will solve the problem.

jalvesaq commented 6 years ago

Thanks for your help!

ShuguangSun commented 6 years ago

'enc2native' is required because R treats the string as being in the native encoding regardless of options(encoding = "UTF-8"), which is quite strange. In my Windows environment, the code below produces the Unicode single/double quotes. Without 'enc2native', gibberish is generated during 'gsub', because R reads \u2018 as native-encoded instead of UTF-8.

It is not tested under Linux.

Please keep your Windows patch that uses gsub to replace those Unicode characters. On Windows, Python (which ncm-R uses) will try to decode the omnils files with the native encoding unless some complicated settings are applied.

CleanOmniLine <- function(x)
{
    x <- gsub("\\\\R", "R", x)
    x <- gsub("\\\\link\\{(.+?)\\}", "\\1", x)
    x <- gsub("\\\\link\\[.+?\\]\\{(.+?)\\}", "\\1", x)
    x <- gsub("\\\\code\\{(.+?)\\}", enc2native("\u2018\\1\u2019"), x)
    x <- gsub("\\\\samp\\{(.+?)\\}", enc2native("\u2018\\1\u2019"), x)
    x <- gsub("\\\\file\\{(.+?)\\}", enc2native("\u2018\\1\u2019"), x)
    x <- gsub("\\\\sQuote\\{(.+?)\\}", enc2native("\u2018\\1\u2019"), x)
    x <- gsub("\\\\dQuote\\{(.+?)\\}", enc2native("\u201c\\1\u201d"), x)
    x <- gsub("\\\\emph\\{(.+?)\\}", "\\1", x)
    x <- gsub("\\\\bold\\{(.+?)\\}", "\\1", x)
    x <- gsub("\\\\pkg\\{(.+?)\\}", "\\1", x)
    x <- gsub("\\\\item\\{(.+?)\\}", "\\1", x)
    x <- gsub("\\\\item ", enc2native("\\\\N  \u2022 "), x)
    x <- gsub("\\\\itemize\\{(.+?)\\}", "\\1", x)
    x <- gsub("\\\\eqn\\{.+?\\}\\{(.+?)\\}", "\\1", x)
    x <- gsub("\\\\eqn\\{(.+?)\\}", "\\1", x)
    x <- gsub("\\\\cite\\{(.+?)\\}", "\\1", x)
    x <- gsub("\\\\url\\{(.+?)\\}", "\\1", x)
    x <- gsub("\\\\linkS4class\\{(.+?)\\}", "\\1", x)
    x <- gsub("\\\\command\\{(.+?)\\}", "`\\1`", x)
    x <- gsub("\\\\href\\{\\{.+?\\}\\{(.+?)\\}\\}", enc2native("\u2018\\1\u2019"), x)
    x <- gsub("\\\\ifelse\\{\\{latex\\}\\{\\\\out\\{.\\}\\}\\{ \\}\\}\\{\\}", " ", x) # \sspace
    x <- gsub("\\\\ldots", "...", x)
    x <- gsub("\\\\dots", "...", x)
    x <- gsub("\\\\preformatted\\{(.+?)\\}", "\\\\N\\1\\\\N", x)

    x
}
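The remark about Python decoding the omnils files with the native encoding can be illustrated with a short sketch. It does not show ncm-R's actual code: the file names, the sample lines, and the read_with helper are made up, and GBK is passed explicitly to stand in for a Chinese Windows default.

```python
import locale
import os
import tempfile

def read_with(path: str, encoding: str) -> str:
    """Read a whole file, decoding it with an explicit encoding."""
    with open(path, encoding=encoding, errors="replace") as fh:
        return fh.read()

fancy = "Do refer to \u2018setClass\u2019 for normal code development\n"
plain = "Do refer to 'setClass' for normal code development\n"

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, text in (("fancy", fancy), ("plain", plain)):
        path = os.path.join(tmp, name + ".txt")
        # Write UTF-8 bytes, as a UTF-8 R session might
        with open(path, "w", encoding="utf-8", newline="") as fh:
            fh.write(text)
        # Read back once as UTF-8 and once as GBK
        results[name] = (read_with(path, "utf-8"), read_with(path, "gbk"))

# open() without an encoding argument falls back to this locale default
print("locale default:", locale.getpreferredencoding(False))
print("fancy identical under both decodings:", results["fancy"][0] == results["fancy"][1])
print("plain identical under both decodings:", results["plain"][0] == results["plain"][1])
```

Only the ASCII-only line reads the same under both decodings, which is why replacing the Unicode quotes on Windows sidesteps the question of which encoding the reader assumes.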

jalvesaq commented 6 years ago

The fancy quotes improve documentation readability, but they are not that important in omni completion. Moreover, if the user switches from UTF-8 to native encoding or vice versa, the encoding issue may appear again. So, I think it is better to choose the safest option: write a function for Windows that uses ASCII-only characters, and keep fancy quotes only on other platforms.
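The ASCII-only idea can be sketched outside R as well. The Python snippet below is an illustration, not Nvim-R code: it maps the four fancy quotes to ASCII and checks that the result has the same bytes in UTF-8 and GBK. The names FANCY_TO_ASCII and asciify_quotes and the sample line are made up.

```python
# Map the four fancy quotes discussed in this thread to ASCII quotes
FANCY_TO_ASCII = str.maketrans({
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
})

def asciify_quotes(line: str) -> str:
    """Replace Unicode single/double quotes with their ASCII counterparts."""
    return line.translate(FANCY_TO_ASCII)

sample = "Do refer to \u2018setClass\u2019 in package \u201cmethods\u201d"
cleaned = asciify_quotes(sample)

print(cleaned)  # Do refer to 'setClass' in package "methods"
print(cleaned.isascii())  # True
# ASCII text has identical bytes in UTF-8 and GBK
print(cleaned.encode("utf-8") == cleaned.encode("gbk"))  # True
```

Any other non-ASCII character would need its own entry in the table, for example the \u2022 bullet that CleanOmniLine inserts for \item.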

jalvesaq / Nvim-R

gibberish in omnils #276