ropensci / allodb

An R package for biomass estimation at extratropical forest plots.
https://docs.ropensci.org/allodb/
GNU General Public License v3.0

Fix encoding #3

Closed maurolepore closed 6 years ago

maurolepore commented 7 years ago

Some data has the wrong encoding. devtools::check() throws these warnings:

Following this post, below is my best attempt to fix the problem. But the solution is not good enough: at best, the non-ASCII characters are removed. What I want is to replace them with the correct characters.

Suzanne said the encoding is "latin1" (https://goo.gl/KZiVbQ). But the "latin-ascii" conversion doesn't work well enough (see below).

Maybe it would help to receive the data in .csv format, so that I can read it with the right encoding? Something like this: read.csv(data, encoding = "latin1").
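A minimal sketch of that idea, under stated assumptions: the temporary file and the two sample species names are hypothetical stand-ins, not the real WSG data. In base R, the `encoding` argument of `read.csv()` only marks the strings as latin1 and does not re-encode them, so the sketch follows it with `enc2utf8()`. The `fileEncoding` argument is the alternative that re-encodes while reading.

```r
# Hypothetical sample data: write a small latin1-encoded CSV to a temp file.
path <- tempfile(fileext = ".csv")
lines_utf8 <- c("species", "peque\u00f1a", "Aubr\u00e9v.")
lines_latin1 <- iconv(lines_utf8, from = "UTF-8", to = "latin1")
writeLines(lines_latin1, path, useBytes = TRUE)

# Read the file, declaring (not converting) the strings as latin1.
species_tbl <- read.csv(path, encoding = "latin1", stringsAsFactors = FALSE)

# Convert the latin1-marked strings to UTF-8 for downstream use.
species_utf8 <- enc2utf8(species_tbl$species)
Encoding(species_utf8)
```

Whether this helps with the real data depends on the file actually being latin1, which is only what was reported and has not been verified here.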

My suboptimal solution so far

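library(tidyverse)
library(stringi)
library(allodb)

# Keep only the species names that contain non-ASCII characters,
# then compare two ways of converting them to ASCII.
WSG %>%
  mutate(encode = stri_enc_mark(species)) %>%
  filter(encode != "ASCII") %>%
  transmute(
    original = species,
    # Transliterate with ICU's Latin-ASCII transform
    with_stri = stri_trans_general(species, "latin-ascii"),
    # Convert from latin1 to ASCII, dropping anything that can't be converted
    with_iconv = iconv(species, "latin1", "ASCII", sub = "")
  )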
# # A tibble: 7 x 3
#                         original                        with_stri              with_iconv
#                            <chr>                            <chr>                   <chr>
# 1                     bigll3¡                       bigll3A,A¡                  bigll3
# 2                     pequeña                       pequeAfA±a                  pequea
# 3       sp. ‘hairy’       sp. A¢a,¬EoehairyA¢a,¬a,,¢               sp. hairy
# 4                   ‘giant                    A¢a,¬Eoegiant                   giant
# 5   dewevrei (De Wild.) J.Lí€     dewevrei (De Wild.) J.LA-a,¬ dewevrei (De Wild.) J.L
# 6 normandii Aubr퀌©v. & Pe normandii AubrA-a,¬A'A(C)v. & Pe   normandii Aubrv. & Pe
# 7 pellegrinianum (J.L퀌©on pellegrinianum (J.LA-a,¬A'A(C)on   pellegrinianum (J.Lon
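One possible reading of the output above, offered as an assumption rather than a diagnosis: "pequeÃ±a" is what the UTF-8 bytes of "pequeña" look like when they are misread as latin1. If that is what happened, the damage can be reversed instead of stripped. The sketch below builds such a garbled string from a hypothetical sample value and then repairs it with base R's `iconv()` and `Encoding()`.

```r
# Hypothetical sample value, not taken from WSG itself.
correct <- "peque\u00f1a"

# Simulate the damage: treat the UTF-8 bytes as latin1 and convert to UTF-8.
garbled <- iconv(correct, from = "latin1", to = "UTF-8")

# Reverse it: convert back to latin1 to recover the original bytes,
# then declare those bytes as the UTF-8 they really are.
repaired <- iconv(garbled, from = "UTF-8", to = "latin1")
Encoding(repaired) <- "UTF-8"

identical(repaired, correct)
```

This is unlikely to cover every row. The curly-quote rows (for example "â€˜giant") look like UTF-8 misread as Windows-1252 rather than latin1, and rows 5 to 7 appear to carry additional damage, so they would need a different source encoding or manual correction.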

maurolepore commented 7 years ago

https://mail.google.com/mail/u/0/?zx=yxdmg1oh1mf4#search/is%3Aunread+OR+has%3Apurple-question/15fcaccb154df2c0

maurolepore commented 6 years ago

I'll worry about this once I know which data make it into allodb. This issue was opened at an early stage and refers to a general dataset, WSG, that is still being considered.