mountainMath / cancensus

R wrapper for calling CensusMapper APIs
https://mountainmath.github.io/cancensus/index.html

CRAN NOTE for marked utf-8 strings #164

Closed dshkol closed 3 years ago

dshkol commented 3 years ago

https://cran.r-project.org/web/checks/check_results_cancensus.html

Version: 0.4.0
Check: data for non-ASCII characters
Result: NOTE
     Note: found 23 marked UTF-8 strings
Flavors: r-devel-linux-x86_64-fedora-clang, r-devel-linux-x86_64-fedora-gcc, r-patched-solaris-x86, r-release-macos-x86_64, r-oldrel-macos-x86_64

mountainMath commented 3 years ago

I can try and re-encode the data to see if it goes away. Hard to fix because none of our checks pick it up.

I’ll try something along these lines: data for non-ASCII characters
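
For reference, a re-encoding pass of this kind might look like the sketch below. It is only an illustration: `reencode_utf8` and the `codes` data frame are hypothetical stand-ins, not part of cancensus. It normalises every character column to UTF-8 with base R's `enc2utf8`. Strings converted this way are still marked as UTF-8, so this would probably make the encoding consistent rather than silence the NOTE.

```r
# Hypothetical helper: convert every character column of a data frame to UTF-8.
reencode_utf8 <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], enc2utf8)
  df
}

# Made-up example data: one value stored as latin1 to mimic mixed encodings.
codes <- data.frame(
  status_code = c("A", "B"),
  status = c(iconv("Montr\u00e9al", "UTF-8", "latin1"), "Toronto"),
  stringsAsFactors = FALSE
)

fixed <- reencode_utf8(codes)
Encoding(fixed$status)   # reports the encoding mark of each string
validUTF8(fixed$status)  # checks that each string is valid UTF-8
```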

dshkol commented 3 years ago

I've found the offending data. It's in CODES_TABLE, as expected.

length(grep("NOT_ASCII", iconv(CODES_TABLE$status_code, "UTF-8", "ASCII", sub = "NOT_ASCII"))) +
  length(grep("NOT_ASCII", iconv(CODES_TABLE$status, "UTF-8", "ASCII", sub = "NOT_ASCII")))
[1] 23
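
To see which entries trigger the count rather than just how many, something along the following lines may help. This is a sketch on a made-up `codes` data frame standing in for `CODES_TABLE`, and `non_ascii_idx` is a hypothetical helper. It relies on `iconv` returning `NA` for strings that cannot be converted to ASCII, and on `Encoding` to count the strings R has marked as UTF-8, which is what the NOTE reports.

```r
# Hypothetical helper: positions of strings that cannot be represented in ASCII.
non_ascii_idx <- function(x) {
  which(!is.na(x) & is.na(iconv(x, "UTF-8", "ASCII")))
}

# Made-up example data, not the real CODES_TABLE.
codes <- data.frame(
  status_code = c("A", "B", "C"),
  status = c("Montr\u00e9al", "Toronto", "Qu\u00e9bec"),
  stringsAsFactors = FALSE
)

# Row indices of non-ASCII values in each column.
hits <- lapply(codes, non_ascii_idx)

# Number of strings in each column that R has marked as UTF-8.
marked <- vapply(codes, function(col) sum(Encoding(col) == "UTF-8"), integer(1))
```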

I saw a thread on r-dev suggesting that, if the non-ASCII characters are intentional, we should just say so in our CRAN comments, though they might still cause portability issues.

mountainMath commented 3 years ago

Hard to believe that there is no way to include non-ASCII characters in a way that works on all R platforms. I say we just go that route.

dshkol commented 3 years ago

Will leave it as is, with an explicit note in the CRAN comments that this is intentional.