ropensci / skimr

A frictionless, pipeable approach to dealing with summary statistics
https://docs.ropensci.org/skimr

Error in nchar(x) : invalid multibyte string #401

Closed: DesiQuintans closed this issue 5 years ago

DesiQuintans commented 5 years ago

Could skimr be made to support a wider range of character encodings? I ran into this issue:

  1. Grab this insect light trap dataset from Kaggle.
  2. Look at row 845. It contains this line:
LEPIDOPTERA,COLEOPHORIDAE,Coleophora asteris MŸhl.,1997,8/14/97,8/17/97,1
  3. Load this dataset into R and try to skim() it. It fails on element 844 (the header row was read as column names, so row 845 of the file becomes element 844).
> skim(dat)
## Error in nchar(x) : invalid multibyte string, element 844
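The failure can be reproduced without the Kaggle file. This is a minimal sketch that assumes the offending byte is 0x9F, which Windows-1252 displays as Ÿ; the report doesn't state the file's actual encoding. validUTF8() flags the bad element regardless of locale, while nchar() only stops with an error when the session uses a multibyte (e.g. UTF-8) locale, so whether you see the error may depend on your platform.

```r
# Assumed stand-in for row 845: byte 0x9F is not valid UTF-8 on its own
x <- c("LEPIDOPTERA", "Coleophora asteris M\x9fhl.")

# validUTF8() inspects the raw bytes, so its result does not depend on the locale
validUTF8(x)
#> [1]  TRUE FALSE

# nchar() counts characters using the session's locale; in a UTF-8 locale
# this returns the "invalid multibyte string, element 2" message instead of counts
res <- tryCatch(nchar(x), error = function(e) conditionMessage(e))
print(res)
```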
elinw commented 5 years ago

What operating system are you on?

elinw commented 5 years ago

Okay, I think this is due to min_char() and max_char() not handling multibyte characters correctly. I will take a look, but in the meantime you can use a custom skimmer list that sets those two functions to NULL.
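A minimal sketch of that workaround, assuming skimr 2.0 or later, where skim_with() returns a new skim function, sfl() builds a skimmer list, and setting a skimmer to NULL drops it. min and max are the default character skimmers; my_skim and the small data frame are illustrative names, not part of the report. In skimr v1, skim_with() instead changed the session defaults in place and took plain lists, as noted in the final comment.

```r
library(skimr)

# Assumes skimr >= 2.0: skim_with() returns a new skim function, and
# setting a skimmer to NULL inside sfl() removes it from the defaults.
# min and max are the character skimmers that call nchar().
my_skim <- skim_with(character = sfl(min = NULL, max = NULL))

# Illustrative data; in practice, pass the data frame loaded from the Kaggle file
dat <- data.frame(taxon = c("Coleophora asteris", "LEPIDOPTERA"),
                  stringsAsFactors = FALSE)

my_skim(dat)

# skimr v1 equivalent (modifies the defaults for the rest of the session):
# skim_with(character = list(min = NULL, max = NULL))
```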

elinw commented 5 years ago

I'm going to merge a fix for this (it changes the nchar() call to use allowNA = TRUE) into the v2 branch as soon as the tests finish. If you want, you can install the v2 version with devtools::install_github("ropensci/skimr", "v2"). Since we are close to releasing v2, I don't plan to release a v1 update for this, but I will fix it in the v1 development branch.
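The change described here can be sketched as follows. min_char_sketch() and max_char_sketch() are hypothetical stand-ins, not skimr's actual min_char()/max_char() source. They show how allowNA = TRUE makes nchar() return NA for a string that is invalid in the current multibyte locale, which na.rm = TRUE then skips instead of stopping the whole skim.

```r
# Hypothetical helpers illustrating the fix; not skimr's actual implementation
count_chars <- function(x) {
  # allowNA = TRUE: invalid multibyte strings give NA instead of an error
  nchar(x, type = "chars", allowNA = TRUE)
}

min_char_sketch <- function(x) {
  n <- count_chars(x)
  if (all(is.na(n))) return(NA_integer_)  # nothing countable (or empty input)
  min(n, na.rm = TRUE)
}

max_char_sketch <- function(x) {
  n <- count_chars(x)
  if (all(is.na(n))) return(NA_integer_)
  max(n, na.rm = TRUE)
}

# Assumed stand-in for the bad row: byte 0x9F is not valid UTF-8 on its own
x <- c("LEPIDOPTERA", "Coleophora asteris M\x9fhl.")
min_char_sketch(x)  # no error, even in a UTF-8 locale
max_char_sketch(x)  # in a UTF-8 locale the undecodable element is skipped
```

Returning NA for a string that can't be decoded, rather than guessing a count, keeps skim() running while still signalling that those values were not measured.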

elinw commented 5 years ago

This is now fixed in the v2 branch.

DesiQuintans commented 5 years ago

Yup, fixed! Thanks for the fast turnaround on this bug.