Open jsonbecker opened 10 years ago
Good question @jasonpbecker
Can you give me an example of when this happens? By default, R should fill in NAs whenever it encounters an empty cell.
x1,x2,x3
4,1,3
5,,
6,3,234
If I read this .csv file into R, it will automatically convert blank fields to NA.
> (x <- read.csv("~/Desktop/temp.csv"))
  x1 x2  x3
1  4  1   3
2  5 NA  NA
3  6  3 234
I would really appreciate an example of this: "This is especially annoying with factors, which then creates a level for the blank space."
So if you read this file:
foo, bar,,,,2014-09-10, 50.00
baz, bat, ,,2014-09-10, 2014-09-09, 105.00
foo, bat,6103914,,,2014-09-10, 5.00
> read.csv('~/Desktop/test.csv', header=FALSE, stringsAsFactors=FALSE)
   V1   V2      V3 V4         V5          V6  V7
1 foo  bar      NA NA             2014-09-10  50
2 baz  bat      NA NA 2014-09-10  2014-09-09 105
3 foo  bat 6103914 NA             2014-09-10   5
Column classes, and the values of V5:
> sapply(read.csv('~/Desktop/test.csv', header=FALSE, stringsAsFactors=FALSE), class)
         V1          V2          V3          V4          V5
"character" "character"   "integer"   "logical" "character"
         V6          V7
"character"   "numeric"
> table(read.csv('~/Desktop/test.csv', header=FALSE, stringsAsFactors=FALSE)$V5)
           2014-09-10
         2          1
If you don't use stringsAsFactors=FALSE, you get a similar result, but the white space is now a level in the factor for V5, etc.
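For anyone who wants to reproduce this without a file on disk, here is a minimal sketch that inlines the same three rows through the `text` argument of read.csv. It also shows one possible read-time workaround, which is my suggestion and not something proposed in the thread: combining strip.white = TRUE with "" in na.strings, so that blank character fields come in as NA instead of as a factor level. The variable names (csv_text, raw, clean) are made up for illustration.

```r
# Same three rows as the test file above, inlined so the snippet is self-contained.
csv_text <- "foo, bar,,,,2014-09-10, 50.00
baz, bat, ,,2014-09-10, 2014-09-09, 105.00
foo, bat,6103914,,,2014-09-10, 5.00"

# Plain read with factors: the empty string becomes a level of V5.
raw <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = TRUE)
levels(raw$V5)

# strip.white = TRUE trims unquoted character fields, and adding "" to
# na.strings maps the resulting empty fields to NA while reading.
clean <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = TRUE,
                  strip.white = TRUE, na.strings = c("NA", ""))
levels(clean$V5)
sum(is.na(clean$V5))
```

This only helps when you control the read step; data that already arrives as a data frame with blank levels still needs something like the fix discussed in the original post.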
One thing I run into a bunch is a blank field (most often with white space) used as missing. This is especially annoying with factors, which then creates a level for the blank space.
Currently, white space alone is not considered one of the NA_aliases (see here). Should test_na and fix_na be updated to treat white space as missing, or should there be a new function that tests for empty levels or blank fields, with a fix that converts them to NA?
I'm happy to contribute to implement either.
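To make the second option concrete, here is a rough sketch of what a standalone check and fix could look like. The names is_blank and blank_to_na are hypothetical and are not part of the package; this is not how test_na or fix_na are implemented, just one way the behaviour described above might work for character vectors and factors.

```r
# Hypothetical helper: TRUE where a value is empty or only white space.
# Existing NAs are reported as FALSE so they are not counted as blanks.
is_blank <- function(x) {
  !is.na(x) & grepl("^[[:space:]]*$", as.character(x))
}

# Hypothetical fix: turn blank entries into NA. For factors, the blank
# levels are then removed with droplevels(); note that droplevels() also
# drops any other level that happens to be unused.
blank_to_na <- function(x) {
  if (is.factor(x)) {
    x[is_blank(x)] <- NA
    return(droplevels(x))
  }
  if (is.character(x)) {
    x[is_blank(x)] <- NA
  }
  x
}

f <- factor(c("a", "", " ", "b"))
fixed <- blank_to_na(f)
levels(fixed)

# Applying it to every column of a data frame, leaving other types untouched.
df <- data.frame(id = 1:3, grp = c("x", "  ", "y"), stringsAsFactors = TRUE)
df[] <- lapply(df, blank_to_na)
```

Keeping the test (is_blank) separate from the fix mirrors the test_na / fix_na split mentioned above, so either piece could be folded into the existing functions if that route is preferred.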