wesm / feather

Feather: fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow
Apache License 2.0

Should feather::read_feather respect options(stringsAsFactors = FALSE)? #384

Closed khughitt closed 4 years ago

khughitt commented 4 years ago

Greetings!

Is there any reason why the R implementation of read_feather() ignores the state of the stringsAsFactors option?

For consistency, it seems like it would be useful to have it behave in a similar manner to read.delim(), read_tsv(), etc.

Ex:

library(feather)
library(tidyverse)
options(stringsAsFactors = FALSE)

write_tsv(head(iris), 'test.tsv')
write_feather(head(iris), 'test.feather')

head(read.delim('test.tsv', sep='\t')$Species)
# [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"

head(read_tsv('test.tsv', col_types = cols())$Species)
# [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"

head(read_feather('test.feather')$Species)
# [1] setosa setosa setosa setosa setosa setosa
# Levels: setosa versicolor virginica

Incidentally, I checked read_parquet() and it too ignores stringsAsFactors.
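
Until the reader functions offer a switch for this, one possible workaround is to convert factor columns back to character after reading. The sketch below is only an illustration: `factors_to_character()` is a hypothetical helper, not part of feather or arrow, and `head(iris)` stands in for the result of `read_feather('test.feather')` so that the snippet runs with base R alone.

```r
# Hypothetical helper (not provided by feather or arrow):
# convert every factor column of a data frame to character.
factors_to_character <- function(df) {
  is_fct <- vapply(df, is.factor, logical(1))
  if (any(is_fct)) {
    df[is_fct] <- lapply(df[is_fct], as.character)
  }
  df
}

# Stand-in for the data frame returned by read_feather('test.feather');
# with feather installed, the call would look like:
#   df <- factors_to_character(read_feather('test.feather'))
df <- factors_to_character(head(iris))

class(df$Species)
# [1] "character"
```

Non-factor columns are left untouched, so the helper can be applied to any data frame regardless of how it was read.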

Versions:

wesm commented 4 years ago

Can you open an issue on the Arrow JIRA issue tracker?

khughitt commented 4 years ago

Sure thing - reported issue here: https://issues.apache.org/jira/browse/ARROW-7823

I created a similar issue for read_parquet() as well.

nealrichardson commented 4 years ago

Thanks!