ropensci-archive / datapkg

:no_entry: ARCHIVED :no_entry: Read and Write Data Packages
https://docs.ropensci.org/datapkg

datapkg reading of bad tabular files doesn't match read.csv()'s best effort #13

Open danfowler opened 8 years ago

danfowler commented 8 years ago

I was inspired by https://github.com/johnmyleswhite/CorruptCSVs to have a go at packaging the TSVs and reading them with datapkg, but I noticed that its behavior differs from read.csv() when reading r_example_1.tsv. R's read.csv() behavior seems preferable, especially as the datapkg version simply discards the (badly formatted) second column.

Maybe relevant to @roll @georgiana-b @akariv @pwalsh as well?

#r_example_1.tsv
1
2
3
4
5   0

R's read.csv():

> read.csv("r_example_1.tsv", header = FALSE, sep = "\t")
  V1 V2
1  1 NA
2  2 NA
3  3 NA
4  4 NA
5  5  0
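
For anyone who wants to reproduce this without cloning the repository, here is a minimal sketch that recreates the file in a temporary location (the `path` and `df` names are just for illustration). As far as I understand the base R documentation, read.csv() defaults to fill = TRUE and decides the number of columns from the first five lines of input, which is why the two-field fifth line produces a V2 column that is padded with NA elsewhere.

```r
# Recreate r_example_1.tsv from the listing above in a temporary file.
# `path` and `df` are illustrative names, not part of the original report.
path <- tempfile(fileext = ".tsv")
writeLines(c("1", "2", "3", "4", "5\t0"), path)

# fill = TRUE is already the default for read.csv(); it is spelled out here
# to show which argument pads the short rows with NA.
df <- read.csv(path, header = FALSE, sep = "\t", fill = TRUE)

dim(df)   # number of rows and columns
df$V2     # the padded second column
```

Because the column count comes from only the first five lines, a file whose first wide row appears later may behave differently, so this should be read as a demonstration of this one file rather than a general guarantee.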

The datapkg approach:

library(datapkg)

# Called with no arguments, so the data package is read from the working directory
tables <- datapkg_read()

tables$data[[1]]

# A tibble: 5 x 1
     X1
  <chr>
1     1
2     2
3     3
4     4
5     5
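
To see which rows a reader would have to pad or drop, one option is to count the fields per line with base R's count.fields(). This is only a sketch of an independent check on the raw file; it does not use datapkg and is not how datapkg decides what to keep. The `path`, `n_fields`, and `ragged` names are illustrative.

```r
# Recreate r_example_1.tsv from the listing above in a temporary file.
path <- tempfile(fileext = ".tsv")
writeLines(c("1", "2", "3", "4", "5\t0"), path)

# Count tab-separated fields on each line of the file.
n_fields <- count.fields(path, sep = "\t")

# Lines whose field count differs from the first line are the ragged ones.
ragged <- which(n_fields != n_fields[1])

n_fields
ragged
```

Here only the fifth line has a second field, which is the value that read.csv() keeps and the datapkg result above does not show.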

https://github.com/danfowler/CorruptCSVs