ltorgo / DMwR2

Functions and data supporting the 2nd edition of the book Data Mining with R by Luis Torgo, published by CRC Press

sampleCSV not reading header #4

Open. WarrenC opened this issue 6 years ago.

WarrenC commented 6 years ago

I downloaded “flights14.csv” from the following page: https://github.com/arunsrinivasan/flights/wiki/NYC-Flights-2014-data

I am working on macOS High Sierra version 10.13.3, using R version 3.5.0 (2018-04-23) -- “Joy in Playing” with DMwR2_0.0.2. As an example, I created a subdirectory “test2” under “Users”: /Users/test2

When I run the following line, the column names are not displayed in the tibble. Instead, one of the rows of flight data seems to be used as the column names:

    flights1000_2 <- sampleCSV(file = "/Users/test2/flights14.csv", percORn = 1000, header = T)

The following message is displayed in the console:

    Parsed with column specification:
    cols(
      2014 = col_integer(),
      1 = col_integer(),
      1_1 = col_integer(),
      847 = col_integer(),
      -3 = col_integer(),
      1036 = col_integer(),
      1_2 = col_integer(),
      0 = col_integer(),
      AA = col_character(),
      N553AA = col_character(),
      313 = col_integer(),
      LGA = col_character(),
      ORD = col_character(),
      139 = col_integer(),
      733 = col_integer(),
      8 = col_integer(),
      47 = col_integer()
    )
    Warning message:
    Duplicated column names deduplicated: '1' => '1_1' [3], '1' => '1_2' [7]

smottaghinejad commented 4 years ago

I'm experiencing the same problem.

CeliaZhu commented 4 years ago

Same for me

danielmitre commented 3 years ago

The problem seems to be that the header line can be sampled out of the temporary file the function creates, so that one of the sampled data rows is then read as the header:

https://github.com/ltorgo/DMwR2/blob/c19cb08742040b245b1c5c03069ccbe5643aff72/R/utils.R#L213
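To illustrate the idea of the fix, here is a minimal sketch, assuming the goal is simply to sample data lines while always keeping the first line when header = TRUE. The helper sample_csv_keep_header is hypothetical and not part of DMwR2, and it does not reproduce how sampleCSV is actually implemented. It also reads the whole file into memory with readLines, so it only shows the header handling, not an approach suitable for files too large to load.

```r
# Hypothetical helper (NOT part of DMwR2): sample n data lines from a CSV
# file, setting the header line aside so it can never be sampled out.
sample_csv_keep_header <- function(file, n, header = TRUE) {
  lines <- readLines(file)
  if (header) {
    hdr  <- lines[1]    # keep the header line unconditionally
    body <- lines[-1]   # only data lines are candidates for sampling
  } else {
    hdr  <- character(0)
    body <- lines
  }
  n <- min(n, length(body))
  # Sample among data lines only; sort indices to preserve file order
  idx <- sort(sample(length(body), n))
  utils::read.csv(text = c(hdr, body[idx]), header = header)
}

# Usage on a tiny made-up file
tmp <- tempfile(fileext = ".csv")
writeLines(c("year,month,carrier", "2014,1,AA", "2014,2,DL", "2014,3,UA"), tmp)
set.seed(1)
s <- sample_csv_keep_header(tmp, n = 2)
names(s)  # "year" "month" "carrier"
```

Because the header is separated before sample() is called, the column names always come from the first line of the file, whatever rows are drawn.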