queryverse / CSVFiles.jl

FileIO.jl integration for CSV files

How do I iterate through each column? #63

Closed. xiaodaigh closed this issue 5 years ago.

xiaodaigh commented 5 years ago

Say I have read in my file like this:

using CSVFiles

@time a = load("c:/data/AirOnTimeCSV/airOT199302.csv", type_detect_rows = 2000)

Is there a way to iterate through the columns without converting to a DataFrame first (because that is slow)?

For example, if I convert it to a DataFrame, I can do:

using DataFrames

adf = DataFrame(a)
for c in eachcol(adf)
    # do something to c, like serialize it to disk
end
davidanthoff commented 5 years ago

Converting to a DataFrame shouldn't be slow; the overhead of that should be close to zero. If it isn't, something is wrong.

Note, though, that just calling load does not actually read the file from disk: it returns an object that loads the data from disk only once you materialize it into something (e.g. a DataFrame).
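To make the lazy-loading idea concrete, here is a minimal sketch of the pattern using only Base Julia. `LazyTable` and `materialize` are hypothetical names for illustration, not CSVFiles.jl's actual types or API, and the naive split-on-comma parser stands in for a real CSV parser (no quoting, no type detection).

```julia
# Hypothetical sketch of lazy loading; not CSVFiles.jl's implementation.
struct LazyTable
    path::String   # constructing a LazyTable stores only the path; no I/O happens
end

# The file is opened and parsed only when this function is called.
# Naive parser: splits on commas, ignores quoting, keeps every value as a String.
function materialize(t::LazyTable)
    lines = readlines(t.path)
    names = Tuple(Symbol.(split(lines[1], ',')))
    rows = [split(line, ',') for line in lines[2:end] if !isempty(line)]
    # Build one Vector{String} per column, then wrap them in a NamedTuple
    cols = Tuple([String(row[j]) for row in rows] for j in eachindex(names))
    return NamedTuple{names}(cols)
end

lazy = LazyTable("does_not_exist.csv")   # fine: nothing has been read yet
```

Per the comment above, CSVFiles.jl behaves analogously: the cost of reading shows up when the result of `load` is materialized (for example by `DataFrame(a)`), not at the `load` call itself.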

Having said all of this, you can pretty easily get the columns by using the get_columns_copy_using_missing interface from TableTraits.jl:

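using CSVFiles, TableTraits

x = load(filename)   # filename: path to your CSV file

if TableTraits.supports_get_columns_copy_using_missing(x)
    # columns holds one vector per column; missing values are `missing`
    columns = TableTraits.get_columns_copy_using_missing(x)
end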
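Assuming, as the TableTraits.jl interface describes, that `columns` is a NamedTuple of column vectors, iterating over it works much like `eachcol` on a DataFrame. The sketch below uses a hand-built NamedTuple as a stand-in for that result and a hypothetical helper, `serialize_columns`, that writes each column to its own file with the Serialization standard library, matching the "serialize to disk" use case from the question.

```julia
using Serialization

# Hypothetical helper: write each column of a NamedTuple of vectors
# to its own file in `dir`, returning the file paths in column order.
function serialize_columns(columns::NamedTuple, dir::AbstractString)
    paths = String[]
    for (name, col) in pairs(columns)          # each pair is name => vector
        path = joinpath(dir, string(name, ".jls"))
        open(io -> serialize(io, col), path, "w")
        push!(paths, path)
    end
    return paths
end

# Stand-in for the result of get_columns_copy_using_missing,
# including a `missing` value.
columns = (year = [1993, 1993], delay = [5, missing])
paths = serialize_columns(columns, mktempdir())
```

If only the vectors are needed, `for c in columns` iterates the values directly, which is the closest analogue to `for c in eachcol(adf)`.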