xiaodaigh / JDF.jl

Julia DataFrames serialization format
MIT License

Column names mis-match when loading certain columns #77

Closed Ossifragus closed 2 years ago

Ossifragus commented 2 years ago

The relative order of the loaded columns always seems to match their relative order in the whole dataset, regardless of the order given in cols (is the order specified in the metadata?). For example, with data consisting of two columns, "col1" and "col2", the code JDF.load(data; cols = ["col1", "col2"]) loads the data correctly, but with JDF.load(data; cols = ["col2", "col1"]) the column data and the column names do not match.

Please see the following example:

using RDatasets, JDF, DataFrames

JDF.save("iris.jdf", dataset("datasets", "iris"))
a = DataFrame(JDF.load("iris.jdf"))

c1 = ["Species", "PetalWidth"]
a1 = DataFrame(JDF.load("iris.jdf"; cols = c1)) # column mis-match

julia> first(a1, 5)
5×2 DataFrame
 Row │ Species  PetalWidth 
     │ Float64  Cat…       
─────┼─────────────────────
   1 │     0.2  setosa
   2 │     0.2  setosa
   3 │     0.2  setosa
   4 │     0.2  setosa
   5 │     0.2  setosa

c2 = ["PetalWidth", "Species"]
a2 = DataFrame(JDF.load("iris.jdf"; cols = c2)) # column match

julia> first(a2, 5)
5×2 DataFrame
 Row │ PetalWidth  Species 
     │ Float64     Cat…    
─────┼─────────────────────
   1 │        0.2  setosa
   2 │        0.2  setosa
   3 │        0.2  setosa
   4 │        0.2  setosa
   5 │        0.2  setosa

xiaodaigh commented 2 years ago

Columns are loaded in the order specified by cols. This is the intended behaviour, though.
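
For anyone who runs into the same mismatch, one possible workaround is suggested by the example above: the names and the data lined up when cols followed the stored column order. The sketch below is only an illustration under that assumption. The helper in_stored_order is hypothetical and not part of JDF; it sorts the requested names into the stored order before they are passed as cols.

```julia
# Hypothetical helper (not part of JDF): sort the requested column names
# into the order in which they appear in the stored data.
function in_stored_order(requested::AbstractVector{<:AbstractString},
                         stored::AbstractVector{<:AbstractString})
    # Fail early if a requested column does not exist in the stored data.
    missing_cols = setdiff(requested, stored)
    isempty(missing_cols) ||
        throw(ArgumentError("columns not found: $(join(missing_cols, ", "))"))
    # Keep only the requested names, in stored order.
    return filter(in(requested), stored)
end

# Column order of the iris dataset as saved in the example above.
stored = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Species"]
requested = ["Species", "PetalWidth"]

safe_cols = in_stored_order(requested, stored)
println(safe_cols)  # ["PetalWidth", "Species"]
```

With the code from the report, the stored order could be taken from names(a) after the full load, safe_cols would be passed as cols to JDF.load, and select(df, requested) from DataFrames could then put the columns back into the requested order.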