xiaodaigh / JDF.jl

Julia DataFrames serialization format
MIT License

Column names mis-match when loading certain columns #79

Closed. xiaodaigh closed this issue 2 years ago

xiaodaigh commented 2 years ago

Discussed in https://github.com/xiaodaigh/JDF.jl/discussions/78

Originally posted by **Ossifragus** January 26, 2022

The relative order of the loaded columns always seems to be the same as their relative order in the whole data set (is the order specified in the metadata?). For example, with a `data` consisting of two columns `"col1"` and `"col2"`, the code `JDF.load(data; cols = ["col1", "col2"])` loads the data correctly, but with `JDF.load(data; cols = ["col2", "col1"])` the data columns and their names do not match. Please see the following example:

```julia
using RDatasets, JDF, DataFrames

JDF.save("iris.jdf", dataset("datasets", "iris"))
a = DataFrame(JDF.load("iris.jdf"))

c1 = ["Species", "PetalWidth"]
a1 = DataFrame(JDF.load("iris.jdf"; cols = c1))
# column mis-match
julia> first(a1, 5)
5×2 DataFrame
 Row │ Species  PetalWidth
     │ Float64  Cat…
─────┼─────────────────────
   1 │     0.2  setosa
   2 │     0.2  setosa
   3 │     0.2  setosa
   4 │     0.2  setosa
   5 │     0.2  setosa

c2 = ["PetalWidth", "Species"]
a2 = DataFrame(JDF.load("iris.jdf"; cols = c2))
# column match
julia> first(a2, 5)
5×2 DataFrame
 Row │ PetalWidth  Species
     │ Float64     Cat…
─────┼─────────────────────
   1 │        0.2  setosa
   2 │        0.2  setosa
   3 │        0.2  setosa
   4 │        0.2  setosa
   5 │        0.2  setosa
```
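
The symptom reported above (data returned in stored order, names applied in requested order) can be modelled with a small sketch in plain Julia. This is only an illustration under assumptions: `stored_names`, `stored_data`, `load_mislabelled`, and `load_by_name` are hypothetical names invented here, and nothing below is taken from JDF's actual implementation.

```julia
# Hypothetical model of a saved table: column names in stored order plus their data.
# This is NOT JDF's internal code, only an illustration of the reported symptom.
stored_names = ["SepalLength", "PetalWidth", "Species"]
stored_data  = [[5.1, 4.9], [0.2, 0.2], ["setosa", "setosa"]]

# Symptom: columns are picked in stored order but labelled in the requested order.
function load_mislabelled(cols)
    keep = [i for i in eachindex(stored_names) if stored_names[i] in cols]
    return [name => stored_data[i] for (name, i) in zip(cols, keep)]
end

# Expected behaviour: look up each requested name, so label and data stay paired.
function load_by_name(cols)
    return [name => stored_data[findfirst(==(name), stored_names)] for name in cols]
end

bad  = load_mislabelled(["Species", "PetalWidth"])  # "Species" ends up holding the Float64 data
good = load_by_name(["Species", "PetalWidth"])      # "Species" holds the species labels
```

In this model, requesting the columns in stored order (`["PetalWidth", "Species"]`) gives the same result from both functions, which is consistent with the `c2` case above loading correctly.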