queryverse / Query.jl

Query almost anything in julia

@mutate changes other column type #305

Open stej opened 4 years ago

stej commented 4 years ago

Repost from https://discourse.julialang.org/t/query-mutate-changes-other-column-type/39292 as it looks like a bug.

This is the smallest repro I could come up with:

using CSV, DataFrames, Query

contents = """
"5674012","aa66aa66"
"5674012","b036aa66,b036aa67,b036aa68"
""";

batches = CSV.File(IOBuffer(contents); header = ["X1", "Splits"], delim = ',') |> DataFrame;

# Variant 1: the empty fallback array is stored in a global variable and reused.
emptyStringArray = Array{SubString{String},1}()
batchesAny = batches |>
            @mutate(Splits = length(_.Splits) > 0 ? split(_.Splits, ',') : emptyStringArray) |> DataFrame

# Variant 2: the empty fallback array is constructed inline.
batchesString = batches |>
            @mutate(Splits = length(_.Splits) > 0 ? split(_.Splits, ',') : Array{SubString{String},1}()) |> DataFrame

The output looks like this:

julia> batchesAny
2×2 DataFrame
│ Row │ X1      │ Splits                               │
│     │ Any     │ Any                                  │
├─────┼─────────┼──────────────────────────────────────┤
│ 1   │ 5674012 │ ["aa66aa66"]                         │
│ 2   │ 5674012 │ ["b036aa66", "b036aa67", "b036aa68"] │

julia> batchesString
2×2 DataFrame
│ Row │ X1      │ Splits                               │
│     │ Int64   │ Array{SubString{String},1}           │
├─────┼─────────┼──────────────────────────────────────┤
│ 1   │ 5674012 │ ["aa66aa66"]                         │
│ 2   │ 5674012 │ ["b036aa66", "b036aa67", "b036aa68"] │

What I don’t understand:

  1. The :Splits column type differs: Any vs. Array{SubString{String},1}. (I just wanted to save memory, so I stored the empty array, which can be reused across rows, in emptyStringArray.)

  2. Even if I did something wrong to the :Splits column, I think the type of X1 shouldn't change to Any in batchesAny.