jkrumbiegel / DataFrameMacros.jl

Macros that simplify working with DataFrames.jl
MIT License

@combine and unique #6

Closed juliohm closed 3 years ago

juliohm commented 3 years ago

The following MWE is producing a different result than what I expected:

using DataFrames, DataFrameMacros, Chain

df = DataFrame(C = [:a,:a,:a,:b,:b,:b,:b], V = [1,2,3,4,5,6,7])

@chain df begin
  @groupby(:C)
  @combine(unique(:V))
end

7×2 DataFrame
 Row │ C       V_unique 
     │ Symbol  Int64    
─────┼──────────────────
   1 │ a              1
   2 │ a              2
   3 │ a              3
   4 │ b              4
   5 │ b              5
   6 │ b              6
   7 │ b              7

I expected a dataframe with two rows:

DataFrame(C = [:a,:b], V_unique = [[1,2,3],[4,5,6,7]])

2×2 DataFrame
 Row │ C       V_unique     
     │ Symbol  Array…       
─────┼──────────────────────
   1 │ a       [1, 2, 3]
   2 │ b       [4, 5, 6, 7]

jkrumbiegel commented 3 years ago

That is the normal behavior for combine: if you return an array per group, the arrays are concatenated into rows. You can wrap the expression in Ref to avoid that.
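For readers following along, here is a minimal sketch of the difference using the plain DataFrames.jl `combine` API rather than the macro form. The variable names `gdf`, `long`, and `wide` are only illustrative, and the output column name is set explicitly instead of relying on automatic naming.

```julia
using DataFrames

df = DataFrame(C = [:a, :a, :a, :b, :b, :b, :b], V = [1, 2, 3, 4, 5, 6, 7])
gdf = groupby(df, :C)

# Returning a vector per group: the vectors are concatenated into rows,
# and the group key is repeated for each element.
long = combine(gdf, :V => unique => :V_unique)

# Wrapping the per-group result in Ref: each group contributes exactly
# one row, and the whole vector is stored in a single cell.
wide = combine(gdf, :V => (v -> Ref(unique(v))) => :V_unique)
```

With the data from the MWE, `long` has one row per unique value and `wide` has one row per group. In the macro form, the same idea would presumably be written by wrapping the expression, as in `Ref(unique(:V))`.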

juliohm commented 3 years ago

But that seems inconsistent with other query frameworks. Query.jl will return the result with two rows as expected. I'll try with other packages to see what happens.

jkrumbiegel commented 3 years ago

Might be, but that's DataFrames.jl behavior. I'll close this.

juliohm commented 3 years ago

I don't think this is specific to DataFrames.jl. Isn't it more of a general split-apply-combine question? I thought that any combine should guarantee a number of rows equal to the number of groups in the groupby. In user code, not knowing the number of rows beforehand becomes an issue. Does that make sense?

juliohm commented 3 years ago

What I am trying to say is that the Ref behavior is a more natural default than repeating the groupby column for every element.

jkrumbiegel commented 3 years ago

https://dataframes.juliadata.org/stable/man/split_apply_combine/

combine: does not put restrictions on number of rows returned, the order of rows is specified by the order of groups in GroupedDataFrame; it is typically used to compute summary statistics by group;
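As a small illustration of the quoted rule, the sketch below shows that a group can contribute zero, one, or several rows. It uses the data from the MWE and the plain DataFrames.jl API; the names `gdf`, `sums`, and `big`, and the threshold `5`, are made up for the example.

```julia
using DataFrames

df = DataFrame(C = [:a, :a, :a, :b, :b, :b, :b], V = [1, 2, 3, 4, 5, 6, 7])
gdf = groupby(df, :C)

# A scalar per group gives one row per group (the summary-statistics case).
sums = combine(gdf, :V => sum => :V_sum)

# A vector per group gives as many rows as the vector has elements.
# Group :a has no values above 5, so it contributes no rows at all.
big = combine(gdf, :V => (v -> v[v .> 5]) => :V_big)
```

Here `sums` has two rows, while `big` has two rows that both belong to group `:b`, so the row count cannot be inferred from the number of groups alone.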

juliohm commented 3 years ago

I noticed that R produces the same results, so I guess Query.jl is the exception to the rule.
