jkrumbiegel / DataFrameMacros.jl

Macros that simplify working with DataFrames.jl
MIT License

Feature Request: In-place updating of data frames #16

Closed jeremiahpslewis closed 1 year ago

jeremiahpslewis commented 2 years ago

I haven't found a good way of using DataFrameMacros to express data updates, and I think this may be an unmet need:

using DataFrameMacros
using DataFrames
using Chain

df = DataFrame(
    :a => [1, 2, 3, 4, 5],
    :b => [1, 2, 3, 4, 5],
    :c => [1, 2, 3, 4, 5],
)

# Typical DataFrames.jl syntax
df[df.a .^ 2 .== 4, :b] .= 10

show(df)

# Requires the @cases PR (#11)
@chain df begin
    @cases(:b = @cases(:a^2 == 4 => 10, :b))
end

# Example of proposed update macro
@chain df begin
    @transform(:z = :a^2)
    @subset(:z == 4)
    @update(:b = 10)
end

Here, @update would modify the given columns only for the rows passed to it.

I imagine there are lots of reasons this quickly becomes an anti-pattern (@groupby, etc.), so feel free to reject and close immediately, but this is the only remaining use case where I can't figure out how to stay within the DataFrameMacros paradigm and its world of relative dataframe-manipulation ease and parsimony. 😀

jkrumbiegel commented 2 years ago

I think there was something in the works at DataFrames.jl that allowed data updates with transform! on a subset view, but I haven't had time yet to look at the implementation, or to check whether something can be improved on DataFrameMacros' side to help. The @update macro is a good idea, but maybe the same thing can be done with DataFrames tools alone, just with improved syntax.

jeremiahpslewis commented 2 years ago

Here's the link to transform! for future reference (added in v1.3): https://dataframes.juliadata.org/stable/lib/functions/#DataFrames.transform!

It looks like it roughly covers the idea, with an elegant treatment of filtered-out rows: missing values. A key decision may be whether to allow the @m missing flag to toggle whether those rows are filled in with missing or with their original values.
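
For reference, a minimal sketch of that approach in plain DataFrames.jl, reusing the df from the first comment. It assumes DataFrames.jl v1.3 or later; the name sdf and the anonymous functions are illustrative only. As far as I understand the behavior, an existing column keeps its original values outside the view, while a newly created column is filled with missing there.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3, 4, 5], b = [1, 2, 3, 4, 5], c = [1, 2, 3, 4, 5])

# view = true returns a SubDataFrame backed by df instead of a copy
sdf = subset(df, :a => ByRow(a -> a^2 == 4), view = true)

# Existing column: only the selected rows are overwritten in df
transform!(sdf, :b => ByRow(_ -> 10) => :b)

# New column: created in df, with missing in the rows outside the view
transform!(sdf, :a => ByRow(a -> a^2) => :z)

show(df)
```

This covers the same update as df[df.a .^ 2 .== 4, :b] .= 10, written as a subset followed by a transformation.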

jkrumbiegel commented 1 year ago

This has actually been implemented for a while now with @transform!(df, @subset(... and I just didn't notice that I should close this issue.
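
A hedged sketch of how the original example might look with this feature. The call above is elided in the comment, so the exact argument form below (a @subset(...) expression as the second argument, followed by the assignment) is an assumption, not something confirmed in this thread; check the DataFrameMacros documentation before relying on it.

```julia
using DataFrames
using DataFrameMacros

df = DataFrame(a = [1, 2, 3, 4, 5], b = [1, 2, 3, 4, 5], c = [1, 2, 3, 4, 5])

# Assumed call form: @subset(...) picks the rows to update,
# and the assignment after it is applied only to those rows, in place.
@transform!(df, @subset(:a^2 == 4), :b = 10)

show(df)
```

If the assumed form is right, this replaces the proposed @transform / @subset / @update chain with a single call.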