jump-dev / JuMP.jl

Modeling language for Mathematical Optimization (linear, mixed-integer, conic, semidefinite, nonlinear)
http://jump.dev/JuMP.jl/
Other
2.24k stars 396 forks source link

Ideas from gurobipy-pandas #3212

Closed odow closed 1 year ago

odow commented 1 year ago

I'm currently listening to @simonbowly talk about his cool work integrating gurobipy and pandas: https://github.com/Gurobi/gurobipy-pandas

I don't know if we can make it easy for JuMP to work with DataFrames without an extension package (although things might just work and it just requires some documentation), but it'd be nice to improve this: https://github.com/jump-dev/JuMP.jl/blob/451740ada26f634edb09cc5ec19b2c4d48031cfe/docs/src/tutorials/linear/diet.jl#L102-L106 by adding x as a column to the data frame instead of a separate variable.

At minimum, it'd allow

 @objective( 
     model, 
     Min, 
     sum(food["cost"] * food["x"] for food in eachrow(foods)), 
 )

but potentially some variant of

@objective(model, Min, food["cost"]' * food["x"])
@objective(model, Min, sum(food.cost .* food.x)) 

You could also imagine building constraints/expressions via split-apply-combine. I'd imagine there are lots of models where the variables correspond to a row in a dataframe.

jd-foster commented 1 year ago

So the type of foods[:x] would be Vector{VariableRef}?

odow commented 1 year ago

Yes. Here's something I've just been playing with:

model = Model(HiGHS.Optimizer)

@variable(model, x[foods.name] >= 0)
foods.x = x.data

@objective(model, Min, foods.cost' * foods.x,);

# A few ways of writing the same thing

for row in eachrow(limits)
    @constraint(model, row.min <= sum(foods[!, row.name] .* foods.x) <= row.max)
end

for df in DataFrames.groupby(limits, :name)
    @constraint(model, df.min[1] <= sum(foods[!, df.name[1]] .* foods.x) <= df.max[1])
end

for df in DataFrames.groupby(
    DataFrames.leftjoin(
        DataFrames.stack(foods, [:calories, :protein, :fat, :sodium]),
        limits;
        on = [:variable => :name],
    ),
    :variable,
)
    @constraint(model, df.min[1] <= sum(df.value .* df.x) <= df.max[1])
end

The groupby stuff makes it harder (for me) to read. I think we have a lot of mileage out of the existing {Dense,Sparse}AxisArray stuff, and a lot of functionality that comes for free in Julia.

But the practice of adding a column of a variables to a DataFrame is a useful tip.

I wonder if there are other models with more complicated data frames.

jd-foster commented 1 year ago

The groupby stuff makes it harder (for me) to read.

I agree; the first one (for row in eachrow(limits)...) is the easiest and quite a nice technique.

I wonder if there are other models with more complicated data frames.

As far as I can tell, AnyMOD takes this idea to the limit:

To increase performance, AnyMOD stores variables within DataFrames instead of using JuMPs native containers.

jd-foster commented 1 year ago

FYI, I always enjoy re-reading this paper by Robert 4er Database structures for mathematical programming models describing the correspondence between variables and tabular data.

(Also, love the screenshots from his 1990's Mac user interface.)

odow commented 1 year ago

As far as I can tell, AnyMOD

cc @leonardgoeke: any thoughts on ways JuMP could improve re dataframes?

leonardgoeke commented 1 year ago

Overall, I’m very happy with the options JuMP and DataFrames are providing already.

To make their combination more accessible, you could facilitate the conversion of containers. In the example code below creating and storing a variable in a DataFrame is significantly more complex than using the JuMP containers. This example is still simplified, because it is not sparse, and I would rather map the content of a and b to integers and use those in the DataFrame to improve the performance of join or groupby.

using JuMP, DataFrames
a = ["high","low"]
b = ["red","blue"]
model = Model()
@variable(model, x[a, b])
var = JuMP.build_variable(error,  VariableInfo(false, NaN, false, NaN, false, NaN, false, NaN, false, false))
df = DataFrame([(a = z[1], b = z[2], var = JuMP.add_variable(model, var,"x[" * z[1] * "," * z[2] * "]"))  for z in vcat(Iterators.product(a,b)...)])

Also, I miss an in-place multiplication. In the process of combining variables to expressions and ultimately creating constraints, I can use _add_toexpression! to sum efficiently but I’m not aware of a similar option for multiplication.

df = DataFrame(a = map(x -> 1.0* JuMP.add_variable(model, var,x),["a1","a2"]), b = map(x -> JuMP.add_variable(model, var,x),["b1","b2"]))
@time add_to_expression!.(df[!,:a], df[!,:b])
@time df[!,:a] = df[!,:a] .+ df[!,:b]
@time df[!,:b] = df[!,:b] * 2.0
odow commented 1 year ago

I'd write your first example differently:

julia> using JuMP, DataFrames

julia> a = ["high","low"]
2-element Vector{String}:
 "high"
 "low"

julia> b = ["red","blue"]
2-element Vector{String}:
 "red"
 "blue"

julia> # Option 1
       model = Model();

julia> df = DataFrame(
           vec([
               (a = i, b = j, var = @variable(model, base_name = "x[$i,$j]")) for 
               (i, j) in Iterators.product(a,b)
           ])
       )
4×3 DataFrame
 Row │ a       b       var          
     │ String  String  Variable…    
─────┼──────────────────────────────
   1 │ high    red     x[high,red]
   2 │ low     red     x[low,red]
   3 │ high    blue    x[high,blue]
   4 │ low     blue    x[low,blue]

julia> # Option 2
       model = Model();

julia> @variable(model, x[a, b])
2-dimensional DenseAxisArray{VariableRef,2,...} with index sets:
    Dimension 1, ["high", "low"]
    Dimension 2, ["red", "blue"]
And data, a 2×2 Matrix{VariableRef}:
 x[high,red]  x[high,blue]
 x[low,red]   x[low,blue]

julia> df = DataFrame(
           vec([(a = i, b = j, var = x[i, j]) for (i, j) in Iterators.product(a,b)])
       )
4×3 DataFrame
 Row │ a       b       var          
     │ String  String  Variable…    
─────┼──────────────────────────────
   1 │ high    red     x[high,red]
   2 │ low     red     x[low,red]
   3 │ high    blue    x[high,blue]
   4 │ low     blue    x[low,blue]

julia> # Option 3
       model = Model();

julia> @variable(model, x[a, b])
2-dimensional DenseAxisArray{VariableRef,2,...} with index sets:
    Dimension 1, ["high", "low"]
    Dimension 2, ["red", "blue"]
And data, a 2×2 Matrix{VariableRef}:
 x[high,red]  x[high,blue]
 x[low,red]   x[low,blue]

julia> df = DataFrame(
           vec([
             (i = i, j = j, ai = ai, bj = bj, var = x[ai, bj]) for
             ((i, ai), (j, bj)) in Iterators.product(enumerate(a),enumerate(b))
           ])
       )
4×5 DataFrame
 Row │ i      j      ai      bj      var          
     │ Int64  Int64  String  String  Variable…    
─────┼────────────────────────────────────────────
   1 │     1      1  high    red     x[high,red]
   2 │     2      1  low     red     x[low,red]
   3 │     1      2  high    blue    x[high,blue]
   4 │     2      2  low     blue    x[low,blue]

For the doubling, you can use:

julia> model = Model();

julia> @variable(model, x)
x

julia> aff_expr = 2 * x + 1
2 x + 1

julia> map_coefficients_inplace!(x -> 2x, aff_expr)
4 x + 2
odow commented 1 year ago

Is there anything actionable here? I think the conclusion is that JuMP already provides the necessary functionality to work with DataFrames.jl.

odow commented 1 year ago

I'm going to close this. If anyone has any further suggestions, please comment below and I will re-open the issue.

slwu89 commented 11 months ago

Hello, this is not so much a suggestion as a question, so if not appropriate in this closed issue please redirect me, thanks!

@jd-foster I am wondering if you know of any code implementations of the method described in the Fourer paper, especially related to generating synthetic data for the "hierarchical" or "relational" database schemas (much more interested in the relational schema, at least for now). I'd like to test some ideas I have regarding in memory DBs and JuMP but I'm not familiar with steel production so I really do not know how to come up with some synthetic data with parameters that resembles what he mentioned in the article.

odow commented 11 months ago

Please ask these types of questions on https://discourse.julialang.org/c/domain/opt/13.

For SQLite examples, see:

https://jump.dev/JuMP.jl/stable/tutorials/linear/multi/ https://jump.dev/JuMP.jl/stable/tutorials/linear/multi_commodity_network/