Closed odow closed 1 year ago
So the type of foods[:x]
would be Vector{VariableRef}
?
Yes. Here's something I've just been playing with:
model = Model(HiGHS.Optimizer)
@variable(model, x[foods.name] >= 0)
foods.x = x.data
@objective(model, Min, foods.cost' * foods.x,);
# A few ways of writing the same thing
for row in eachrow(limits)
@constraint(model, row.min <= sum(foods[!, row.name] .* foods.x) <= row.max)
end
for df in DataFrames.groupby(limits, :name)
@constraint(model, df.min[1] <= sum(foods[!, df.name[1]] .* foods.x) <= df.max[1])
end
for df in DataFrames.groupby(
DataFrames.leftjoin(
DataFrames.stack(foods, [:calories, :protein, :fat, :sodium]),
limits;
on = [:variable => :name],
),
:variable,
)
@constraint(model, df.min[1] <= sum(df.value .* df.x) <= df.max[1])
end
The groupby stuff makes it harder (for me) to read. I think we have a lot of mileage out of the existing {Dense,Sparse}AxisArray stuff, and a lot of functionality that comes for free in Julia.
But the practice of adding a column of a variables to a DataFrame is a useful tip.
I wonder if there are other models with more complicated data frames.
The groupby stuff makes it harder (for me) to read.
I agree; the first one (for row in eachrow(limits)...
) is the easiest and quite a nice technique.
I wonder if there are other models with more complicated data frames.
As far as I can tell, AnyMOD takes this idea to the limit:
To increase performance, AnyMOD stores variables within DataFrames instead of using JuMPs native containers.
FYI, I always enjoy re-reading this paper by Robert 4er Database structures for mathematical programming models describing the correspondence between variables and tabular data.
(Also, love the screenshots from his 1990's Mac user interface.)
As far as I can tell, AnyMOD
cc @leonardgoeke: any thoughts on ways JuMP could improve re dataframes?
Overall, I’m very happy with the options JuMP and DataFrames are providing already.
To make their combination more accessible, you could facilitate the conversion of containers. In the example code below creating and storing a variable in a DataFrame is significantly more complex than using the JuMP containers. This example is still simplified, because it is not sparse, and I would rather map the content of a and b to integers and use those in the DataFrame to improve the performance of join or groupby.
using JuMP, DataFrames
a = ["high","low"]
b = ["red","blue"]
model = Model()
@variable(model, x[a, b])
var = JuMP.build_variable(error, VariableInfo(false, NaN, false, NaN, false, NaN, false, NaN, false, false))
df = DataFrame([(a = z[1], b = z[2], var = JuMP.add_variable(model, var,"x[" * z[1] * "," * z[2] * "]")) for z in vcat(Iterators.product(a,b)...)])
Also, I miss an in-place multiplication. In the process of combining variables to expressions and ultimately creating constraints, I can use _add_toexpression! to sum efficiently but I’m not aware of a similar option for multiplication.
df = DataFrame(a = map(x -> 1.0* JuMP.add_variable(model, var,x),["a1","a2"]), b = map(x -> JuMP.add_variable(model, var,x),["b1","b2"]))
@time add_to_expression!.(df[!,:a], df[!,:b])
@time df[!,:a] = df[!,:a] .+ df[!,:b]
@time df[!,:b] = df[!,:b] * 2.0
I'd write your first example differently:
julia> using JuMP, DataFrames
julia> a = ["high","low"]
2-element Vector{String}:
"high"
"low"
julia> b = ["red","blue"]
2-element Vector{String}:
"red"
"blue"
julia> # Option 1
model = Model();
julia> df = DataFrame(
vec([
(a = i, b = j, var = @variable(model, base_name = "x[$i,$j]")) for
(i, j) in Iterators.product(a,b)
])
)
4×3 DataFrame
Row │ a b var
│ String String Variable…
─────┼──────────────────────────────
1 │ high red x[high,red]
2 │ low red x[low,red]
3 │ high blue x[high,blue]
4 │ low blue x[low,blue]
julia> # Option 2
model = Model();
julia> @variable(model, x[a, b])
2-dimensional DenseAxisArray{VariableRef,2,...} with index sets:
Dimension 1, ["high", "low"]
Dimension 2, ["red", "blue"]
And data, a 2×2 Matrix{VariableRef}:
x[high,red] x[high,blue]
x[low,red] x[low,blue]
julia> df = DataFrame(
vec([(a = i, b = j, var = x[i, j]) for (i, j) in Iterators.product(a,b)])
)
4×3 DataFrame
Row │ a b var
│ String String Variable…
─────┼──────────────────────────────
1 │ high red x[high,red]
2 │ low red x[low,red]
3 │ high blue x[high,blue]
4 │ low blue x[low,blue]
julia> # Option 3
model = Model();
julia> @variable(model, x[a, b])
2-dimensional DenseAxisArray{VariableRef,2,...} with index sets:
Dimension 1, ["high", "low"]
Dimension 2, ["red", "blue"]
And data, a 2×2 Matrix{VariableRef}:
x[high,red] x[high,blue]
x[low,red] x[low,blue]
julia> df = DataFrame(
vec([
(i = i, j = j, ai = ai, bj = bj, var = x[ai, bj]) for
((i, ai), (j, bj)) in Iterators.product(enumerate(a),enumerate(b))
])
)
4×5 DataFrame
Row │ i j ai bj var
│ Int64 Int64 String String Variable…
─────┼────────────────────────────────────────────
1 │ 1 1 high red x[high,red]
2 │ 2 1 low red x[low,red]
3 │ 1 2 high blue x[high,blue]
4 │ 2 2 low blue x[low,blue]
For the doubling, you can use:
julia> model = Model();
julia> @variable(model, x)
x
julia> aff_expr = 2 * x + 1
2 x + 1
julia> map_coefficients_inplace!(x -> 2x, aff_expr)
4 x + 2
Is there anything actionable here? I think the conclusion is that JuMP already provides the necessary functionality to work with DataFrames.jl.
I'm going to close this. If anyone has any further suggestions, please comment below and I will re-open the issue.
Hello, this is not so much a suggestion as a question, so if not appropriate in this closed issue please redirect me, thanks!
@jd-foster I am wondering if you know of any code implementations of the method described in the Fourer paper, especially related to generating synthetic data for the "hierarchical" or "relational" database schemas (much more interested in the relational schema, at least for now). I'd like to test some ideas I have regarding in memory DBs and JuMP but I'm not familiar with steel production so I really do not know how to come up with some synthetic data with parameters that resembles what he mentioned in the article.
Please ask these types of questions on https://discourse.julialang.org/c/domain/opt/13.
For SQLite examples, see:
https://jump.dev/JuMP.jl/stable/tutorials/linear/multi/ https://jump.dev/JuMP.jl/stable/tutorials/linear/multi_commodity_network/
I'm currently listening to @simonbowly talk about his cool work integrating gurobipy and pandas: https://github.com/Gurobi/gurobipy-pandas
I don't know if we can make it easy for JuMP to work with DataFrames without an extension package (although things might just work and it just requires some documentation), but it'd be nice to improve this: https://github.com/jump-dev/JuMP.jl/blob/451740ada26f634edb09cc5ec19b2c4d48031cfe/docs/src/tutorials/linear/diet.jl#L102-L106 by adding
x
as a column to the data frame instead of a separate variable.At minimum, it'd allow
but potentially some variant of
You could also imagine building constraints/expressions via split-apply-combine. I'd imagine there are lots of models where the variables correspond to a row in a dataframe.