Open light-weaver opened 11 months ago
The dataclasses for problem formulation should only contain information related to the formulation. No additional information (such as optimization results) should be stored here.
Related to #68.
I am currently looking into Yan's work and I plan to draft some examples based on it.
We should definitely stick to a neutral format for problem formulation going forward. Previously, much of the format relied on NumPy expressions, which are not trivial to convert into a general format. Pandas and Polars expressions are much more general and therefore easier to manipulate. They are basically strings, which we can then convert to whatever format we need when passing them to a solver, or when storing problems in and retrieving them from a database.
The MathJSON format is also much easier to extend: we can add new entries to the format without causing backwards-compatibility issues. I believe Yan's work offers a good base to build on. We must be very careful to document this format in enough detail and to provide enough examples so that users can easily familiarize themselves with it.
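To make the "neutral format" idea concrete, here is a minimal sketch of how an expression stored as MathJSON-style nested lists could be converted into another representation, in this case a plain infix string. The operator table `INFIX` and the helper `to_infix` are hypothetical names chosen for illustration; they are not part of Yan's work or of the DESDEO schema, and the set of supported operators is an assumption.

```python
import json

# Hypothetical operator table. The heads follow common MathJSON naming,
# but which operators the project's schema supports is an assumption.
INFIX = {"Add": "+", "Subtract": "-", "Multiply": "*", "Divide": "/", "Power": "**"}


def to_infix(expr):
    """Render a MathJSON-style nested list as a parenthesized infix string."""
    if isinstance(expr, (int, float)):
        return str(expr)
    if isinstance(expr, str):
        return expr  # a variable symbol such as "x_1"
    head, *args = expr
    if head not in INFIX:
        raise ValueError(f"unsupported operator: {head}")
    return "(" + f" {INFIX[head]} ".join(to_infix(a) for a in args) + ")"


# f(x) = x_1^2 + 3 * x_2, stored as plain JSON text (e.g., in a database).
stored = '["Add", ["Power", "x_1", 2], ["Multiply", 3, "x_2"]]'
infix = to_infix(json.loads(stored))
print(infix)
```

The same recursive walk could in principle target a solver's expression objects instead of strings; only the leaf and operator handling would change.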
Edit: flowchart updated.
I have updated the Pydantic schema for representing problems in DESDEO. It can be viewed here. A relational map of the current schema is visualized below:
I have also been thinking about the logic of parsing the JSON representation into an expression that can be evaluated. The flow of the logic has been described here. The image illustrating the flow is also shown in this post below.
At least the schema should cover analytical problems, but the question of surrogate-based problems remains open. In any case, we can add to the schema going forward without having to break any previous functionalities.
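As a rough illustration of the parsing step for analytical problems, the sketch below evaluates a MathJSON-style expression directly in pure Python by recursing over the nested lists. It is not the parser described in the flowchart: the `OPS` table and the `evaluate` function are hypothetical, and a real implementation would presumably build Polars expressions rather than compute scalar values.

```python
import math
import operator
from functools import reduce

# Hypothetical mapping from MathJSON heads to Python callables.
# "Add" and "Multiply" accept any number of arguments.
OPS = {
    "Add": lambda *a: reduce(operator.add, a),
    "Multiply": lambda *a: reduce(operator.mul, a),
    "Subtract": operator.sub,
    "Divide": operator.truediv,
    "Power": operator.pow,
    "Negate": operator.neg,
    "Sqrt": math.sqrt,
}


def evaluate(expr, variables):
    """Recursively evaluate a MathJSON-style expression for given variable values."""
    if isinstance(expr, (int, float)):
        return expr  # numeric literal
    if isinstance(expr, str):
        return variables[expr]  # look up a variable symbol
    head, *args = expr
    return OPS[head](*(evaluate(a, variables) for a in args))


# f(x) = x_1^2 + 3 * x_2 evaluated at x_1 = 2.0, x_2 = 1.5
objective = ["Add", ["Power", "x_1", 2], ["Multiply", 3, "x_2"]]
value = evaluate(objective, {"x_1": 2.0, "x_2": 1.5})
print(value)
```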
@light-weaver I updated the parser logic. Do you think `ScalarizationFunction` in the problem schema should also have a symbol? Initially, I assumed this would not be needed (we usually do not reuse scalarization function values in other functions), but now that I think of it, this could be useful in, e.g., PIS-based methods?
Another question is how we define constraint functions. Do we define them such that the constraint is respected when they evaluate to a positive value and broken when they evaluate to a negative value? E.g., `x_1 <= 5` would be expressed as `5 - x_1`.
Edit: after some internal discussion, we decided that constraints will be required to be in the format `g(x) <= 0` or `h(x) = 0`. In the `Problem` schema, we will store only the `g(x)` or `h(x)` representing the constraint's function expression.
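Under the agreed convention, the earlier example `x_1 <= 5` is rewritten as `x_1 - 5 <= 0`, so only `x_1 - 5` would be stored. The short sketch below shows this with hypothetical helper names (`g`, `h`, `inequality_satisfied`, `equality_satisfied`); the numerical tolerance is an assumption and not something decided in this thread.

```python
# Constraint x_1 <= 5 in the form g(x) <= 0: only g(x) = x_1 - 5 is stored.
# A MathJSON-style encoding of g could look like this (illustrative only).
g_mathjson = ["Subtract", "x_1", 5]


def g(x_1: float) -> float:
    """Inequality constraint function; feasible when the value is <= 0."""
    return x_1 - 5


def h(x_1: float, x_2: float) -> float:
    """Equality constraint function for x_1 + x_2 = 1; feasible when the value is 0."""
    return x_1 + x_2 - 1


def inequality_satisfied(value: float, tol: float = 1e-9) -> bool:
    """Check g(x) <= 0 up to a small tolerance (the tolerance is an assumption)."""
    return value <= tol


def equality_satisfied(value: float, tol: float = 1e-9) -> bool:
    """Check h(x) = 0 up to a small tolerance (the tolerance is an assumption)."""
    return abs(value) <= tol


print(inequality_satisfied(g(3.0)))        # x_1 = 3 respects x_1 <= 5
print(inequality_satisfied(g(6.0)))        # x_1 = 6 violates it
print(equality_satisfied(h(0.25, 0.75)))   # 0.25 + 0.75 = 1
```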
What is the current behavior? Problems are formulated using the MOProblem class.
Describe the solution you'd like Problem formulations should be represented as JSON objects. They can be read into Python as dataclasses and stored in databases without changes. This requires analytical formulations of objectives to be stored as MathJSON objects instead of NumPy expressions. The MathJSON objects can be converted to Polars expressions for evaluation with currently implemented methods. Alternatively, we can implement other converters that convert the problem formulation to, for example, NumPy/pandas expressions, PuLP expressions, or gurobipy expressions. We can even convert the MathJSON objects to industry-standard file formats (only for single-objective optimization). Yan's work is the first step towards this idea.
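A minimal sketch of the JSON round trip described above, using only the standard library. The class and field names (`Objective`, `Problem`, `symbol`, `func`, `maximize`) are hypothetical placeholders and do not reflect the actual schema; note that the dataclasses hold only formulation data and no optimization results.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Objective:
    name: str
    symbol: str
    func: list  # MathJSON-style expression stored as nested lists
    maximize: bool = False


@dataclass
class Problem:
    name: str
    variables: list
    objectives: list


def problem_to_json(problem: Problem) -> str:
    """Serialize the formulation to JSON text; asdict recurses into nested dataclasses."""
    return json.dumps(asdict(problem))


def problem_from_json(text: str) -> Problem:
    """Rebuild the dataclasses from JSON text, e.g., after a database read."""
    raw = json.loads(text)
    objectives = [Objective(**o) for o in raw["objectives"]]
    return Problem(name=raw["name"], variables=raw["variables"], objectives=objectives)


original = Problem(
    name="toy problem",
    variables=["x_1", "x_2"],
    objectives=[Objective(name="f1", symbol="f_1", func=["Add", "x_1", "x_2"])],
)
text = problem_to_json(original)
restored = problem_from_json(text)
print(restored == original)
```

Because the stored text is plain JSON, the formulation can be updated or inspected without unpickling arbitrary Python objects.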
What is the motivation/use case for changing the behavior? Currently, arbitrary Python objects have to be stored in the database. This is bad practice and prevents more complicated use cases, such as updating or changing a problem formulation.
Additional context How to handle surrogate modelling, external simulators, arbitrary binaries, and scenario-based optimization needs further discussion.