spinjet / pdopt-code

Public Release Version of PDOPT
https://pdopt-code.readthedocs.io/en/main/
MIT License
3 stars 1 forks source link

Improve code of some classes and functions #2

Closed jbussemaker closed 7 months ago

jbussemaker commented 9 months ago

Suggestions to improve the DesignSpace object:

Probabilistic Exploration object:

Extendable Model:

Part of JOSS submission

spinjet commented 7 months ago

Dear @jbussemaker

I'm going through these suggestions, so I have a question.

Regarding the DesignSpace object, the original idea was to use the .csv files as main input because it would allow the user to setup multiple cases to be ran in parallel from the terminal, hence changing just those input files. I suppose instantiation would work in code for small problems. However, I am not sure how to express it in a OOP way... I reckon I'd need a setter function for each parameter?

jbussemaker commented 7 months ago

Good question @spinjet

So what I meant was: why is it needed to pass in a csv files? Of course, in some cases it can be a convenient interface, but for the functioning of the program it is not really needed. Actually, already during instantiation you're reading the csv files and converting them to another format, namely a set of Parameter, Objective and Constraint objects.

My suggestion would be to change the main input to the __init__ function to be these lists of parameters, objective and constraints directly, so that the user can conveniently use the DesignSpace object purely from within Python. Specifying function/class inputs/outputs as objects also makes the code self-documenting, as it becomes clear exactly what data to provide and in which format. It also would make unit testing easier, since it's all just Python.

If you then still want to keep the possibility for using the csv files as an input (because I guess it fits to your workflow, and would need less modification of the documentation), then you can do so using a class factory function:

class DesignSpace:

    def __init__(params, obj, con, ...):
        pass  # Initialize the class using params, obj, con objects

    @classmethod
    def from_csv(cls, csv_parameters, csv_responses):
        # Read the CSV files and convert them to params, obj, con objects
        params = ...
        obj = ...
        con = ...

        # Then create the class
        return cls(params, obj, con)

# Usage example:
design_space = DesignSpace.from_csv(...)

Like this you could also add other factory functions if the need for that arises in the future (for example to load from a pickle-serialized DesignSpace or something like that).

These principles apply to several other objects as well if I remember correctly.

spinjet commented 7 months ago

Good question @spinjet

So what I meant was: why is it needed to pass in a csv files? Of course, in some cases it can be a convenient interface, but for the functioning of the program it is not really needed. Actually, already during instantiation you're reading the csv files and converting them to another format, namely a set of Parameter, Objective and Constraint objects.

My suggestion would be to change the main input to the __init__ function to be these lists of parameters, objective and constraints directly, so that the user can conveniently use the DesignSpace object purely from within Python. Specifying function/class inputs/outputs as objects also makes the code self-documenting, as it becomes clear exactly what data to provide and in which format. It also would make unit testing easier, since it's all just Python.

If you then still want to keep the possibility for using the csv files as an input (because I guess it fits to your workflow, and would need less modification of the documentation), then you can do so using a class factory function:

class DesignSpace:

    def __init__(params, obj, con, ...):
        pass  # Initialize the class using params, obj, con objects

    @classmethod
    def from_csv(cls, csv_parameters, csv_responses):
        # Read the CSV files and convert them to params, obj, con objects
        params = ...
        obj = ...
        con = ...

        # Then create the class
        return cls(params, obj, con)

# Usage example:
design_space = DesignSpace.from_csv(...)

Like this you could also add other factory functions if the need for that arises in the future (for example to load from a pickle-serialized DesignSpace or something like that).

These principles apply to several other objects as well if I remember correctly.

I see, it makes more sense now, thank you very much for the extensive explanation! It actually would fit better the idea of using this tool as a library and leaving the user more flexibility. I'll proceed to implement it.

In the meantime I've addressed some other suggestions (the init method of ProbabilisticExploration and hinting in ExtendableModel.run ).

spinjet commented 7 months ago

Added the ability to load DesignSpace from csv and pickle through classmethods, plus a saving method. I have created a branch with these changes: https://github.com/spinjet/pdopt-code/tree/dev-joss-review

I plan to also add a similar interface for saving the data in .csv files through Pandas, just to polish it.

spinjet commented 7 months ago

Dear @jbussemaker

Helper functions have now been fully implemented as per commit: https://github.com/spinjet/pdopt-code/commit/31f65670a6fa449833f975c7cba305950c74c2ad

Specifically:

Hope this is enough for the improvements.