rzshrote / pybrops

Python Breeding Optimizer and Simulator: A Python library for simulating and optimizing breeding pipelines.
https://rzshrote.github.io/pybrops/
MIT License

Improve DataFrames by removing them #64

Closed rzshrote closed 1 year ago

rzshrote commented 2 years ago

The current dataframe classes are wrappers around several different containers (dictionaries, Pandas DataFrames), mainly so that analysis metadata can be attached to the columns. I think this is poor software design: the column analysis metadata should live inside the modules that use the dataframes.

For example, if we wish to calculate best linear unbiased estimates (BLUEs) from an experiment, we should create a BreedingValueProtocol that records which columns to use and how to analyze them.

bvprot = MyBreedingValueProtocol(
    response="yield",
    fixed=["genotype"],
    random=["environment", "genotype:environment"],
)
bvmat = bvprot.estimate(df)

This is a whole lot cleaner than writing a dataframe wrapper with analysis metadata: the analysis specification lives in the protocol, the interface is intuitive, and no wrapper classes are needed.
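To make the idea concrete, here is a minimal sketch of a protocol that keeps the column roles itself and accepts an unwrapped Pandas DataFrame. The class name `MeanBreedingValueProtocol` and its internals are hypothetical and not part of pybrops. The estimator is deliberately a stand-in (per-group means of the response), not a real mixed-model fit, so the `random` terms are stored but unused.

```python
import pandas as pd


class MeanBreedingValueProtocol:
    """Hypothetical protocol: column roles live here, not on the DataFrame."""

    def __init__(self, response, fixed, random=None):
        self.response = response                      # name of the response column
        self.fixed = list(fixed)                      # fixed-effect column names
        self.random = list(random) if random else []  # stored only; unused by this stand-in

    def estimate(self, df):
        # Check that the plain DataFrame has the columns this protocol needs.
        needed = [self.response] + self.fixed
        missing = [col for col in needed if col not in df.columns]
        if missing:
            raise KeyError(f"DataFrame is missing columns: {missing}")
        # Stand-in estimator (assumption): mean response per fixed-effect level.
        # A real implementation would fit a linear mixed model here.
        return df.groupby(self.fixed)[self.response].mean()


# Toy data, invented purely for illustration.
df = pd.DataFrame({
    "genotype": ["A", "A", "B", "B"],
    "environment": ["E1", "E2", "E1", "E2"],
    "yield": [10.0, 12.0, 8.0, 9.0],
})

bvprot = MeanBreedingValueProtocol(
    response="yield",
    fixed=["genotype"],
    random=["environment", "genotype:environment"],
)
bv = bvprot.estimate(df)
print(bv)
```

The DataFrame carries no analysis metadata at all. Swapping in a different analysis means constructing a different protocol object, and the data container stays the same.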

I recommend replacing the wrapper classes with plain Pandas DataFrames. Vaex DataFrames are also a good option, but Vaex is less widely used. Pandas offers a stable, well-documented API that many Python users are already familiar with.