Current dataframes are wrappers around various different containers (dictionaries, Pandas dataframes). This is mainly done to attach column analysis metadata to the columns. I think this is poor software design. The column analysis metadata should be placed inside the modules that utilize the dataframes.
For example, if we wish to calculate BLUEs from an experiment, we should create a BreedingValueProtocol that contains information as to which columns to use and how to analyze them.
This is a whole lot cleaner than writing a dataframe wrapper with analysis metadata. The interface is intuitive and eliminates the need for wrapper classes.
I recommend replacing the wrapper classes with a Pandas DataFrame. Vaex DataFrames are also a good option, but this is less well used. Pandas offers a stable, well documented API that many Python users are familiar with.
Current dataframes are wrappers around various different containers (dictionaries, Pandas dataframes). This is mainly done to attach column analysis metadata to the columns. I think this is poor software design. The column analysis metadata should be placed inside the modules that utilize the dataframes.
For example, if we wish to calculate BLUEs from an experiment, we should create a BreedingValueProtocol that contains information as to which columns to use and how to analyze them.
This is a whole lot cleaner than writing a dataframe wrapper with analysis metadata. The interface is intuitive and eliminates the need for wrapper classes.
I recommend replacing the wrapper classes with a Pandas DataFrame. Vaex DataFrames are also a good option, but this is less well used. Pandas offers a stable, well documented API that many Python users are familiar with.