sustainable-processes / summit

Optimising chemical reactions using machine learning
https://gosummit.readthedocs.io/en/latest/
MIT License

Categorical_Method LHS Input Encoding Error #265

Open HiddenBao opened 11 months ago

HiddenBao commented 11 months ago

Description

I am currently testing a future experimental setup in which I try to create initial experiments using LHS as the strategy, with a domain that contains both categorical and continuous variables. When I run the code, I get an error while generating the experimental conditions. I am working in JupyterLab. I have removed some of the names from the file path, but everything else is unchanged.

What I Did

import cython
import summit
from summit.benchmarks import ExperimentalEmulator
from summit.domain import *
from summit.utils.dataset import DataSet
from summit.strategies import SOBO, MultitoSingleObjective, LHS
import numpy as np
import pandas as pd
import pkg_resources
import pathlib

DATA_PATH = pathlib.Path("F:/Python Programs/NKData")
input_df = pd.read_csv(DATA_PATH / 'BoundariesV2.csv')

domain = Domain()
for idx, row in input_df.iterrows():
    name = row[0]  
    description = row[5]  
    data_type = row['Type']

    if data_type == 'Categorical':
        levels = row[2].split(',')  

        domain += CategoricalVariable(
            name=name,
            description=description,
            levels=levels
        )
    elif data_type == 'Continuous':
        bounds = [row[3], row[4]]

        domain += ContinuousVariable(
            name=name,
            description=description,
            bounds=bounds
        )
    elif data_type == 'Objective':
        bounds = [row[3], row[4]]
        maximize = row[6]

        domain += ContinuousVariable(
            name=name,
            description=description,
            bounds=bounds,
            is_objective=True,
            maximize=maximize
        )

domain

categorical_method: str = "one-hot"
StartStrat = LHS(domain, random_state = np.random.RandomState(808), categorical_method=categorical_method)
StartExp = StartStrat.suggest_experiments(10)
StartExp

Output

Name | Type | Description | Values
-- | -- | -- | --
Temperature | continuous, input | Reaction temperature in degrees Celsius (ºC) | [40.0,80.0]
Catalyst_Amount | continuous, input | Catalyst amounts in molar equivalents (Equiv.) | [0.01,1.0]
Starting_Reagent | continuous, input | 2-Methylimidozole amounts in molar equivalents (Equiv.) | [1.1,2.0]
Solvent | continuous, input | Solvent amount in milliliters (mL) | [0.1,0.35]
Time | continuous, input | Duration of reaction in hours (hr) | [2.0,24.0]
Base | continuous, input | Base amount in molar equivalents (Equiv.) | [1.0,5.0]
Catalyst_Type | categorical, input | Catalyst Types | 3 levels
Main_Product | continuous, maximize objective | LCAP of Main Product | [0.0,1.0]
Main_Impurity | continuous, minimize objective | LCAP of Main Impurity | [0.0,1.0]

Error I get from running the very last cell

AttributeError                            Traceback (most recent call last)
Cell In[4], line 4
      2 categorical_method: str = "one-hot"
      3 StartStrat = LHS(domain, random_state = np.random.RandomState(808), categorical_method=categorical_method)
----> 4 StartExp = StartStrat.suggest_experiments(10)
      5 StartExp

File ~\AppData\Roaming\Python\Python39\site-packages\summit\strategies\random.py:286, in LHS.suggest_experiments(self, num_experiments, criterion, exclude, **kwargs)
    284 design = DataSet.from_df(design)
    285 design[("strategy", "METADATA")] = "LHS"
--> 286 return self.transform.un_transform(
    287     design, categorical_method=self.categorical_method
    288 )

File ~\AppData\Roaming\Python\Python39\site-packages\summit\strategies\base.py:324, in Transform.un_transform(self, ds, **kwargs)
    318 # Categorical variables using one-hot encoding
    319 elif (
    320     isinstance(variable, CategoricalVariable)
    321     and categorical_method == "one-hot"
    322 ):
    323     # Get one-hot encoder
--> 324     enc = self.encoders[variable.name]
    326     # Get array to be transformed
    327     one_hot_names = [f"{variable.name}_{l}" for l in variable.levels]

AttributeError: 'Transform' object has no attribute 'encoders'

I apologise for any formatting issues; I am new to this, but I would greatly appreciate any help or advice on workarounds. Thank you for providing this library, it is amazing.
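
On the request for workarounds: one option, while the one-hot path fails, is to build the initial design outside the strategy. The sketch below is not Summit code and does not reproduce its LHS implementation. It is a plain-Python Latin hypercube for the continuous inputs, with the categorical levels assigned as evenly as possible in random order. The function names `latin_hypercube` and `add_categorical` are hypothetical, and only a few of the variables from the report are shown.

```python
import random


def latin_hypercube(bounds, n, rng):
    """Return n points; each variable's range is cut into n equal strata
    and exactly one value is drawn from every stratum."""
    columns = {}
    for name, (low, high) in bounds.items():
        # One uniform draw inside each of the n strata of [0, 1)
        unit = [(i + rng.random()) / n for i in range(n)]
        # Shuffle so the strata are paired randomly across variables
        rng.shuffle(unit)
        columns[name] = [low + u * (high - low) for u in unit]
    return [{name: columns[name][i] for name in bounds} for i in range(n)]


def add_categorical(rows, name, levels, rng):
    """Assign the levels as evenly as possible, in random order."""
    assigned = [levels[i % len(levels)] for i in range(len(rows))]
    rng.shuffle(assigned)
    for row, level in zip(rows, assigned):
        row[name] = level
    return rows


rng = random.Random(808)
bounds = {
    "Temperature": (40.0, 80.0),
    "Catalyst_Amount": (0.01, 1.0),
    "Time": (2.0, 24.0),
}
experiments = add_categorical(
    latin_hypercube(bounds, 10, rng), "Catalyst", ["A", "B", "C", "D"], rng
)
for row in experiments:
    print(row)
```

The resulting list of dictionaries can be turned into a table and run as the first batch of experiments. Whether it can then be fed back into a Summit strategy without hitting the same error is not established in this thread.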

marcosfelt commented 2 months ago

I am so sorry that I did not see this. If it is still relevant for me to take a look, please respond @HiddenBao

I will take a look later this week.

HiddenBao commented 2 months ago

No worries! I went back and rewrote the code without the .csv file to see whether that was the issue; however, I still get the same error.

What I Did

import numpy as np
import summit
from summit.benchmarks import ExperimentalEmulator
from summit.domain import *
from summit.utils.dataset import DataSet
from summit.strategies import LHS, MTBO

domain = Domain()

domain += CategoricalVariable(
    name = "Catalyst", 
    description = "Test",
    levels = [
        "A",
        "B",
        "C",
        "D"
    ],
)

domain += ContinuousVariable(
    name = "Temperature",
    description = "Test",
    bounds = [40, 80]
)

domain += ContinuousVariable(
    name = "Catalyst_Amount",
    description = "Test",
    bounds = [0.01, 1.0]
)

domain += ContinuousVariable(
    name = "Reagent",
    description = "Test",
    bounds = [1.1, 2.0]
)

domain += ContinuousVariable(
    name = "Solvent",
    description = "Test",
    bounds = [0.1, 0.35]
)

domain += ContinuousVariable(
    name = "Time",
    description = "Test",
    bounds = [2.0, 24]
)

domain += ContinuousVariable(
    name = "Base",
    description = "Test",
    bounds = [1.0, 5.0]
)

domain += ContinuousVariable(
    name = "Main_Product",
    description = "Test",
    bounds = [0, 1],
    is_objective = True,
    maximize = True
)

domain += ContinuousVariable(
    name = "Main_Impurity",
    description = "Test",
    bounds = [0, 1],
    is_objective = True,
    maximize = False
)
domain

To my knowledge, the domain is created without problems. I then tried running the LHS and MTBO strategies with one-hot encoding, using the following.

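strategy = LHS(domain,
                 random_state = np.random.RandomState(808),
                 categorical_method="one-hot"
                )
# StartStrat still refers to the LHS object created in the first post, so `strategy` is not used here
StartExp = StartStrat.suggest_experiments(10)
StartExp

strategy = MTBO(domain,
                 random_state = np.random.RandomState(808),
                 categorical_method="one-hot"
                )
# Same here: the call goes to StartStrat (LHS), as the traceback below shows
StartExp = StartStrat.suggest_experiments(10)
StartExp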

And both return the same error still.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[15], line 5
      1 strategy = MTBO(domain,
      2                  random_state = np.random.RandomState(808),
      3                  categorical_method="one-hot"
      4                 )
----> 5 StartExp = StartStrat.suggest_experiments(10)
      6 StartExp

File ~\AppData\Roaming\Python\Python39\site-packages\summit\strategies\random.py:286, in LHS.suggest_experiments(self, num_experiments, criterion, exclude, **kwargs)
    284 design = DataSet.from_df(design)
    285 design[("strategy", "METADATA")] = "LHS"
--> 286 return self.transform.un_transform(
    287     design, categorical_method=self.categorical_method
    288 )

File ~\AppData\Roaming\Python\Python39\site-packages\summit\strategies\base.py:324, in Transform.un_transform(self, ds, **kwargs)
    318 # Categorical variables using one-hot encoding
    319 elif (
    320     isinstance(variable, CategoricalVariable)
    321     and categorical_method == "one-hot"
    322 ):
    323     # Get one-hot encoder
--> 324     enc = self.encoders[variable.name]
    326     # Get array to be transformed
    327     one_hot_names = [f"{variable.name}_{l}" for l in variable.levels]

AttributeError: 'Transform' object has no attribute 'encoders'

Thank you again for creating this library, it is great and extremely useful. I greatly appreciate all the work that has been done for it.

marcosfelt commented 2 months ago

This definitely seems like a bug - I will take a look this weekend!

marcosfelt commented 2 months ago

Can confirm that I can reproduce the bug. Going to look into a fix now
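
For readers following along, the traceback shows `un_transform` indexing `self.encoders`, and the error says that attribute does not exist on the `Transform` object. One plausible reading, which is an assumption and not confirmed in this thread, is that the attribute is only assigned on a code path that LHS never runs. The sketch below reproduces that general failure pattern with a hypothetical `ToyTransform` class (not Summit's actual `Transform`) and shows one defensive alternative that builds the encoders in the constructor.

```python
class ToyTransform:
    """Hypothetical stand-in for illustration; not Summit's Transform."""

    def __init__(self, levels):
        self.levels = levels  # e.g. {"Catalyst": ["A", "B", "C"]}
        # Note: self.encoders is deliberately NOT created here.

    def _build_encoders(self):
        # Map each level to an integer code, per variable
        return {
            name: {lvl: i for i, lvl in enumerate(lvls)}
            for name, lvls in self.levels.items()
        }

    def transform(self, rows):
        # The encoders only come into existence on the forward pass
        self.encoders = self._build_encoders()
        return [{n: self.encoders[n][row[n]] for n in self.levels} for row in rows]

    def un_transform(self, rows):
        # Raises AttributeError if transform() was never called first
        decoders = {
            n: {i: lvl for lvl, i in enc.items()}
            for n, enc in self.encoders.items()
        }
        return [{n: decoders[n][row[n]] for n in self.levels} for row in rows]


class SaferTransform(ToyTransform):
    """Same behaviour, but the encoders exist from construction onwards."""

    def __init__(self, levels):
        super().__init__(levels)
        self.encoders = self._build_encoders()


levels = {"Catalyst": ["A", "B", "C"]}

toy = ToyTransform(levels)
try:
    toy.un_transform([{"Catalyst": 1}])
except AttributeError as err:
    message = str(err)
    print(message)

safe = SaferTransform(levels)
decoded = safe.un_transform([{"Catalyst": 1}])
print(decoded)
```

Calling the inverse step on a fresh object is exactly what a design-generating strategy does, since it has no prior data to transform. That is why creating the state eagerly, or lazily inside the inverse method, avoids this class of error.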