zackxconti / bnmetamodel_gh

Repo for bnmetamodel lib version for Lab Mouse Grasshopper plug-in.

Sort out two `getBinRanges` functions #55

Open kallewesterling opened 11 months ago

kallewesterling commented 11 months ago

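There is one `BNdata.getBinRanges` method and one `getBinRanges` function defined in `Helper_functions.py`. As far as I can tell they produce the same result, but they go about it slightly differently: the method loops over the columns of `self.data` and checks for `'percentile'`, while the function loops over the keys of `binTypeDict` and checks for `'p'`. Both also build a list, `trainingDfDiscterizedRanges`, that is never used or returned. Consolidating the two would make the code leaner.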

Code

BNdata.getBinRanges

def getBinRanges (self, binTypeDict, numBinsDict):
    trainingDfDiscterizedRanges = []
    trainingDfDiscterizedRangesDict = {}

    # loop through variables in trainingDf (columns) to discretize into ranges according to trainingDf
    for varName in list(self.data):
        # if true, discretise variable i, using percentiles, if false, discretise using equal bins
        if binTypeDict[varName] == 'percentile':
            trainingDfDiscterizedRanges.append(percentile_bins(self.data[varName], numBinsDict.get(varName)))  # adds to a list
            trainingDfDiscterizedRangesDict[varName] = percentile_bins(self.data[varName], numBinsDict.get(varName))  # adds to a dictionary
        elif 'equal':
            trainingDfDiscterizedRanges.append(bins(max(self.data[varName]), min(self.data[varName]),numBinsDict.get(varName)))  # adds to a list
            trainingDfDiscterizedRangesDict[varName] = bins(max(self.data[varName]), min(self.data[varName]),numBinsDict.get(varName))  # adds to a dictionary

    # update class attribute, while you're at it
    self.bin_ranges = trainingDfDiscterizedRangesDict

    return trainingDfDiscterizedRangesDict

Helper_functions.getBinRanges

def getBinRanges (dataframe, binTypeDict, numBinsDict):
    trainingDfDiscterizedRanges = []
    trainingDfDiscterizedRangesDict = {}

    # loop through variables in trainingDf (columns) to discretize into ranges according to trainingDf

    # TODO #47: Refactor getBinRanges to no longer use names from dataframe but from an original list of BN nodes
    for varName in binTypeDict.keys():
        if binTypeDict[varName] == 'p':
            trainingDfDiscterizedRanges.append(percentile_bins(dataframe[varName], numBinsDict.get(varName)))  # adds to a list
            trainingDfDiscterizedRangesDict[varName] = percentile_bins(dataframe[varName], numBinsDict.get(varName))  # adds to a dictionary
        elif 'e':
            trainingDfDiscterizedRanges.append(bins(max(dataframe[varName]), min(dataframe[varName]),numBinsDict.get(varName)))  # adds to a list
            trainingDfDiscterizedRangesDict[varName] = bins(max(dataframe[varName]), min(dataframe[varName]),numBinsDict.get(varName))  # adds to a dictionary

        # TODO #48: Refactor getBinRanges to include a new option `auto(mlp)`

    return trainingDfDiscterizedRangesDict
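
One possible way to consolidate the two is sketched below, with the method delegating to a single module-level function. This is only a sketch under assumptions: `get_bin_ranges`, `_PERCENTILE` and `_EQUAL` are hypothetical names, and the `bins` and `percentile_bins` bodies are simplified stand-ins so the example runs on its own (the real helpers in the repo may behave differently). It loops over `binTypeDict` like the helper version does, accepts both spellings of each bin type, and raises on anything else.

```python
def bins(max_val, min_val, num_bins):
    """Stand-in for the repo's `bins` helper (assumed behaviour):
    equal-width [low, high] pairs between min_val and max_val."""
    width = (max_val - min_val) / num_bins
    return [[min_val + i * width, min_val + (i + 1) * width]
            for i in range(num_bins)]


def percentile_bins(values, num_bins):
    """Stand-in for the repo's `percentile_bins` helper (assumed behaviour):
    [low, high] pairs cut at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    edges = [ordered[round(i * last / num_bins)] for i in range(num_bins + 1)]
    return [[edges[i], edges[i + 1]] for i in range(num_bins)]


# Both spellings appear in the two existing implementations.
_PERCENTILE = {"percentile", "p"}
_EQUAL = {"equal", "e"}


def get_bin_ranges(data, bin_type_dict, num_bins_dict):
    """Hypothetical single implementation shared by the method and the helper.

    `data` only needs to support data[name] -> sequence of numbers,
    so a pandas DataFrame or a plain dict of lists both work here.
    """
    ranges = {}
    for var_name, bin_type in bin_type_dict.items():
        column = data[var_name]
        num_bins = num_bins_dict[var_name]  # fail fast if the entry is missing
        if bin_type in _PERCENTILE:
            ranges[var_name] = percentile_bins(column, num_bins)
        elif bin_type in _EQUAL:
            ranges[var_name] = bins(max(column), min(column), num_bins)
        else:
            raise ValueError(
                f"Unknown bin type {bin_type!r} for variable {var_name!r}"
            )
    return ranges


class BNdata:
    """Trimmed-down stand-in for the real class, showing only the delegation."""

    def __init__(self, data):
        self.data = data
        self.bin_ranges = {}

    def getBinRanges(self, binTypeDict, numBinsDict):
        # Delegate to the shared function, then update the class attribute.
        self.bin_ranges = get_bin_ranges(self.data, binTypeDict, numBinsDict)
        return self.bin_ranges
```

One behavioural difference to be aware of: the current method iterates over every column in `self.data`, so switching it to iterate over `binTypeDict` changes which variables get binned when the two do not match.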

kallewesterling commented 10 months ago

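Also note that the conditions in `elif 'equal'` and `elif 'e'` are non-empty string literals, so they are always truthy. In practice that branch behaves like a plain `else`: any bin type that is not the percentile option, including a typo, silently falls through to equal-width binning. See the Python docs: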

By default, an object is considered true unless its class defines either a `__bool__()` method that returns False or a `__len__()` method that returns zero, when called with the object.
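
A small illustration of the problem and one possible fix. The function names `classify_buggy` and `classify_fixed` are made up for this example and are not part of the library; they only mirror the `if`/`elif` shape of the helper version.

```python
def classify_buggy(bin_type):
    """Mirrors the current branching: the elif tests a string literal."""
    if bin_type == 'p':
        return 'percentile'
    elif 'e':  # non-empty string literal, always truthy
        return 'equal'
    return 'unknown'  # never reached


def classify_fixed(bin_type):
    """Compares the value explicitly and rejects anything unexpected."""
    if bin_type == 'p':
        return 'percentile'
    elif bin_type == 'e':
        return 'equal'
    raise ValueError(f"Unknown bin type: {bin_type!r}")


print(bool('e'), bool(''))     # True False
print(classify_buggy('typo'))  # equal

try:
    classify_fixed('typo')
except ValueError as err:
    print(err)                 # Unknown bin type: 'typo'
```

Whether an unknown bin type should raise or fall back to equal-width bins is a design decision for the maintainers; the point here is only that the current condition never distinguishes the two.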