henryzord / ardennes

An Estimation of Distribution Algorithm for Decision-Tree Induction.

Better localization when dealing with GMs #12

Closed. henryzord closed this issue 7 years ago.

henryzord commented 7 years ago

The code currently relies heavily on the loc method from pandas.DataFrame for both sampling and updating procedures:

            grouped = self.weights.copy()  # type: pd.DataFrame
            for p in self.parents:
                # keep only the rows whose column p equals the current value session[p]
                grouped = grouped.loc[grouped[p] == session[p]]

However, the values in each cell are already known, since they follow a predetermined distribution. Use this distribution to speed up the GM class.
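The idea can be illustrated with a minimal sketch. It assumes that the weights table enumerates every combination of parent values, followed by the child's own value, in `itertools.product` order. Under that assumption, the rows for a given parent assignment can be located with mixed-radix arithmetic instead of being filtered one parent at a time. The names `PARENT_DOMAINS`, `CHILD_DOMAIN`, `filter_rows` and `direct_rows` are hypothetical and do not come from the ardennes code base. The sketch uses plain Python lists rather than a DataFrame.

```python
import itertools

# Assumption: the weights table lists every combination of parent values,
# followed by the child's own value, in itertools.product order.
# Domains and names below are hypothetical.
PARENT_DOMAINS = {"left": [0, 1], "right": ["a", "b", "c"]}
CHILD_DOMAIN = ["leaf", "split"]


def build_rows(parent_domains, child_domain):
    """Enumerate the table rows in the fixed, predetermined order."""
    domains = list(parent_domains.values()) + [child_domain]
    return [tuple(combo) for combo in itertools.product(*domains)]


def filter_rows(rows, parent_names, session):
    """Current approach: narrow the table one parent at a time (like repeated .loc)."""
    grouped = list(enumerate(rows))
    for i, p in enumerate(parent_names):
        grouped = [(k, r) for k, r in grouped if r[i] == session[p]]
    return [k for k, _ in grouped]


def direct_rows(parent_domains, child_domain, session):
    """Compute the matching row positions from the known layout, with no scan."""
    offset = 0
    for name, domain in parent_domains.items():
        # Mixed-radix arithmetic: the first parent varies slowest.
        offset = offset * len(domain) + domain.index(session[name])
    start = offset * len(child_domain)
    return list(range(start, start + len(child_domain)))


rows = build_rows(PARENT_DOMAINS, CHILD_DOMAIN)
session = {"left": 1, "right": "b"}
print(direct_rows(PARENT_DOMAINS, CHILD_DOMAIN, session))  # [8, 9]
```

Both functions return the same row positions. `filter_rows` touches the whole table for every parent, while `direct_rows` only walks the parent list. The direct approach works only if the row order is guaranteed. If the table can be reordered or pruned, a lookup keyed by the tuple of parent values is a safer alternative.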

henryzord commented 7 years ago

It is also unnecessary to repeat this localization on every sampling, since a large portion of the population requires the same probabilities each time.

EDIT 1: Fixed in b37367cfbaf7627f2a7d043b74469074ebea4bdc.
EDIT 2: Actually, since it depends on the parents' values, it is not entirely fixed as of now.

henryzord commented 7 years ago

Fixed in 5d3018945a24498bc7945f528f0077444773f34e. The localization of variables won't be changed, since pandas is probably better at doing this.