serengil / chefboost

A Lightweight Decision Tree Framework supporting regular algorithms: ID3, C4.5, CART, CHAID and Regression Trees; some advanced techniques: Gradient Boosting, Random Forest and Adaboost w/categorical features support for Python
https://www.youtube.com/watch?v=Z93qE5eb6eg&list=PLsS_1RYmYQQHp_xZObt76dpacY543GrJD&index=3
MIT License

'numpy.float32' object has no attribute 'is_integer' #15

Closed by Gabomfim 2 years ago

Gabomfim commented 3 years ago

I tried the following on a dataset with float samples (running on Python 3.7):

configGBM = {'algorithm': 'C4.5', 'enableGBM': True, 'epochs': 7, 'learning_rate': 1, 'max_depth': 5}
modelGBM = chef.fit(train, config = configGBM)

Error Log:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/vk/rw3fbc110n3fsf_xhz6_r4m00000gn/T/ipykernel_67628/3037199772.py in <module>
      1 configGBM = {'algorithm': 'C4.5', 'enableGBM': True, 'epochs': 7, 'learning_rate': 1, 'max_depth': 5}
----> 2 modelGBM = chef.fit(train, config = configGBM)

/usr/local/lib/python3.7/site-packages/chefboost/Chefboost.py in fit(df, config, target_label, validation_df)
    190 
    191                 if df['Decision'].dtypes == 'object': #transform classification problem to regression
--> 192                         trees, alphas = gbm.classifier(df, config, header, dataset_features, validation_df = validation_df, process_id = process_id)
    193                         classification = True
    194 

/usr/local/lib/python3.7/site-packages/chefboost/tuning/gbm.py in classifier(df, config, header, dataset_features, validation_df, process_id)
    270                                 instance['P_'+str(j)] = probabilities[j]
    271 
--> 272                         worksheet.loc[row] = instance
    273 
    274                 for i in range(0, len(classes)):

/usr/local/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    721 
    722         iloc = self if self.name == "iloc" else self.obj.iloc
--> 723         iloc._setitem_with_indexer(indexer, value, self.name)
    724 
    725     def _validate_key(self, key, axis: int):

/usr/local/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value, name)
   1728         if take_split_path:
   1729             # We have to operate column-wise
-> 1730             self._setitem_with_indexer_split_path(indexer, value, name)
   1731         else:
   1732             self._setitem_single_block(indexer, value, name)

/usr/local/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer_split_path(self, indexer, value, name)
   1795                 # We are setting multiple columns in a single row.
   1796                 for loc, v in zip(ilocs, value):
-> 1797                     self._setitem_single_column(loc, v, pi)
   1798 
   1799             elif len(ilocs) == 1 and com.is_null_slice(pi) and len(self.obj) == 0:

/usr/local/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_single_column(self, loc, value, plane_indexer)
   1918             # set the item, possibly having a dtype change
   1919             ser = ser.copy()
-> 1920             ser._mgr = ser._mgr.setitem(indexer=(pi,), value=value)
   1921             ser._maybe_update_cacher(clear=True)
   1922 

/usr/local/lib/python3.7/site-packages/pandas/core/internals/managers.py in setitem(self, indexer, value)
    353 
    354     def setitem(self: T, indexer, value) -> T:
--> 355         return self.apply("setitem", indexer=indexer, value=value)
    356 
    357     def putmask(self, mask, new, align: bool = True):

/usr/local/lib/python3.7/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
    325                     applied = b.apply(f, **kwargs)
    326                 else:
--> 327                     applied = getattr(b, f)(**kwargs)
    328             except (TypeError, NotImplementedError):
    329                 if not ignore_failures:

/usr/local/lib/python3.7/site-packages/pandas/core/internals/blocks.py in setitem(self, indexer, value)
    924         # coerce if block dtype can store value
    925         values = self.values
--> 926         if not self._can_hold_element(value):
    927             # current dtype cannot store value, coerce to common dtype
    928             return self.coerce_to_target_dtype(value).setitem(indexer, value)

/usr/local/lib/python3.7/site-packages/pandas/core/internals/blocks.py in _can_hold_element(self, element)
    620         """require the same dtype as ourselves"""
    621         element = extract_array(element, extract_numpy=True)
--> 622         return can_hold_element(self.values, element)
    623 
    624     @final

/usr/local/lib/python3.7/site-packages/pandas/core/dtypes/cast.py in can_hold_element(arr, element)
   2181         if tipo is not None:
   2182             if tipo.kind not in ["i", "u"]:
-> 2183                 if is_float(element) and element.is_integer():
   2184                     return True
   2185                 # Anything other than integer we cannot hold

AttributeError: 'numpy.float32' object has no attribute 'is_integer'
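The last frame shows pandas calling `element.is_integer()` on a `numpy.float32` scalar. `numpy.float64` subclasses Python's built-in `float` and so inherits `is_integer()`, while `numpy.float32` does not subclass `float`; whether it has the method depends on the installed NumPy version. A minimal sketch of that difference follows. The helper `is_integral` is a hypothetical name used for illustration, not part of pandas or chefboost.

```python
import numpy as np

x32 = np.float32(2.0)
x64 = np.float64(2.0)

# float64 is a subclass of the built-in float, so float methods are inherited.
print("float64 is a float:", isinstance(x64, float))
# float32 is a separate scalar type and does not inherit from float.
print("float32 is a float:", isinstance(x32, float))
# Result depends on the NumPy version installed in the environment.
print("float32 has is_integer:", hasattr(x32, "is_integer"))


def is_integral(value) -> bool:
    """Hypothetical helper: convert to a built-in float first, then test."""
    return float(value).is_integer()


print(is_integral(x32), is_integral(np.float32(2.5)))
```

Converting to a built-in `float` before the check sidesteps the missing attribute regardless of the NumPy version, which may explain why the error appears in some environments and not in others.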

serengil commented 3 years ago

could you share your data set?

Gabomfim commented 3 years ago

Sorry for keeping you waiting.

I'm sharing with you my notebook with all the files, including the databases used (in the data file). I managed to fix the problem by importing the database as txt instead of csv.

allStroke.txt is the txt version of the healthcare-dataset-stroke-data.csv database. Using it fixed the problem.

We now import the database in this way: df = pd.read_csv("./data/allStroke.txt", index_col=0)

I don't have the old code with me now, but I can send it to you next week if needed.
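As a side note on the txt/csv switch: `pd.read_csv` parses a file by its content and delimiter, not by its extension. The sketch below writes the same comma-separated text to a `.csv` and a `.txt` file and compares the parsed frames. The rows are made up for illustration and are not taken from the stroke dataset, so this only shows the general behavior, not what differed between the two files in the notebook.

```python
import os
import tempfile

import pandas as pd

# Made-up rows for illustration only.
content = "id,age,bmi,Decision\n1,67.0,36.6,Yes\n2,61.0,,No\n"

with tempfile.TemporaryDirectory() as tmp:
    frames = {}
    for ext in (".csv", ".txt"):
        path = os.path.join(tmp, "sample" + ext)
        with open(path, "w") as fh:
            fh.write(content)
        # Same call as in the comment above, only the extension changes.
        frames[ext] = pd.read_csv(path, index_col=0)

# Compare values (NaN in the same position counts as equal) and dtypes.
same_values = frames[".csv"].equals(frames[".txt"])
same_dtypes = bool((frames[".csv"].dtypes == frames[".txt"].dtypes).all())
print(same_values, same_dtypes)
```

If the two files in the notebook parse differently, the difference is more likely in their contents (for example, column types or preprocessing) than in the extension itself.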

serengil commented 2 years ago

When I run this in my environment, it works well. I have Python 3.8.12 and pandas==1.3.5. I recommend upgrading or downgrading to match my environment.

from chefboost import Chefboost as chef
import pandas as pd

df = pd.read_csv("healthcare-dataset-stroke-data.csv", index_col=0)

print(df.head())

configGBM = {'algorithm': 'C4.5', 'enableGBM': True, 'epochs': 7, 'learning_rate': 1, 'max_depth': 5, 'enableParallelism': False}
modelGBM = chef.fit(df = df, config = configGBM)

Output logs:

(sefik) sefik@Sefiks-MacBook-Pro Desktop % python hello.py
       gender   age  hypertension  heart_disease ever_married      work_type Residence_type  avg_glucose_level   bmi   smoking_status Decision
id
9046     Male  67.0             0              1          Yes        Private          Urban             228.69  36.6  formerly smoked      Yes
51676  Female  61.0             0              0          Yes  Self-employed          Rural             202.21   NaN     never smoked      Yes
31112    Male  80.0             0              1          Yes        Private          Rural             105.92  32.5     never smoked      Yes
60182  Female  49.0             0              0          Yes        Private          Urban             171.23  34.4           smokes      Yes
1665   Female  79.0             1              0          Yes  Self-employed          Rural             174.12  24.0     never smoked      Yes
Gradient Boosting Machines...
Regression tree is going to be built...
gradient boosting for classification
Epoch 7. Accuracy: 82. Process: : 100%|█████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:42<00:00, 6.02s/it]
The best accuracy got in 6 epoch with the score 82.78210116731518

finished in 42.12960386276245 seconds

Evaluate train set

Accuracy: 82.00389105058366 % on 1028 instances
Labels: ['Yes' 'No']
Confusion matrix: [[99, 35], [150, 744]]
Precision: 73.8806 %, Recall: 39.759 %, F1: 51.6971 %
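
To compare another environment against the one reported above (Python 3.8.12, pandas==1.3.5), the installed versions can be listed with the standard library alone. This is a minimal sketch; `report_versions` is a hypothetical helper name, and the default package list is an assumption about which packages matter for this issue.

```python
import platform
from importlib import metadata


def report_versions(packages=("pandas", "numpy", "chefboost")):
    """Return interpreter and package versions; None marks a missing package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

`importlib.metadata` requires Python 3.8 or later; on Python 3.7, as in the original report, checking each package's `__version__` attribute is an alternative.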