pandas category dtype does not work with lightgbm (other libraries should be checked as well)

With feature preprocessing such as 'dataset["Sex"] = dataset.Sex.astype("category")', the dataset still contains the string values, like 'male', but lightgbm converts them to their integer representation, like '1'.

When dtreeviz uses the prediction path to trace a sample through the tree, there is a mismatch of values, like 'is "male" in [1]?'. As a result, node_samples ends up with the wrong samples and view() fails.
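
The mismatch can be illustrated with pandas alone. This is a minimal sketch, not dtreeviz or lightgbm code: the four-row series and the name split_categories are made up for illustration, and the only assumption about lightgbm is the one stated above, namely that it works on the integer codes of a category column.

```python
import pandas as pd

# Tiny stand-in for the Titanic "Sex" column (illustrative data only).
sex = pd.Series(["male", "female", "female", "male"]).astype("category")

# What the DataFrame holds: the original string labels.
labels = sex.tolist()
# What a model working on category codes sees: integers.
codes = sex.cat.codes.tolist()

# Hypothetical split condition expressed in codes, mirroring 'is "male" in [1]?'.
split_categories = [1]

# Comparing string labels against integer codes never matches.
matched_by_label = [value in split_categories for value in labels]
# Comparing codes against codes selects the "male" rows as intended.
matched_by_code = [code in split_categories for code in codes]

print(dict(enumerate(sex.cat.categories)))  # mapping from code to label
print(matched_by_label)
print(matched_by_code)
```

Comparing by label selects no rows at all, while comparing by code selects the two 'male' rows, which is the kind of disagreement that leaves node_samples with the wrong samples.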

You can reproduce the issue by using this dataset for training.

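import pandas as pd

# Load the Titanic dataset from the dtreeviz repository.
dataset_url = "https://raw.githubusercontent.com/parrt/dtreeviz/master/data/titanic/titanic.csv"
dataset = pd.read_csv(dataset_url)

# Fill missing ages with the mean age.
dataset.fillna({"Age": dataset.Age.mean()}, inplace=True)
# Sex stays a category dtype (string labels); the commented-out .cat.codes would store integer codes instead.
dataset["Sex"] = dataset.Sex.astype("category")  # .cat.codes
# Cabin is converted to integer codes right away.
dataset["Cabin"] = dataset.Cabin.astype("category").cat.codes
# Mark missing ports with "?" and keep Embarked as a category dtype as well.
dataset.fillna({"Embarked": "?"}, inplace=True)
dataset["Embarked"] = dataset.Embarked.astype("category")  # .cat.codes
print(dataset.dtypes)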
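
The commented-out '.cat.codes' calls hint at a possible workaround: hand over integer codes instead of category columns. The sketch below assumes that is acceptable for the use case. The helper name encode_categoricals and the three-row frame are hypothetical and are not part of dtreeviz or lightgbm.

```python
import pandas as pd


def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every category column replaced by its integer codes."""
    out = df.copy()
    for col in out.select_dtypes(include="category").columns:
        out[col] = out[col].cat.codes
    return out


# Illustrative data only, shaped like two of the Titanic columns.
df = pd.DataFrame({
    "Sex": pd.Series(["male", "female", "male"]).astype("category"),
    "Age": [22.0, 38.0, 26.0],
})

encoded = encode_categoricals(df)
print(encoded.dtypes)
print(encoded["Sex"].tolist())
```

Note that '.cat.codes' encodes missing values as -1, so columns with gaps may need to be filled first, as the reproduction does for Embarked.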

parrt/dtreeviz issue #267