munichpavel / fake-data-for-learning

Sample interesting fake data for machine and human learning
https://munichpavel.github.io/fake-data-for-learning
MIT License

Fix label encoding ordering hack #8

Closed munichpavel closed 4 years ago

munichpavel commented 5 years ago

This has become a bug: I want to sample from a Bayesian network conditioned on some values, and to do that I would have to enter the values as a dict using the hacked value names (e.g. "aMale", "bFemale"), which exist only to override the default sklearn / numpy ordering.
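For illustration, a minimal sketch of the problem in plain Python. It assumes the encoder assigns integer codes by sorted unique label, which is how sklearn's `LabelEncoder` orders its classes; `encode_sorted` and the `"gender"` variable name are hypothetical and not part of this package.

```python
def encode_sorted(values):
    """Mimic an encoder that assigns integer codes by sorted unique label."""
    classes = sorted(set(values))
    codes = [classes.index(v) for v in values]
    return classes, codes


# Intended order is Male, Female, but sorting puts Female first.
classes, codes = encode_sorted(["Male", "Female", "Male"])

# The hack: prefix each label so that sorting yields the intended order.
hacked_classes, hacked_codes = encode_sorted(["aMale", "bFemale", "aMale"])

# Consequence: conditioning values must use the hacked names as well.
evidence = {"gender": "aMale"}  # hypothetical evidence dict
```

With the prefixes in place `hacked_classes` is `["aMale", "bFemale"]`, so the ordering is as intended, but the prefixed names leak into anything the user has to type.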

munichpavel commented 5 years ago

https://stackoverflow.com/questions/51308994/python-sklearn-determine-the-encoding-order-of-labelencoder

munichpavel commented 5 years ago

Fixed using above SO answer at https://github.com/munichpavel/fake-data-for-learning/commit/b43240bf7b8b0c9df31ea4765587e4c7227c7c97

The internal representation and the restriction on underscores in values are no longer needed, so both have been removed.

munichpavel commented 4 years ago

I should have known better than to trust an SO solution with 1 vote.

Will just live with sklearn's behavior of sorting string labels in alphabetical (lexicographical) order.
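A sketch of what living with that ordering could look like, again in plain Python and assuming codes follow sorted label order. `build_label_index` and the `"gender"` variable are hypothetical names, not the package's API.

```python
def build_label_index(values):
    """Map each label to its integer code, with codes in sorted label order."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}


# One lookup table per variable, built from the observed labels.
label_index = {"gender": build_label_index(["Male", "Female", "Male"])}

# Evidence is entered with the plain labels, then translated to codes.
evidence = {"gender": "Male"}
evidence_codes = {
    variable: label_index[variable][value]
    for variable, value in evidence.items()
}
```

Here `"Female"` gets code 0 and `"Male"` gets code 1, so users never see prefixed names; the cost is that any table indexed by these codes has to list its entries in sorted label order.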