python-qds / qdscreen

Quasi-determinism screening for fast Bayesian Network Structure Learning (from T.Rahier's PhD thesis, 2018)
https://python-qds.github.io/qdscreen/
BSD 3-Clause "New" or "Revised" License

`predict_qd` raises `ValueError: invalid literal for int() with base 10` #40

Closed (smarie closed this issue 1 year ago)

smarie commented 1 year ago

This bug happens when a column starts with a NaN and then contains a string. `np.vectorize` infers the output dtype from the first element it evaluates (here the NaN), so it assumes the mapping operation produces a number, and then it fails when it hits a string.
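The inference behaviour can be illustrated outside of qdscreen. The snippet below is only a minimal sketch: `to_code` is a hypothetical mapping invented for the illustration (it is not the function qdscreen uses internally), and it assumes the mapping turns a missing value into an integer code. It shows `np.vectorize` failing on the string when `otypes` is not given, and succeeding when the output type is declared explicitly.

```python
import numpy as np


def to_code(value):
    # Hypothetical mapping (not qdscreen's): NaN becomes -1, anything else is kept.
    if isinstance(value, float) and np.isnan(value):
        return -1
    return value


column = np.array([np.nan, "B"], dtype=object)

# Without `otypes`, np.vectorize evaluates the first element (NaN -> -1),
# infers an integer output dtype, and later cannot store the string "B".
failed = False
error_message = ""
try:
    np.vectorize(to_code)(column)
except ValueError as exc:
    failed = True
    error_message = str(exc)

# Declaring the output type up front skips the inference step.
safe = np.vectorize(to_code, otypes=[object])(column)
```

Passing `otypes` is one possible way to avoid the guess; whether it matches the fix applied in the library is not stated in this thread.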

smarie commented 1 year ago
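import numpy as np
import pandas as pd
from qdscreen import qd_screen

# "bar" starts with a NaN and then contains a string
df = pd.DataFrame({
    "foo": ["1", "2"],
    "bar": [np.nan, "B"]
})
qd_forest = qd_screen(df)
feat_selector = qd_forest.fit_selector_model(df)
only_important_features_df = feat_selector.remove_qd(df)
# raises ValueError: invalid literal for int() with base 10
result = feat_selector.predict_qd(only_important_features_df)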