python-qds / qdscreen

Quasi-determinism screening for fast Bayesian Network Structure Learning (from T.Rahier's PhD thesis, 2018)
https://python-qds.github.io/qdscreen/
BSD 3-Clause "New" or "Revised" License

`IndexError` when a NaN is present in the dataframe #28

Closed: smarie closed this issue 2 years ago

smarie commented 2 years ago

When NaNs are present in the dataframe, an IndexError can occur:

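import numpy as np
import pandas as pd
from qdscreen import qd_screen

# The last row has a NaN in the second column
df = pd.DataFrame([
    ["A", "B"],
    ["A", "B"],
    ["N", np.nan],
])

qd_forest = qd_screen(df)
qd_forest.fit_selector_model(df)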

yields

  File "C:\_dev\python_ws\_Libs_OpenSource\qdscreen\qdscreen\selector.py", line 112, in <lambda>
    levels_mapping_df = pd.DataFrame(X_ar[:, (parent, child)]).groupby(0).agg(lambda x: x.value_counts().index[0])
  File "C:\Miniconda3\envs\tools_py37\lib\site-packages\pandas\core\indexes\base.py", line 4297, in __getitem__
    return getitem(key)
IndexError: index 0 is out of bounds for axis 0 with size 0
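
The failing expression in the traceback takes the most frequent child level per parent level with `x.value_counts().index[0]`. `value_counts()` drops NaNs by default, so a group whose child values are all NaN (here the parent level "N") produces an empty result, and `.index[0]` fails. The sketch below reproduces this with pandas alone and shows one possible guard. `most_frequent` is a hypothetical helper written for illustration, not part of qdscreen, and mapping NaN-only groups to NaN is an assumption rather than the fix adopted by the project.

```python
import numpy as np
import pandas as pd


def most_frequent(values: pd.Series):
    """Most frequent non-NaN level, or NaN if there is none.

    Hypothetical helper for illustration; not part of the qdscreen API.
    """
    counts = values.value_counts()  # NaNs are dropped by default
    if counts.empty:
        # The group holds only NaNs, so there is no level to pick.
        return np.nan
    return counts.index[0]


# Root cause in isolation: a NaN-only group yields empty counts.
only_nan = pd.Series([np.nan], dtype=object)
try:
    only_nan.value_counts().index[0]
    raised = False
except IndexError:
    raised = True

# Same data as in the report: column 0 is the parent, column 1 the child.
df = pd.DataFrame([
    ["A", "B"],
    ["A", "B"],
    ["N", np.nan],
])

# Guarded version of the per-parent aggregation.
mapping = df.groupby(0)[1].agg(most_frequent)
print(mapping)
```

With this guard the parent level "N" maps to NaN instead of raising. Whether NaN is the right target for such a level, or whether rows with NaNs should be rejected up front, is a design decision for the library.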