py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

RuntimeWarning: divide by zero encountered in divide when using evaluate_causal_model #1213

Open newbietogitdotcom opened 1 week ago

newbietogitdotcom commented 1 week ago

Describe the bug: All columns in my data are numeric, and there are no null, zero, or infinite values. There are also no duplicate values, but I still keep getting this warning:

Evaluating causal mechanisms...: 50%|█████ | 10/20 [00:06<00:06, 1.55it/s]
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/dowhy/gcm/divergence.py:84: RuntimeWarning: divide by zero encountered in divide
  result = np.sum((d / n) * np.log(nu / rho)) + np.log(m / (n - 1))
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/dowhy/gcm/divergence.py:84: RuntimeWarning: divide by zero encountered in divide
  result = np.sum((d / n) * np.log(nu / rho)) + np.log(m / (n - 1))
Evaluating causal mechanisms...: 100%|██████████| 20/20 [00:17<00:00, 1.16it/s]
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/dowhy/gcm/divergence.py:84: RuntimeWarning: divide by zero encountered in divide
  result = np.sum((d / n) * np.log(nu / rho)) + np.log(m / (n - 1))

And also this error:

Versions/3.10/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self)
    401 if self._exception:
    402     try:
--> 403         raise self._exception
    404     finally:
    405         # Break a reference cycle with the exception in self._exception
    406         self = None

RuntimeError: Got a non-finite KL divergence! This can happen if both data sets have overlapping elements. Since these are normally removed by this method, double check whether the arrays are numeric.
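Both messages concern a KL divergence estimate, and the warning names the line `dowhy/gcm/divergence.py:84`. The sketch below is a standalone NumPy re-implementation of the formula printed in the warning, not DoWhy's code. It assumes that `rho` is each point's distance to its k-th nearest neighbour within the first sample and `nu` its distance to the k-th nearest point of the second sample (the usual k-nearest-neighbour KL estimator); the function name `knn_kl_divergence` is hypothetical. Under that assumption, a sample that repeats a value gets `rho == 0` for those points, `nu / rho` divides by zero, and the estimate becomes infinite even though the data contain no NaN or infinite values.

```python
import numpy as np


def knn_kl_divergence(x, y, k=1, remove_common_elements=True):
    """Hypothetical sketch of a k-NN estimate of KL(P || Q), with x ~ P and y ~ Q.

    Uses the formula printed in the warning:
        sum((d / n) * log(nu / rho)) + log(m / (n - 1))
    with brute-force Euclidean distances (fine for small samples).
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    if remove_common_elements:
        # Assumption: drop points of x that also occur in y, following the error
        # message's remark that overlapping elements are normally removed.
        shared = (x[:, None, :] == y[None, :, :]).all(axis=2).any(axis=1)
        x = x[~shared]

    n, m = x.shape[0], y.shape[0]
    d = float(x.shape[1])
    if n <= k or m < k:
        raise ValueError("not enough samples for the requested k")

    dist_xx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    dist_xy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)

    # rho: distance to the k-th nearest *other* point of x. After sorting,
    # column 0 is each point's zero distance to itself, so column k is used.
    rho = np.sort(dist_xx, axis=1)[:, k]
    # nu: distance to the k-th nearest point of y.
    nu = np.sort(dist_xy, axis=1)[:, k - 1]

    # A value repeated inside x gives rho == 0, so nu / rho divides by zero.
    return float(np.sum((d / n) * np.log(nu / rho)) + np.log(m / (n - 1)))


rng = np.random.default_rng(0)
p = rng.normal(0.0, 1.0, size=200)
q = rng.normal(0.5, 1.0, size=200)

print(knn_kl_divergence(p, q))            # finite: no ties within p
print(knn_kl_divergence(np.round(p), q))  # inf, plus a "divide by zero" RuntimeWarning
```

In this sketch, removing elements shared with the second sample does not help with the rounded data: it only drops points that also occur in `y`, not values repeated inside `x` itself.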


bloebp commented 1 week ago

Hi, does your data have any columns that contain only a constant value?
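One quick way to answer this for a pandas DataFrame is to count the distinct values per column and flag any column with at most one. A minimal sketch; the helper `constant_columns` and the toy frame are hypothetical:

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list:
    """Return the names of columns holding a single distinct value.

    dropna=False counts NaN as a value, so an all-NaN column is also flagged.
    """
    n_unique = df.nunique(dropna=False)
    return n_unique[n_unique <= 1].index.tolist()


# Hypothetical toy frame: "b" is constant, "a" is not.
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [5, 5, 5]})
print(constant_columns(toy))  # ['b']
```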

newbietogitdotcom commented 1 week ago

Hi @bloebp, thank you for replying to my post.

No, it does not have any column with a constant value. Here is some more information about my data:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29 entries, 0 to 28
Data columns (total 22 columns):

 #   Column  Non-Null Count  Dtype
 0   Date    29 non-null     dbdate
 1   ET      29 non-null     float64
 2   EOT     29 non-null     float64
 3   DU      29 non-null     float64
 4   OD      29 non-null     Int64
 5   ONTD    29 non-null     float64
 6   ST      29 non-null     Int64
 7   UT      29 non-null     Int64
 8   OT      29 non-null     Int64
 9   TT      29 non-null     Int64
 10  THT     29 non-null     Int64
 11  SS      29 non-null     float64
 12  MPH     29 non-null     float64
 13  OA      29 non-null     float64
 14  LCA     29 non-null     float64
 15  OTP     29 non-null     float64
 16  DT      29 non-null     float64
 17  DST     29 non-null     Int64
 18  PM      29 non-null     float64
 19  BC      29 non-null     float64
 20  IC      29 non-null     float64
 21  TIP     29 non-null     float64
dtypes: Int64(7), dbdate(1), float64(14)
memory usage: 5.3 KB

And below are the counts of unique values per column:

Date    29
ET      29
EOT     29
DU      23
OD      29
ONTD     7
ST      29
UT      28
OT      29
TT      29
THT     29
SS       3
MPH     25
OA       3
LCA     29
OTP     29
DT      27
DST     10
PM      16
BC      29
IC      29
TIP     29
dtype: int64
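These counts can be read against both questions in the thread. A minimal sketch over the numbers exactly as reported (nothing else about the data is assumed): no column has a single distinct value, so there is no constant column, but nine columns have fewer distinct values than the 29 rows, so values do repeat inside those columns even if no whole row is duplicated.

```python
# Unique-value counts exactly as reported above (29 rows in total).
N_ROWS = 29
unique_counts = {
    "Date": 29, "ET": 29, "EOT": 29, "DU": 23, "OD": 29, "ONTD": 7,
    "ST": 29, "UT": 28, "OT": 29, "TT": 29, "THT": 29, "SS": 3,
    "MPH": 25, "OA": 3, "LCA": 29, "OTP": 29, "DT": 27, "DST": 10,
    "PM": 16, "BC": 29, "IC": 29, "TIP": 29,
}

# A column with a single distinct value is constant.
constant = [col for col, c in unique_counts.items() if c <= 1]

# A column with fewer distinct values than rows repeats at least one value;
# rows - distinct = number of rows whose value already appeared in that column.
repeated = {col: N_ROWS - c for col, c in unique_counts.items() if c < N_ROWS}

print(constant)  # []
print(repeated)  # {'DU': 6, 'ONTD': 22, 'UT': 1, 'SS': 26, 'MPH': 4, 'OA': 26, 'DT': 2, 'DST': 19, 'PM': 13}
```

If the k-nearest-neighbour reading of the warning is right, columns such as SS and OA, with only 3 distinct values across 29 rows, are the most likely to produce zero neighbour distances; this is an assumption, not something confirmed in the thread.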

bloebp commented 1 week ago

OK, interesting. Is there any chance you could provide some artificially generated data that reproduces this issue? I could take a closer look then.