marisakamozz closed this issue 4 years ago.
Thanks for reporting this. Judging purely from the example you gave, I would expect a warning: there is no row in which both A and B are non-missing. Filling the columns with 'NA' values would change the semantics of the data. You may have encountered an example where this isn't the case; if so, please let us know.
Emitting a warning is not a perfect solution, though; we should look for a way to handle this case better.
Warning messages aren't the only problem. In addition, no Cramér's V coefficients are output at all, not even those that can be calculated correctly.
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, None],
    'B': [None, None, 8, 9],
    'C': [3, 4, None, None],
})
In the example above, the Cramér's V coefficient of A and C can be calculated, yet it is not output either.
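A minimal sketch of why one pair fails while the other does not, using the same DataFrame. `shared_table` is a hypothetical helper, not part of pandas-profiling: it cross-tabulates two columns over the rows where both are observed. A and B have no such rows, so there is nothing to tabulate; handing an empty table to scipy's `chi2_contingency` is presumably what produces the 'No data; `observed` has size 0.' error quoted further down in this thread.

```python
from typing import Optional

import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, None],
    "B": [None, None, 8, 9],
    "C": [3, 4, None, None],
})


def shared_table(frame: pd.DataFrame, x: str, y: str) -> Optional[pd.DataFrame]:
    """Hypothetical helper: contingency table of x vs y over the rows where
    both columns are observed, or None if there are no such rows."""
    both = frame[[x, y]].dropna()
    if both.empty:
        return None  # no shared observations: nothing to cross-tabulate
    return pd.crosstab(both[x], both[y])


print(shared_table(df, "A", "B"))  # None: A and B never overlap
print(shared_table(df, "A", "C"))  # 2x2 table over the two shared rows
```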
There are multiple strategies for dealing with missing values in correlations. For the bias-corrected Cramér's V statistic, the current implementation drops the pairs of variables where there is at least one observation that is missing for both variables. This choice should be documented.
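Two standard strategies can be contrasted on the same example. This is a sketch of the general techniques, not of pandas-profiling's code: listwise deletion (complete case analysis in its strictest form) keeps only rows with no missing value in any column, while pairwise deletion keeps, for each pair of columns, the rows where both are observed. The pairwise counts are computed here as a matrix product of the not-missing indicators.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, None],
    "B": [None, None, 8, 9],
    "C": [3, 4, None, None],
})

# Listwise deletion: keep only rows with no missing value in any column.
# Every row of this frame has a gap somewhere, so nothing survives.
listwise = df.dropna()
print(len(listwise))  # 0

# Pairwise deletion: entry (i, j) counts the rows where both column i and
# column j are observed (indicator matrix transposed times itself).
observed = df.notna().astype(int)
pair_counts = observed.T @ observed
print(pair_counts)
```

On this frame listwise deletion leaves no rows at all, so it could compute nothing, whereas pairwise deletion still has two rows for A and C and none for A and B or for B and C.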
Moving forward, let's implement one or more other strategies (such as complete case analysis). One feature that is still missing is the propagation of missing values into the correlation matrix.
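As a sketch of what propagating missing values could look like, the functions below fill a coefficient matrix pair by pair and write NaN for any pair that cannot be computed, instead of letting one failure discard the whole matrix. Assumptions: `cramers_v` and `cramers_v_matrix` are hypothetical names, and for brevity this computes the plain (uncorrected) Cramér's V with a hand-written chi-squared statistic, not the bias-corrected statistic pandas-profiling uses.

```python
import itertools

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, None],
    "B": [None, None, 8, 9],
    "C": [3, 4, None, None],
})


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Plain (uncorrected) Cramér's V on the rows where both x and y are
    observed. Returns NaN instead of raising when it cannot be computed."""
    both = pd.DataFrame({"x": x, "y": y}).dropna()
    if both.empty:
        return float("nan")  # no shared observations
    table = pd.crosstab(both["x"], both["y"]).to_numpy(dtype=float)
    r, k = table.shape
    if min(r, k) < 2:
        return float("nan")  # a single category: V is undefined
    n = table.sum()
    # Expected counts under independence: row total * column total / n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(r, k) - 1))))


def cramers_v_matrix(frame: pd.DataFrame) -> pd.DataFrame:
    """Symmetric matrix of pairwise Cramér's V; NaN marks pairs that could
    not be computed, so one bad pair does not hide the others."""
    cols = list(frame.columns)
    out = pd.DataFrame(np.nan, index=cols, columns=cols)
    for a, b in itertools.combinations_with_replacement(cols, 2):
        value = cramers_v(frame[a], frame[b])
        out.loc[a, b] = value
        out.loc[b, a] = value
    return out


print(cramers_v_matrix(df))
```

On the example, the (A, C) entry is 1.0 while the (A, B) and (B, C) entries are NaN, so the computable coefficient is reported and the matrix keeps its full shape.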
Reference:
Update: in the next release, columns are no longer dropped when one of their correlation coefficients cannot be calculated; instead, they remain in the plot. Note that this does not fully resolve this issue yet.
I still have the same issue using pandas-profiling v3.0.0:
pandas_profiling/model/correlations.py:152: UserWarning: There was an attempt to calculate the cramers correlation, but this failed.
To hide this warning, disable the calculation
(using `df.profile_report(correlations={"cramers": {"calculate": False}})`
If this is problematic for your use case, please report this as an issue:
https://github.com/pandas-profiling/pandas-profiling/issues
(include the error message: 'No data; `observed` has size 0.')
(include the error message: '{error}')"""
Same issue here
There was an attempt to generate the Count missing values diagrams, but this failed.
To hide this warning, disable the calculation
(using df.profile_report(missing_diagrams={"Count": False})
If this is problematic for your use case, please report this as an issue:
https://github.com/pandas-profiling/pandas-profiling/issues
(include the error message: 'The number of FixedLocator locations (7), usually from a call to set_ticks, does not match the number of ticklabels (60).')
Description:
When a cross-tabulation table cannot be created, the calculation of the cramers correlation fails, and no Cramér's V coefficient is displayed at all, not even those that can be calculated correctly.
To Reproduce:
Warning message:
Version information:
python==3.7.7 pandas==0.25.3 pandas-profiling==2.5.0
Workaround:
Fill the missing values with a placeholder value before profiling.
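A minimal sketch of this workaround, assuming a string placeholder "missing" that does not otherwise occur in the data. Every value is converted to a string so that numbers and the placeholder can be sorted together as categories. As noted earlier in the thread, this changes the semantics of the data, because missingness is then treated as a real category.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, None],
    "B": [None, None, 8, 9],
    "C": [3, 4, None, None],
})

# Replace NaN with a placeholder category (an assumed value), then turn every
# value into a string so the placeholder and numbers form one category set.
filled = df.fillna("missing").astype(str)

# Every row now counts for every pair, so A and B get a non-empty table.
table = pd.crosstab(filled["A"], filled["B"])
print(table)
```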