pydata / patsy

Describing statistical models in Python using symbolic formulas

When passing DataFrame, dmatrices returns design matrix with zero rows using standardize in formula #166

Open rmwenzel opened 3 years ago

rmwenzel commented 3 years ago

When passing a pandas.DataFrame, dmatrices is returning a design matrix with no rows in at least two cases I could find.

Here are some minimal examples.

Case 1: All column values are the same

import pandas as pd
from patsy import dmatrices

df = pd.DataFrame({'a': [1, 1, 1], 'b': [0, 1, 0]})
formula = 'b ~ standardize(a)'
dmatrices(formula, data=df)

gives

DesignMatrix with shape (0, 2)
  Intercept  standardize(a)
  Terms:
    'Intercept' (column 0)
    'standardize(a)' (column 1)

Case 2: Column values are different but contain np.nan

import pandas as pd
import numpy as np
from patsy import dmatrices

df = pd.DataFrame({'a': [2, 3, np.nan], 'b': [0, 1, 0]})
formula = 'b ~ standardize(a)'
dmatrices(formula, data=df)

gives the same result:

DesignMatrix with shape (0, 2)
  Intercept  standardize(a)
  Terms:
    'Intercept' (column 0)
    'standardize(a)' (column 1)

The patsy version is the latest on conda, 0.5.1.
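
A possible explanation for both cases, offered as a sketch rather than a confirmed diagnosis: if standardize computes (x - mean) / std with no special-casing, then a constant column gives 0 / 0 and a column containing np.nan gets a NaN mean, so every transformed value is NaN. Since dmatrices defaults to NA_action='drop', every row would then be removed. The helper naive_standardize below is hypothetical and uses only NumPy; it mimics that arithmetic and is not patsy's actual implementation.

```python
import numpy as np


def naive_standardize(x):
    """Hypothetical stand-in for standardize: (x - mean) / std, population std."""
    x = np.asarray(x, dtype=float)
    # Silence the 0/0 and NaN warnings so the resulting values can be inspected.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (x - x.mean()) / x.std()


case1 = naive_standardize([1, 1, 1])       # zero variance: 0 / 0 in every row
case2 = naive_standardize([2, 3, np.nan])  # NaN mean propagates to every row

# Count the rows that would survive if rows containing NaN were dropped.
rows_left_case1 = int((~np.isnan(case1)).sum())
rows_left_case2 = int((~np.isnan(case2)).sum())
print(rows_left_case1, rows_left_case2)
```

If this is what is happening, passing NA_action='raise' to dmatrices should raise an error instead of silently returning an empty matrix, which would at least make the problem visible.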