statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

More informative docstrings in from_formula models #1232

Open vincentarelbundock opened 10 years ago

vincentarelbundock commented 10 years ago

When a model requires several arguments, the docstrings of the formula-based constructors make it hard to tell which ones to pass, because those constructors only forward `*args` and `**kwargs` to the underlying model.

Compare the call signature of `smf.gee`:

Definition: smf.gee(cls, formula, data, subset=None, *args, **kwargs)

with the parameter documentation of `sm.GEE`:

Parameters
----------
endog : array-like
    1d array of endogenous response values.
exog : array-like
    A nobs x k array where `nobs` is the number of
    observations and `k` is the number of regressors. An
    intercept is not included by default and should be added
    by the user. See `statsmodels.tools.add_constant`.
groups : array-like
    A 1d array of length `nobs` containing the cluster labels.
time : array-like
    A 2d array of time (or other index) values, used by some
    dependence structures to define similarity relationships among
    observations within a cluster.
family : family class instance
    The default is Gaussian.  To specify the binomial
    distribution family = sm.family.Binomial(). Each family can 
    take a link instance as an argument.  See 
    statsmodels.family.family for more information.
covstruct : CovStruct class instance
    The default is Independence.  To specify an exchangeable
    structure covstruct = sm.covstruct.Exchangeable().  See 
    statsmodels.covstruct.covstruct for more information.
offset : array-like
    An offset to be included in the fit.  If provided, must be
    an array whose length is the number of rows in exog.
constraint : (ndarray, ndarray)
    If provided, the constraint is a tuple (L, R) such that the
    model parameters are estimated under the constraint L *
    param = R, where L is a q x p matrix and R is a
    q-dimensional vector.  If constraint is provided, a score
    test is performed to compare the constrained model to the
    unconstrained model.
missing : str 
    Available options are 'none', 'drop', and 'raise'. If 'none', no nan 
    checking is done. If 'drop', any observations with nans are dropped.
    If 'raise', an error is raised. Default is 'none'.
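The problem can be reproduced without statsmodels. The sketch below uses a hypothetical `ToyModel` class (not statsmodels code) whose `from_formula` has the same shape as the `smf.gee` definition above. Because the formula constructor only declares `*args, **kwargs`, introspection tools such as `inspect.signature` (which interactive help relies on) cannot reveal model-specific parameters like `groups` or `family`.

```python
import inspect


class ToyModel:
    """Hypothetical stand-in for a model such as GEE (not statsmodels code)."""

    def __init__(self, endog, exog, groups, family=None, missing="none"):
        self.endog = endog
        self.exog = exog
        self.groups = groups
        self.family = family
        self.missing = missing

    @classmethod
    def from_formula(cls, formula, data, subset=None, *args, **kwargs):
        # A real implementation would build endog/exog from `formula` and
        # `data`; here only the forwarding of extra arguments matters.
        endog, exog = None, None
        return cls(endog, exog, *args, **kwargs)


# The model's own signature lists every parameter ...
print(inspect.signature(ToyModel.__init__))
# → (self, endog, exog, groups, family=None, missing='none')

# ... while the formula constructor hides them behind *args/**kwargs.
print(inspect.signature(ToyModel.from_formula))
# → (formula, data, subset=None, *args, **kwargs)
```

The extra arguments still reach the model (for example `ToyModel.from_formula("y ~ x", {}, groups="g")`), but a user reading only the `from_formula` signature has no way to discover that `groups` is required.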
josef-pkt commented 10 years ago

Looks like we need to add some docstring manipulation and templating to inject the model's args and kwds into the `from_formula` docstring.
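One possible form of that templating, offered as a sketch under stated assumptions rather than statsmodels' actual approach: a hypothetical class decorator `inject_model_params` copies the numpydoc `Parameters` section of a model's class docstring into a per-class `from_formula` docstring. The names `Model`, `ToyGEE`, `FROM_FORMULA_DOC`, `parameters_section`, and `inject_model_params` are all illustrative. The parser assumes numpydoc-style section titles underlined with dashes and raises `ValueError` if there is no `Parameters` section.

```python
import inspect
import textwrap

# Hypothetical template; {model_params} receives the model's Parameters section.
FROM_FORMULA_DOC = """Create a model from a formula and dataframe.

Parameters
----------
formula : str
    Model formula.
data : DataFrame
    Data holding the variables named in `formula`.
subset : array-like, optional
    Rows of `data` to use.
*args, **kwargs
    Forwarded to the model constructor, which accepts:

{model_params}
"""


def parameters_section(doc):
    """Return the body of the numpydoc 'Parameters' section of `doc`."""
    lines = inspect.cleandoc(doc).splitlines()
    start = lines.index("Parameters") + 2  # skip the title and its dashes
    body = []
    for i in range(start, len(lines)):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        # The next numpydoc section starts with a title underlined by dashes.
        if nxt and set(nxt) == {"-"}:
            break
        body.append(lines[i])
    return "\n".join(body).rstrip()


def inject_model_params(cls):
    """Class decorator: document cls's own parameters on cls.from_formula."""
    base = cls.from_formula.__func__  # plain function behind the classmethod

    # Wrap instead of editing base.__doc__, so subclasses that inherit the
    # same from_formula each get their own docstring.
    def from_formula(klass, formula, data, subset=None, *args, **kwargs):
        return base(klass, formula, data, subset, *args, **kwargs)

    from_formula.__doc__ = FROM_FORMULA_DOC.format(
        model_params=textwrap.indent(parameters_section(cls.__doc__), "    ")
    )
    cls.from_formula = classmethod(from_formula)
    return cls


class Model:
    @classmethod
    def from_formula(cls, formula, data, subset=None, *args, **kwargs):
        # Formula parsing omitted; only argument forwarding is shown.
        return cls(None, None, *args, **kwargs)


@inject_model_params
class ToyGEE(Model):
    """Hypothetical GEE-like model.

    Parameters
    ----------
    endog : array-like
        1d array of endogenous response values.
    exog : array-like
        A nobs x k array of regressors.
    groups : array-like
        A 1d array of length `nobs` containing the cluster labels.

    Notes
    -----
    Only the Parameters section is copied.
    """

    def __init__(self, endog, exog, groups):
        self.groups = groups


print(ToyGEE.from_formula.__doc__)
```

Wrapping `from_formula` per class, rather than assigning to the inherited function's `__doc__`, keeps subclasses that share one base implementation from overwriting each other's docstrings.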