theislab / diffxpy

Differential expression analysis for single-cell RNA-seq data.
https://diffxpy.rtfd.io
BSD 3-Clause "New" or "Revised" License

Constraints doc string #189

Open Hrovatin opened 3 years ago

Hrovatin commented 3 years ago

The doc string of the Wald test says: "constraints_loc: Array with constraints in rows and model parameters in columns. Each constraint contains non-zero entries for the a of parameters that has to sum to zero." However, when I pass constraints the way they are handled inside wald, it seems that the array should actually have shape parameters x constraints, with the entries of each column summing to 0 over the constrained parameters.

 np.unique(np.asarray(dmat_est_loc),axis=0)

array([[1., 0., 0., 0., 1., 1.],
       [1., 0., 0., 1., 0., 1.],
       [1., 0., 1., 0., 0., 0.],
       [1., 1., 0., 0., 0., 0.]])

dmat_est_loc.design_info.column_names

['Intercept',
 'rep1[T.1.0]',
 'rep1[T.2.0]',
 'rep2[T.1.0]',
 'rep2[T.2.0]',
 'treatment[T.1.0]']

constraints_loc = de.utils.constraint_matrix_from_string(
    dmat=dmat_est_loc,
    coef_names=dmat_est_loc.design_info.column_names,
    constraints=["rep1[T.1.0] + rep1[T.2.0] = 0", "rep2[T.1.0] + rep2[T.2.0] = 0"]
)

constraints_loc

array([[ 1.,  0.,  0.,  0.],
       [ 0., -1.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0., -1.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])
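
One way to read this output is sketched below, assuming that the full parameter vector is obtained as the matrix product of the constraint matrix with a vector of independent parameters. That relation is an assumption about how the matrix is used downstream, and the values in `theta_indep` are made up for illustration. Under that assumption, each constrained pair of parameters sums to zero because its column holds one -1 and one +1.

```python
import numpy as np

# Constraint matrix as printed above:
# rows = all 6 model parameters, columns = 4 independent parameters.
cmat = np.array([
    [1.,  0.,  0., 0.],   # Intercept
    [0., -1.,  0., 0.],   # rep1[T.1.0]
    [0.,  1.,  0., 0.],   # rep1[T.2.0]
    [0.,  0., -1., 0.],   # rep2[T.1.0]
    [0.,  0.,  1., 0.],   # rep2[T.2.0]
    [0.,  0.,  0., 1.],   # treatment[T.1.0]
])

# Arbitrary, made-up values for the 4 independent parameters.
theta_indep = np.array([2.0, 0.5, -1.5, 3.0])

# Assumed relation: all parameters = cmat @ independent parameters.
theta_all = cmat @ theta_indep

# The two constrained pairs cancel by construction.
rep1_sum = theta_all[1] + theta_all[2]
rep2_sum = theta_all[3] + theta_all[4]
print(theta_all, rep1_sum, rep2_sum)
```
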

# Constraints in rows
design_loc, design_loc_names, constraints_loc, term_names_loc = de.utils.constraint_system_from_star(
    dmat=dmat_est_loc,
    constraints=constraints_loc.T,
    return_type="patsy"
)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-118-2dcc36b42735> in <module>
----> 1 design_loc, design_loc_names, constraints_loc, term_names_loc = de.utils.constraint_system_from_star(
      2     dmat=dmat_est_loc,
      3     constraints=constraints_loc.T,
      4     return_type="patsy"
      5 )

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/diffxpy/testing/utils.py in constraint_system_from_star(dmat, sample_description, formula, as_numeric, constraints, return_type)
    264         as_categorical = True
    265 
--> 266     return glm.data.constraint_system_from_star(
    267         dmat=dmat,
    268         sample_description=sample_description,

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/batchglm/data.py in constraint_system_from_star(dmat, sample_description, formula, as_categorical, constraints, return_type)
    253             )
    254     else:
--> 255         if np.linalg.matrix_rank(np.matmul(dmat, cmat)) != cmat.shape[1]:
    256             raise ValueError(
    257                 "constrained design matrix is not full rank: %i %i" %

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 4 is different from 6)
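
The shape mismatch can be reproduced with NumPy alone. The sketch below mimics only the single check visible in the traceback (`np.matmul(dmat, cmat)` followed by a rank comparison). The helper name `check_constraints` is hypothetical, and the design matrix is reduced to its four unique rows shown earlier, which is an assumption that does not change the rank.

```python
import numpy as np

# Unique rows of the design matrix from above (4 x 6).
dmat = np.array([
    [1., 0., 0., 0., 1., 1.],
    [1., 0., 0., 1., 0., 1.],
    [1., 0., 1., 0., 0., 0.],
    [1., 1., 0., 0., 0., 0.],
])

# Constraint matrix from above (6 parameters x 4 independent parameters).
cmat = np.array([
    [1.,  0.,  0., 0.],
    [0., -1.,  0., 0.],
    [0.,  1.,  0., 0.],
    [0.,  0., -1., 0.],
    [0.,  0.,  1., 0.],
    [0.,  0.,  0., 1.],
])


def check_constraints(dmat, cmat):
    """Hypothetical helper: the constrained design matrix must have full column rank."""
    constrained = np.matmul(dmat, cmat)
    return bool(np.linalg.matrix_rank(constrained) == cmat.shape[1])


# Parameters in rows: (4, 6) @ (6, 4) is well defined.
ok = check_constraints(dmat, cmat)

# Constraints in rows: (4, 6) @ (4, 6) fails inside matmul.
try:
    check_constraints(dmat, cmat.T)
    transposed_error = None
except ValueError as err:
    transposed_error = err

print(ok, type(transposed_error).__name__)
```

With the parameters in rows the product has shape (4, 4) and the check passes. With the transposed matrix the inner dimensions are 6 and 4, which is the mismatch reported in the error message.
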

# Constraints in columns
design_loc, design_loc_names, constraints_loc, term_names_loc = de.utils.constraint_system_from_star(
    dmat=dmat_est_loc,
    constraints=constraints_loc,
    return_type="patsy"
)

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-116-54a1124d090d> in <module>
----> 1 design_loc, design_loc_names, constraints_loc, term_names_loc = de.utils.constraint_system_from_star(
      2     dmat=dmat_est_loc,
      3     constraints=constraints_loc,
      4     return_type="patsy"
      5 )

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/diffxpy/testing/utils.py in constraint_system_from_star(dmat, sample_description, formula, as_numeric, constraints, return_type)
    264         as_categorical = True
    265 
--> 266     return glm.data.constraint_system_from_star(
    267         dmat=dmat,
    268         sample_description=sample_description,

~/miniconda3/envs/rpy2_3/lib/python3.8/site-packages/batchglm/data.py in constraint_system_from_star(dmat, sample_description, formula, as_categorical, constraints, return_type)
    259             )
    260 
--> 261     return dmat, coef_names, cmat, term_names
    262 
    263 

UnboundLocalError: local variable 'coef_names' referenced before assignment
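
This class of error typically appears when a variable is assigned only inside some branches of a function and then returned unconditionally. The branch logic of batchglm is not visible in the traceback, so the function below is a hypothetical stand-in that only illustrates the mechanism. It is not the library's code.

```python
import numpy as np
import pandas as pd


def names_from_dmat(dmat):
    # Hypothetical stand-in, NOT the batchglm source:
    # coef_names is only assigned for input types that a branch handles.
    if isinstance(dmat, pd.DataFrame):
        coef_names = list(dmat.columns)
    # For any other input type no branch ran, so coef_names is unbound here.
    return coef_names


# A DataFrame input takes the branch and works.
df = pd.DataFrame(np.eye(2), columns=["Intercept", "treatment[T.1.0]"])
names = names_from_dmat(df)

# A plain array skips the branch and triggers the error.
try:
    names_from_dmat(np.eye(2))
    caught = None
except UnboundLocalError as err:
    caught = err

print(names, type(caught).__name__)
```
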

# However, this works if I change it to
design_loc, design_loc_names, constraints_loc, term_names_loc = de.utils.constraint_system_from_star(
    dmat=pd.DataFrame(np.array(dmat_est_loc),columns=dmat_est_loc.design_info.column_names),
    constraints=constraints_loc,
    return_type="patsy"
)

design_loc_names

Index(['Intercept', 'rep1[T.1.0]', 'rep1[T.2.0]', 'rep2[T.1.0]', 'rep2[T.2.0]',
       'treatment[T.1.0]'],
      dtype='object')

np.unique(design_loc,axis=0)

array([[1., 0., 0., 0., 1., 1.],
       [1., 0., 0., 1., 0., 1.],
       [1., 0., 1., 0., 0., 0.],
       [1., 1., 0., 0., 0., 0.]])

constraints_loc

array([[ 1.,  0.,  0.,  0.],
       [ 0., -1.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0., -1.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])

term_names_loc

Can you clarify this in the doc string?

davidsebfischer commented 3 years ago

You can also start a PR onto dev just fixing this doc string!

Hrovatin commented 3 years ago

The same text appears in the doc strings of multiple tests. I think it may need to be corrected in all of them, but I am not sure; this may depend on one of the functions called within the test functions. Is there a way to unify doc strings in this case, or must each one be corrected and maintained individually?

davidsebfischer commented 3 years ago

Thanks! You could try this: https://github.com/theislab/sfaira/blob/a713ab0a679e2b409ae841f3d35bebda758c413d/sfaira/data/base/dataset.py#L28 and https://github.com/theislab/sfaira/blob/a713ab0a679e2b409ae841f3d35bebda758c413d/sfaira/data/base/dataset.py#L416
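
A generic way to share one parameter description across several functions is sketched below. It is one common Python pattern and not necessarily what the linked sfaira lines do. The decorator `with_shared_doc`, the functions `wald_like` and `lrt_like`, and the wording of the shared text are all hypothetical placeholders.

```python
# Single source of truth for the shared parameter description
# (placeholder wording, not the final doc string).
CONSTRAINTS_LOC_DOC = (
    "Array with model parameters in rows and independent parameters in columns."
)


def with_shared_doc(**fragments):
    """Fill {name} placeholders in a function's doc string with shared text."""
    def decorator(func):
        func.__doc__ = func.__doc__.format(**fragments)
        return func
    return decorator


@with_shared_doc(constraints_loc=CONSTRAINTS_LOC_DOC)
def wald_like(constraints_loc=None):
    """Hypothetical test function.

    :param constraints_loc: {constraints_loc}
    """


@with_shared_doc(constraints_loc=CONSTRAINTS_LOC_DOC)
def lrt_like(constraints_loc=None):
    """Another hypothetical test function.

    :param constraints_loc: {constraints_loc}
    """


print(wald_like.__doc__)
```

The trade-off is that the doc string in the source file no longer shows the full text, and any literal braces in a doc string would need escaping for `str.format`.
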

Hrovatin commented 3 years ago

In the end I just corrected the individual doc strings to keep the doc formatting consistent across the params of the functions. This should be in #192.