neurostuff / PyMARE

PyMARE: Python Meta-Analysis & Regression Engine
https://pymare.readthedocs.io
MIT License
51 stars 14 forks source link

Datasets do not check input array shapes/sizes #99

Closed tsalo closed 2 years ago

tsalo commented 2 years ago

To reproduce the problem, use the following code

from pymare import core, estimators

y = [
    [2, 4, 6],  # estimates for first study's three datasets
    [3, 2, 1],  # estimates for second study's three datasets
]
v = [
    [100, 100, 100],  # estimate variance for first study's three datasets
    [8, 4, 2],  # estimate variance for second study's three datasets
]
X = [  # all "dataset"s must have the same regressors
    [5, 9],  # regressors for first study
    [2, 8],  # regressors for second study
    [5, 5],  # regressors for imaginary third study
]

dataset = core.Dataset(y=y, v=v, X=X, X_names=["X1", "X7"], add_intercept=False)
est = estimators.WeightedLeastSquares().fit_dataset(dataset)
results = est.summary()

Here's the traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-b48664cc0a71> in <module>
     16 
     17 dataset = core.Dataset(y=y, v=v, X=X, X_names=["X1", "X7"], add_intercept=False)
---> 18 est = estimators.WeightedLeastSquares().fit_dataset(dataset)
     19 results = est.summary()

~/Documents/tsalo/PyMARE/pymare/estimators/estimators.py in fit_dataset(self, dataset, *args, **kwargs)
     97 
     98         all_kwargs.update(kwargs)
---> 99         self.fit(*args, **all_kwargs)
    100         self.dataset_ = dataset
    101 

~/Documents/tsalo/PyMARE/pymare/estimators/estimators.py in fit(self, y, X, v)
    196             v = np.ones_like(y)
    197 
--> 198         beta, inv_cov = weighted_least_squares(y, v, X, self.tau2, return_cov=True)
    199         self.params_ = {"fe_params": beta, "tau2": self.tau2, "inv_cov": inv_cov}
    200         return self

~/Documents/tsalo/PyMARE/pymare/stats.py in weighted_least_squares(y, v, X, tau2, return_cov)
     33 
     34     # Einsum indices: k = studies, p = predictors, i = parallel iterates
---> 35     wX = np.einsum("kp,ki->ipk", X, w)
     36     cov = wX.dot(X)
     37 

<__array_function__ internals> in einsum(*args, **kwargs)

/opt/miniconda3/lib/python3.8/site-packages/numpy/core/einsumfunc.py in einsum(out, optimize, *operands, **kwargs)
   1357         if specified_out:
   1358             kwargs['out'] = out
-> 1359         return c_einsum(*operands, **kwargs)
   1360 
   1361     # Check the kwargs to avoid a more cryptic error later, without having to

ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (3,2)->(2,3) (2,3)->(3,newaxis,2) 

Note that the exception is raised when the Estimator is fitted, rather than when the Dataset is initialized, and it's the product of a mathematical operation, rather than an explicit check.

NOTE: Related to #37.

JulioAPeraza commented 2 years ago

In addition to checking the number of rows of y against the number of rows of X, should we also add checks for y vs. v and y vs. n?

tsalo commented 2 years ago

That would be great! The number of rows and columns of v need to match y. I'm not sure if the number of columns of n need to match y, but the number of rows should.