partofthething / ace

Python package for performing the Alternating Conditional Expectation (ACE) regression
MIT License

ACE sometimes gives negative Maximal Correlation values #12

Open AtticusBeachy opened 5 years ago

AtticusBeachy commented 5 years ago

Maximal Correlation (MC) values should always be between 0 and 1. However, when I calculate the MC values of x1 and x2 with y for the values

x1 = [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]
x2 = [ 2., 5., 9., 7., 4., 8., 1., 6., 3., 10.]
y = [ 3., 9., 11., 8., 4., 15., 14., 20., 30., 32.]

I get a negative MC between x2 and y.

Running the same problem using the R library acepack yields an MC value within the proper range.

Python calculation:

from ace import ace
from scipy import stats
import numpy as np

def ACE(x, y):
    '''
    Output MCs: Maximal Correlations (MCs) for each variable x
    Input x: list of 1D numpy arrays, one for each input variable
    Input y: 1D numpy array of responses
    '''
    ace_solver = ace.ACESolver()
    ace_solver.specify_data_set(x, y)
    ace_solver.solve()
    MCs = []  # maximal correlations, one per input variable
    for i in range(len(x)):
        # Pearson correlation between the i-th transformed input and the transformed response
        MC, Pval = stats.pearsonr(ace_solver.x_transforms[i], ace_solver.y_transform)
        MCs.append(MC)
    return MCs

x = [np.array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]), np.array([ 2.,  5.,  9.,  7.,  4.,  8.,  1.,  6.,  3., 10.])]
y = np.array([ 3.,  9., 11.,  8.,  4., 15., 14., 20., 30., 32.])
MCs = ACE(x, y)
print('MCs = ', MCs)

yields MCs = [0.9523, -0.0577]

Meaning the Maximal Correlation value between x2 and y is -0.058.
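
As a rough sanity check that does not depend on either ACE implementation, the plain Pearson correlations of the untransformed inputs with y can be computed with NumPy alone. This is only a sketch. For a single input, the identity transforms are among the candidates that a maximal correlation is taken over, so a two-variable maximal correlation should not fall below the absolute raw correlation. The two-input fit above is not exactly that case, so the numbers are a reference point, not a strict bound. The name raw_r is just illustrative.

```python
import numpy as np

# Same data as in the report above.
x1 = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
x2 = np.array([2., 5., 9., 7., 4., 8., 1., 6., 3., 10.])
y = np.array([3., 9., 11., 8., 4., 15., 14., 20., 30., 32.])

# Plain Pearson correlation of each untransformed input with y.
# np.corrcoef returns the 2x2 correlation matrix; [0, 1] is the off-diagonal entry.
raw_r = [np.corrcoef(xi, y)[0, 1] for xi in (x1, x2)]

for name, r in zip(("x1", "x2"), raw_r):
    print(f"raw Pearson r({name}, y) = {r:.4f}")
```

If the raw correlation for x2 is already positive, a negative value after the transforms at least looks suspicious.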

R acepack calculation:

library(acepack)
x1 = 1:10
x2 = c(2.,  5.,  9.,  7.,  4.,  8.,  1.,  6.,  3., 10.)
x <- cbind(x1, x2)
y = c( 3.,  9., 11.,  8.,  4., 15., 14., 20., 30., 32.)
ace_model = ace(x, y)
MC = cor(ace_model$tx, ace_model$ty)

yields MC values of

x1 0.9427068
x2 0.3442552

Giving a positive Maximal Correlation value between x2 and y of 0.344.

partofthething commented 5 years ago

Thanks, great report. That is indeed a defect. I'll look into it.