TypeError: _inplace_paired_L2() missing 2 required positional arguments: 'A' and 'B' #312

I get this error: TypeError: _inplace_paired_L2() missing 2 required positional arguments: 'A' and 'B'

Steps/Code to Reproduce


from sklearn.datasets import make_friedman1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def friedman_np_to_df(X,y):
  return pd.DataFrame(X,columns=['x0','x1', 'x2', 'x3', 'x4']), pd.Series(y)

# Make training set
X_train, NA = make_friedman1(n_samples=1000, n_features=5, random_state = 1) # don't care about y, so call it NA
X_train, NA = friedman_np_to_df(X_train,NA)

#categorize training set based off of x0
domain_list = []
for i in range(len(X_train)):
  if X_train.iloc[i]['x0'] < 0.6:
    domain_list.append(1)  # in domain
  else:
    domain_list.append(0)  # out of domain

X_train['domain'] = domain_list
# Set training set to where domain == 1 (x0 < 0.6)
X_train =  X_train[X_train['domain']==1]
y_train = X_train.copy()
X_train = X_train.drop(columns = ['domain'])
y_train = y_train['domain']

# Make testing set with a different random_state
X_test, NA2 = make_friedman1(n_samples=1000, n_features=5, random_state = 3)
X_test, NA2 = friedman_np_to_df(X_test,NA2)

#categorize testing set based off of x0
domain_list = []
for i in range(len(X_test)):
  if X_test.iloc[i]['x0'] < 0.6:
    domain_list.append(1)  # in domain
  else:
    domain_list.append(0)  # out of domain
X_test['domain'] = domain_list

y_test = X_test['domain'].copy()
X_test = X_test.drop(columns = ['domain'])

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from metric_learn import LMNN
lmnn_knn = Pipeline(steps=[('lmnn', LMNN()), ('knn', KNeighborsClassifier())])
parameters = {'lmnn__k':[1, 2,3], 'knn__n_neighbors':[1 , 2]}
grid_lmnn_knn = GridSearchCV(lmnn_knn, parameters, n_jobs=-1, verbose=True).fit(X_train, y_train)
grid_lmnn_knn.score(X_test, y_test)

Expected Results

No error is thrown, and the score is calculated.

Actual Results

Fitting 5 folds for each of 6 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed:    0.5s finished
Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic Python 3.7.10 NumPy 1.19.5 SciPy 1.4.1 Scikit-Learn 0.22.2.post1 Metric-Learn 0.6.2

Looks like either Lx or impostors was empty when computing the gradient of the loss.
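Why an empty impostor set surfaces as this particular TypeError can be reproduced in isolation. The sketch below assumes (not checked against metric-learn's source here) that the gradient code unpacks an indexed array into the two arguments of a paired-distance helper; `paired_sq_l2` is a hypothetical stand-in, not the library's `_inplace_paired_L2`. With no impostors, indexing with an empty list gives an array with zero rows, and star-unpacking zero rows passes zero positional arguments:

```python
import numpy as np


def paired_sq_l2(A, B):
    """Hypothetical stand-in: squared L2 distance between row i of A and row i of B."""
    diff = A - B
    return np.einsum("ij,ij->i", diff, diff)


# Transformed points, one row per sample (deterministic toy data).
Lx = np.arange(15, dtype=float).reshape(5, 3)

# Impostor pairs stored as a (2, n_pairs) index array: row 0 holds one side of each
# pair, row 1 the other. Lx[impostors] has shape (2, n_pairs, 3), so unpacking it
# yields exactly two arrays, A and B.
impostors = np.array([[0, 1], [2, 4]])
dists = paired_sq_l2(*Lx[impostors])
print(dists)

# With no impostors, the index is an empty list. Lx[[]] has shape (0, 3), and
# unpacking zero rows calls paired_sq_l2() with no arguments at all.
empty_impostors = []
error_message = None
try:
    paired_sq_l2(*Lx[empty_impostors])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

This is also why guarding against an empty impostor set before the call avoids the crash.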

Is X_train a NumPy array or a pandas DataFrame in your call to .fit(X_train, y_train)? If it's a DataFrame, could you try again using a plain NumPy array?

In any case, we should do better checking to surface a less-opaque error message.

Yep, I have tried that.

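Replaced the last 2 lines with this:
grid_lmnn_knn = GridSearchCV(lmnn_knn, parameters, n_jobs=-1, verbose=True).fit(np.array(X_train), np.array(y_train))
grid_lmnn_knn.score(np.array(X_test), np.array(y_test))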

Any other thoughts you can think of?

I reproduced the issue locally, and it turns out that impostors is indeed empty when computing the gradient. See the similar issue gh-17, which apparently didn't result in a fix for this problem.

I haven't verified yet, but I suspect the new LMNN implementation coming in gh-309 will solve this for you. We should also make sure we add test coverage for the no-impostors case.
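A regression test for the no-impostors case could look roughly like the following. It is a self-contained sketch, not metric-learn's API: `different_label_pairs` and `impostor_term` are hypothetical, heavily simplified helpers (every differently labeled pair counts, with no margin), and the guard uses `.size == 0` so that it catches both an empty list and a (2, 0) index array. The tests are written pytest-style but are also called directly at the end:

```python
import numpy as np


def different_label_pairs(labels):
    """Hypothetical helper: all (i, j) with i < j whose labels differ, as a (2, n_pairs) array."""
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)
    mask = labels[i] != labels[j]
    return np.vstack((i[mask], j[mask]))


def impostor_term(Lx, impostors):
    """Hypothetical simplified impostor term: sum of squared distances over the pairs.

    Bails out early when there are no pairs instead of unpacking an empty array.
    """
    impostors = np.asarray(impostors)
    if impostors.size == 0:
        return 0.0
    A, B = Lx[impostors]
    return float(np.sum((A - B) ** 2))


def test_no_impostors_single_class():
    Lx = np.random.RandomState(0).rand(10, 3)
    impostors = different_label_pairs(np.ones(10, dtype=int))
    assert impostors.shape == (2, 0)
    assert impostor_term(Lx, impostors) == 0.0


def test_impostors_two_classes():
    Lx = np.random.RandomState(0).rand(10, 3)
    labels = np.array([0] * 5 + [1] * 5)
    impostors = different_label_pairs(labels)
    assert impostors.shape == (2, 25)  # 5 points of class 0 x 5 points of class 1
    assert impostor_term(Lx, impostors) > 0.0


test_no_impostors_single_class()
test_impostors_two_classes()
```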

Sounds good. I will patiently wait for that. If you have any workaround until then, let me know, as I have to present my findings to my research group by next Wednesday lol

Here's a workaround. It just bails out entirely if no impostors can be found:

Not super elegant, but it should work okay.

Thank you for your work, @perimosocordiae.

We are trying to apply metric learning to the materials science space, but we're trying it on the Friedman dataset first before going all in on the diffusion datasets. It's super weird that my pipeline now doesn't predict any of the 0s in my test set correctly. I am not sure this is a correct approach.

In case you're interested in our use case: my PI asked me to frame a classification problem with the Friedman dataset. Categorize the dataset by setting samples where x0 < 0.6 to 1 (sample is within domain), else 0. Then apply metric learning to that and see if it performs well.
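The labeling rule described above (1 if x0 < 0.6, else 0) can also be computed in one vectorized step instead of the row-by-row iloc loop in the scripts. A minimal pandas sketch on a toy frame; `label_domain` is a hypothetical helper name and the values are made up for illustration:

```python
import pandas as pd


def label_domain(X, threshold=0.6):
    """Return 1 where x0 < threshold (in domain), else 0, without a Python loop."""
    return (X["x0"] < threshold).astype(int)


# Toy frame standing in for the Friedman features (only x0 matters for the label).
X = pd.DataFrame({"x0": [0.1, 0.59, 0.6, 0.95], "x1": [0.5, 0.2, 0.3, 0.4]})
X["domain"] = label_domain(X)
print(X["domain"].tolist())
```

Note that x0 == 0.6 falls out of domain because the comparison is strict. np.where(X["x0"] < 0.6, 1, 0) is an equivalent NumPy spelling that returns a plain array.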

Code so far:

from sklearn.datasets import make_friedman1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def friedman_np_to_df(X,y):
  return pd.DataFrame(X,columns=['x0','x1', 'x2', 'x3', 'x4']), pd.Series(y)

# Make training set
X_train, NA = make_friedman1(n_samples=1000, n_features=5, random_state = 1) # don't care about y, so call it NA
X_train, NA = friedman_np_to_df(X_train,NA)

#categorize training set based off of x0
domain_list = []
for i in range(len(X_train)):
  if X_train.iloc[i]['x0'] < 0.6:
    domain_list.append(1)  # in domain
  else:
    domain_list.append(0)  # out of domain

X_train['domain'] = domain_list
# Set training set to where domain == 1 (x0 < 0.6)
X_train =  X_train[X_train['domain']==1]
y_train = X_train.copy()
X_train = X_train.drop(columns = ['domain'])
y_train = y_train['domain']

# Make testing set with a different random_state
X_test, NA2 = make_friedman1(n_samples=1000, n_features=5, random_state = 3)
X_test, NA2 = friedman_np_to_df(X_test,NA2)

#categorize testing set based off of x0
domain_list = []
for i in range(len(X_test)):
  if X_test.iloc[i]['x0'] < 0.6:
    domain_list.append(1)  # in domain
  else:
    domain_list.append(0)  # out of domain
X_test['domain'] = domain_list

y_test = X_test['domain'].copy()
X_test = X_test.drop(columns = ['domain'])

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from metric_learn import LMNN
lmnn_knn = Pipeline(steps=[('lmnn', LMNN()), ('knn', KNeighborsClassifier())])
parameters = {'lmnn__init': ['pca', 'lda', 'identity', 'random'],
              'knn__weights': ['uniform','distance'],
              'knn__algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
              'knn__leaf_size': [x for x in np.arange(1,30,5)],
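              # scikit-learn spells these 'euclidean' and 'seuclidean'; 'seuclidean' and
              # 'mahalanobis' also expect metric_params (V / VI), so those candidates may error out
              'knn__metric': ['euclidean', 'manhattan', 'mahalanobis', 'seuclidean', 'minkowski']}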
grid_lmnn_knn = GridSearchCV(lmnn_knn, parameters, cv=3, n_jobs=-1, verbose=True, scoring='f1').fit(np.array(X_train), np.array(y_train))
# grid_lmnn_knn.score(np.array(X_test), np.array(y_test))

predictions = grid_lmnn_knn.predict(X_test)
from sklearn.metrics import classification_report
print(classification_report(y_test, predictions))

Output:

                 LMNN(convergence_tol=0.001, init='pca', k=2, learn_rate=1e-07,
                      max_iter=1000, min_iter=50, n_components=None,
                      preprocessor=None, random_state=None, regularization=0.5,
                 KNeighborsClassifier(algorithm='auto', leaf_size=1,
                                      metric='manhattan', metric_params=None,
                                      n_jobs=None, n_neighbors=2, p=2,
              precision    recall  f1-score   support

           0       0.00      0.00      0.00       387
           1       0.61      1.00      0.76       613

    accuracy                           0.61      1000
   macro avg       0.31      0.50      0.38      1000
weighted avg       0.38      0.61      0.47      1000

/usr/local/lib/python3.7/dist-packages/sklearn/metrics/ UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

So yeah. Not sure why the f1-score is 0 for my 0 cases. Maybe I am doing it wrong haha.
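The numbers in the report follow directly from the model predicting class 1 for every test sample (inferred from the zero scores and the UndefinedMetricWarning for class 0, together with recall 1.00 for class 1). A worked check in plain Python, using only the support counts from the report:

```python
# Support counts from the report: 387 samples of class 0, 613 of class 1.
# Assumption: the model predicted class 1 for every one of the 1000 samples.
n0, n1 = 387, 613
n = n0 + n1

# Class 1: all 613 true 1s are found (recall 1.0), but 387 of the 1000 predicted 1s are wrong.
precision1 = n1 / n
recall1 = n1 / n1
f1_1 = 2 * precision1 * recall1 / (precision1 + recall1)

# Class 0: never predicted, so recall is 0/387 = 0 and precision is 0/0 (undefined);
# scikit-learn reports the undefined value as 0.0 and emits UndefinedMetricWarning.
precision0 = recall0 = f1_0 = 0.0

accuracy = n1 / n
macro_f1 = (f1_0 + f1_1) / 2
weighted_f1 = (n0 * f1_0 + n1 * f1_1) / n

print(f"class 1: precision={precision1:.2f} recall={recall1:.2f} f1={f1_1:.2f}")
print(f"accuracy={accuracy:.2f} macro f1={macro_f1:.2f} weighted f1={weighted_f1:.2f}")
```

Every averaged figure in the report (0.31 / 0.50 / 0.38 macro, 0.38 / 0.61 / 0.47 weighted) comes out of the same arithmetic, so the zero row for class 0 is a symptom of a model that never outputs 0, not a metric bug.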

So I think the reason my last attempt performed badly on the Friedman dataset was that there were no examples of 0-labeled data in the training set. Now I include some 0-labeled samples in the training set. @perimosocordiae, I found another issue with your branch:
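In the earlier scripts, y_train was filtered down to domain == 1, so it held a single class: the k-NN step can then only ever predict 1, and LMNN, whose impostors are by definition points with a different label, has no impostors to find at all. A cheap pre-fit check makes that visible before any fitting happens. A sketch using only NumPy; `check_labels` is a hypothetical helper, and the label vectors are toy stand-ins, not the real split sizes:

```python
import numpy as np


def check_labels(y, min_classes=2):
    """Hypothetical pre-fit check: return per-class counts, or raise if too few classes."""
    classes, counts = np.unique(np.asarray(y), return_counts=True)
    if len(classes) < min_classes:
        raise ValueError(
            f"found {len(classes)} class(es) {classes.tolist()}; need at least "
            f"{min_classes} so that differently labeled (impostor) points can exist"
        )
    return dict(zip(classes.tolist(), counts.tolist()))


# Earlier attempt: training labels filtered to domain == 1 only.
y_one_class = np.ones(10, dtype=int)
one_class_error = None
try:
    check_labels(y_one_class)
except ValueError as exc:
    one_class_error = str(exc)
    print(one_class_error)

# Later attempt: some out-of-domain (0) samples prepended to the in-domain ones.
y_two_class = np.concatenate([np.zeros(3, dtype=int), np.ones(10, dtype=int)])
counts = check_labels(y_two_class)
print(counts)
```

The returned counts also make class imbalance easy to spot before choosing a cv value or a scoring metric.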

from sklearn.datasets import make_friedman1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def friedman_np_to_df(X,y):
  return pd.DataFrame(X,columns=['x0','x1', 'x2', 'x3', 'x4']), pd.Series(y)

# Make training set
X_train, NA = make_friedman1(n_samples=1000, n_features=5, random_state = 1) # don't care about y, so call it NA
X_train, NA = friedman_np_to_df(X_train,NA)

#categorize training set based off of x0
domain_list = []
for i in range(len(X_train)):
  if X_train.iloc[i]['x0'] < 0.6:
    domain_list.append(1)  # in domain
  else:
    domain_list.append(0)  # out of domain

X_train['domain'] = domain_list
# Keep the in-domain samples (domain == 1, x0 < 0.6) plus the first 60 out-of-domain samples

out_of_domain = X_train[X_train['domain'] == 0][:60]
X_train =  X_train[X_train['domain']==1]

X_train = pd.concat([out_of_domain, X_train])

y_train = X_train.copy()
X_train = X_train.drop(columns = ['domain'])
y_train = y_train['domain']

# Make testing set with a different random_state
X_test, NA2 = make_friedman1(n_samples=1000, n_features=5, random_state = 3)
X_test, NA2 = friedman_np_to_df(X_test,NA2)

#categorize testing set based off of x0
domain_list = []
for i in range(len(X_test)):
  if X_test.iloc[i]['x0'] < 0.6:
    domain_list.append(1)  # in domain
  else:
    domain_list.append(0)  # out of domain
X_test['domain'] = domain_list

y_test = X_test['domain'].copy()
X_test = X_test.drop(columns = ['domain'])

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from metric_learn import LMNN
lmnn_knn = Pipeline(steps=[('lmnn', LMNN()), ('knn', KNeighborsClassifier())])
parameters = {'lmnn__init': ['pca', 'lda', 'identity', 'random'],
              'knn__weights': ['uniform','distance'],
              'knn__algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
              'knn__leaf_size': [x for x in np.arange(1,30,5)],
              'knn__metric': [ 'manhattan', 'mahalanobis', 'minkowski']}
grid_lmnn_knn = GridSearchCV(lmnn_knn, parameters, cv=5, n_jobs=-1, verbose=True, scoring='f1').fit(np.array(X_train), np.array(y_train))
grid_lmnn_knn.score(np.array(X_test), np.array(y_test))


Tried swapping it to the following, but it now goes into an infinite loop 😂

if not impostors.any():
   return None, 0, 0
Sorry, try if len(impostors) == 0: instead.

On Sun, Apr 4, 2021 at 2:31 PM Angelo Cortez @.***> wrote:

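The two checks in this exchange are not equivalent: len(x) == 0 asks whether there are no entries along the first axis, while not x.any() asks whether no entry is truthy, which also holds for a non-empty array of zeros, and .any() does not exist on a plain Python list. Which check is right depends on the type and shape the impostor container actually has, which the thread does not show. A small NumPy comparison with hypothetical helper names, adding .size == 0 as a shape-agnostic alternative:

```python
import numpy as np


def is_empty_by_len(indices):
    """len(x) == 0: no entries along the first axis only."""
    return len(indices) == 0


def is_empty_by_any(indices):
    """not x.any(): no truthy entry; also True for a non-empty all-zero array."""
    return not np.asarray(indices).any()


def is_empty_by_size(indices):
    """x.size == 0: no elements at all, whatever the shape."""
    return np.asarray(indices).size == 0


cases = {
    "empty list": [],
    "empty int array": np.array([], dtype=int),
    "(2, 0) index array": np.empty((2, 0), dtype=int),
    "non-empty, all zeros": np.array([0, 0, 0]),
    "non-empty, mixed": np.array([0, 4, 7]),
}

results = {}
for name, indices in cases.items():
    checks = (is_empty_by_len(indices), is_empty_by_any(indices), is_empty_by_size(indices))
    results[name] = checks
    print(f"{name:22} len==0={checks[0]!s:5} not any()={checks[1]!s:5} size==0={checks[2]}")

# A plain Python list has no .any() method at all.
list_error = None
try:
    [].any()
except AttributeError as exc:
    list_error = str(exc)
    print(list_error)
```

Note that a (2, 0) pair array has len 2, so len(x) == 0 treats it as non-empty even though it holds no pairs.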

Super weird.

Were you able to try the code from gh-309? I'm curious to see how it would handle this case.

The code from the PR doesn't work either 🤣 I will post my results on that thread.