phanein / deepwalk

DeepWalk - Deep Learning for Graphs
http://www.perozzi.net/projects/deepwalk/

error in score.py #32

Closed lemmonation closed 7 years ago

lemmonation commented 7 years ago

I ran score.py (scoring.py in the traceback) and got the following error:

Traceback (most recent call last):
  File "scoring.py", line 96, in <module>
    clf.fit(X_train, y_train)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/multiclass.py", line 205, in fit
    Y = self.label_binarizer_.fit_transform(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 494, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/label.py", line 296, in fit
    self.y_type_ = type_of_target(y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 250, in type_of_target
    raise ValueError('You appear to be using a legacy multi-label data'
ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

I then got past this error by adding:

from sklearn.preprocessing import MultiLabelBinarizer
y_train = MultiLabelBinarizer().fit_transform(y_train)
y_test = MultiLabelBinarizer().fit_transform(y_test)
preds = MultiLabelBinarizer().fit_transform(preds)

But then a new error is raised:

Traceback (most recent call last):
  File "scoring.py", line 108, in <module>
    results[average] = f1_score(y_test,  preds, average=average)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 692, in f1_score
    sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 806, in fbeta_score
    sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 1004, in precision_recall_fscore_support
    present_labels = unique_labels(y_true, y_pred)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 92, in unique_labels
    raise ValueError("Multi-label binary indicator input with "
ValueError: Multi-label binary indicator input with different numbers of labels

The error probably occurs because the sklearn calls in score.py were written for an older version of the library and no longer match the current one. Is there any trick to solve this?
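A possible explanation for the second error, offered as a sketch rather than a confirmed diagnosis: each `MultiLabelBinarizer().fit_transform(...)` call above fits a fresh binarizer, so `y_test` and `preds` can end up with different numbers of columns whenever they do not contain exactly the same set of labels. Fitting one binarizer and reusing it keeps the columns aligned. The toy label lists and the explicit `classes` list below are hypothetical stand-ins, not the actual data or variables from scoring.py.

```python
from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy label sets standing in for the variables in scoring.py.
y_train = [[0, 2], [1], [3]]
y_test = [[1], [0, 2]]
preds = [[1], [2]]

# One binarizer with a fixed class list, so every matrix has the same columns.
# The class list here is an assumption; in practice it would be all known labels.
mlb = MultiLabelBinarizer(classes=[0, 1, 2, 3])
Y_train = mlb.fit_transform(y_train)
Y_test = mlb.transform(y_test)   # reuse the fitted binarizer
Y_pred = mlb.transform(preds)    # reuse it again for the predictions

# Both indicator matrices now share a shape, so f1_score accepts them.
micro_f1 = f1_score(Y_test, Y_pred, average="micro")
print(Y_test.shape, Y_pred.shape, micro_f1)
```

With separate binarizers, the same toy data would give `y_test` three columns (labels 0, 1, 2) and `preds` two columns (labels 1, 2), which matches the "different numbers of labels" message.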

GTmac commented 7 years ago

One easy solution is to downgrade the sklearn package:

pip install -U scikit-learn==0.16.1

Of course, you can also do this in a virtualenv.

lemmonation commented 7 years ago

Worked. Thank you! @GTmac

syeddanishkazmi commented 5 years ago


I tried downgrading scikit-learn as suggested above, but I am still getting the same error. Can anyone help, please? My code is:

valid_x_predictions = lstm_autoencoder.predict(valid_x_0)
mse = np.mean(np.power(flatten(valid_x_0) - flatten(valid_x_predictions), 2), axis=1)

error_df = pd.DataFrame({'Reconstruction_error': mse, 'True_class': y_valid.tolist()})

precision_rt, recall_rt, threshold_rt = precision_recall_curve(error_df.True_class, error_df.Reconstruction_error)
plt.plot(threshold_rt, precision_rt[1:], label="Precision",linewidth=5)
plt.plot(threshold_rt, recall_rt[1:], label="Recall",linewidth=5)
plt.title('Precision and recall for different threshold values')
plt.xlabel('Threshold')
plt.ylabel('Precision/Recall')
plt.legend()
plt.show()

Error is:

ValueError                                Traceback (most recent call last)
in ()
      5                          'True_class': y_valid.tolist()})
      6
----> 7 precision_rt, recall_rt, threshold_rt = precision_recall_curve(error_df.True_class, error_df.Reconstruction_error)
      8 plt.plot(threshold_rt, precision_rt[1:], label="Precision",linewidth=5)
      9 plt.plot(threshold_rt, recall_rt[1:], label="Recall",linewidth=5)

/home/danish/anaconda3/envs/tf10/lib/python2.7/site-packages/sklearn/metrics/ranking.pyc in precision_recall_curve(y_true, probas_pred, pos_label, sample_weight)
    520     fps, tps, thresholds = _binary_clf_curve(y_true, probas_pred,
    521                                              pos_label=pos_label,
--> 522                                              sample_weight=sample_weight)
    523
    524     precision = tps / (tps + fps)

/home/danish/anaconda3/envs/tf10/lib/python2.7/site-packages/sklearn/metrics/ranking.pyc in _binary_clf_curve(y_true, y_score, pos_label, sample_weight)
    392     """
    393     # Check to make sure y_true is valid
--> 394     y_type = type_of_target(y_true)
    395     if not (y_type == "binary" or
    396             (y_type == "multiclass" and pos_label is not None)):

/home/danish/anaconda3/envs/tf10/lib/python2.7/site-packages/sklearn/utils/multiclass.pyc in type_of_target(y)
    260     if (not hasattr(y[0], '__array__') and isinstance(y[0], Sequence)
    261             and not isinstance(y[0], string_types)):
--> 262         raise ValueError('You appear to be using a legacy multi-label data'
    263                          ' representation. Sequence of sequences are no'
    264                          ' longer supported; use a binary array or sparse'

ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

GTmac commented 5 years ago

@syeddanishkazmi how is the error you get related to DeepWalk? Have you tried using MultiLabelBinarizer?
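For the `precision_recall_curve` case above, one possible cause (an assumption, since the shape of `y_valid` is not shown in the thread) is that `y_valid` is a 2-D column array, so `y_valid.tolist()` yields a list of one-element lists, which is a sequence of sequences. A minimal sketch of flattening binary labels to 1-D before scoring follows; the label and score arrays are hypothetical toy values.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical stand-ins: column-vector binary labels and per-sample scores.
y_valid = np.array([[0], [0], [1], [1]])
reconstruction_error = np.array([0.1, 0.4, 0.35, 0.8])

# y_valid.tolist() would give [[0], [0], [1], [1]], a list of lists.
# Flattening gives one scalar label per sample instead.
y_true = np.ravel(y_valid)

precision, recall, thresholds = precision_recall_curve(y_true, reconstruction_error)
print(y_true.shape, len(precision), len(recall), len(thresholds))
```

In the posted code, the equivalent change would be to build `True_class` from the flattened labels rather than from `y_valid.tolist()`, if `y_valid` does turn out to be 2-D.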

NaimMhedhbi1 commented 3 years ago

pip install -U scikit-learn==0.16.1

thank you