`ValueError: Number of classes, 1, does not match size of target_names, 2. Try specifying the labels parameter` raised by classification_report() called from kochat/utils/metrics.py

Intro

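Hello @hyunwoongko! I needed a Korean chatbot framework, and I think you built a great one! I was impressed reading through the code and the detailed docs. Thanks to it, I think I can build a chatbot with exactly the features I want.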

Problem

After [DistanceClassifier] finishes training, the following error occurs. (It appears to happen while generating the classification metrics report for OOD detection.)

...
[DistanceClassifier] Epoch : 10, ETA : 4.3569 sec 
Traceback (most recent call last):
  File "application.py", line 26, in <module>
    kochat = KochatApi(
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/kochat/app/kochat_api.py", line 56, in __init__
    self.__fit_intent()
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/kochat/app/kochat_api.py", line 153, in __fit_intent
    self.intent_classifier.fit(self.dataset.load_intent(self.embed_processor))
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/kochat/proc/intent_classifier.py", line 44, in fit
    report, _ = self.metrics.report(['in_dist', 'out_dist'], mode='ood')
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/sklearn/utils/_testing.py", line 317, in wrapper
    return fn(*args, **kwargs)
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/kochat/utils/metrics.py", line 86, in report
    classification_report(
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/sklearn/metrics/_classification.py", line 1950, in classification_report
    raise ValueError(
ValueError: Number of classes, 1, does not match size of target_names, 2. Try specifying the labels parameter

My analysis

Metrics.report() in kochat/utils/metrics.py calls classification_report():

class Metrics:

    ...

    def report(self, label_dict: dict, mode: str) -> tuple:
        """
        Outputs the classification report and the confusion matrix.
        These include Precision, Recall, F1 Score, Accuracy, etc.

        :return: model performance measured with various metrics
        """

        ...

        report = DataFrame(
            classification_report(
                y_true=label,
                y_pred=predict,
                target_names=list(label_dict),
                output_dict=True
            )
        )

        ...

The definition of classification_report() is shown below. The error is raised at the last line of this excerpt.

def classification_report(y_true, y_pred, *, labels=None, target_names=None,
                          sample_weight=None, digits=2, output_dict=False,
                          zero_division="warn"):
    """Build a text report showing the main classification metrics.

    Read more in the :ref:`User Guide <classification_report>`.

    Parameters
    ----------
    y_true : 1d array-like, or label indicator array / sparse matrix
        Ground truth (correct) target values.

    y_pred : 1d array-like, or label indicator array / sparse matrix
        Estimated targets as returned by a classifier.

    labels : array, shape = [n_labels]
        Optional list of label indices to include in the report.

    target_names : list of strings
        Optional display names matching the labels (same order).

    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights.

    digits : int
        Number of digits for formatting output floating point values.
        When ``output_dict`` is ``True``, this will be ignored and the
        returned values will not be rounded.

    output_dict : bool (default = False)
        If True, return output as dict

        .. versionadded:: 0.20

    zero_division : "warn", 0 or 1, default="warn"
        Sets the value to return when there is a zero division. If set to
        "warn", this acts as 0, but warnings are also raised.

    Returns
    -------
    report : string / dict
        Text summary of the precision, recall, F1 score for each class.
        Dictionary returned if output_dict is True. Dictionary has the
        following structure::

            {'label 1': {'precision':0.5,
                         'recall':1.0,
                         'f1-score':0.67,
                         'support':1},
             'label 2': { ... },
              ...
            }

        The reported averages include macro average (averaging the unweighted
        mean per label), weighted average (averaging the support-weighted mean
        per label), and sample average (only for multilabel classification).
        Micro average (averaging the total true positives, false negatives and
        false positives) is only shown for multi-label or multi-class
        with a subset of classes, because it corresponds to accuracy otherwise.
        See also :func:`precision_recall_fscore_support` for more details
        on averages.

        Note that in binary classification, recall of the positive class
        is also known as "sensitivity"; recall of the negative class is
        "specificity".

    See also
    --------
    precision_recall_fscore_support, confusion_matrix,
    multilabel_confusion_matrix

    Examples
    --------
    >>> from sklearn.metrics import classification_report
    >>> y_true = [0, 1, 2, 2, 2]
    >>> y_pred = [0, 0, 2, 2, 1]
    >>> target_names = ['class 0', 'class 1', 'class 2']
    >>> print(classification_report(y_true, y_pred, target_names=target_names))
                  precision    recall  f1-score   support
    <BLANKLINE>
         class 0       0.50      1.00      0.67         1
         class 1       0.00      0.00      0.00         1
         class 2       1.00      0.67      0.80         3
    <BLANKLINE>
        accuracy                           0.60         5
       macro avg       0.50      0.56      0.49         5
    weighted avg       0.70      0.60      0.61         5
    <BLANKLINE>
    >>> y_pred = [1, 1, 0]
    >>> y_true = [1, 1, 1]
    >>> print(classification_report(y_true, y_pred, labels=[1, 2, 3]))
                  precision    recall  f1-score   support
    <BLANKLINE>
               1       1.00      0.67      0.80         3
               2       0.00      0.00      0.00         0
               3       0.00      0.00      0.00         0
    <BLANKLINE>
       micro avg       1.00      0.67      0.80         3
       macro avg       0.33      0.22      0.27         3
    weighted avg       1.00      0.67      0.80         3
    <BLANKLINE>
    """

    y_type, y_true, y_pred = _check_targets(y_true, y_pred)

    labels_given = True
    if labels is None:
        labels = unique_labels(y_true, y_pred)  # where labels is defined
        labels_given = False
    else:
        labels = np.asarray(labels)

    # labelled micro average
    micro_is_accuracy = ((y_type == 'multiclass' or y_type == 'binary') and
                         (not labels_given or
                          (set(labels) == set(unique_labels(y_true, y_pred)))))

    if target_names is not None and len(labels) != len(target_names):
        if labels_given:
            warnings.warn(
                "labels size, {0}, does not match size of target_names, {1}"
                .format(len(labels), len(target_names))
            )
        else:
            raise ValueError(
                "Number of classes, {0}, does not match size of "
                "target_names, {1}. Try specifying the labels "
                "parameter".format(len(labels), len(target_names))
            )  # the error is raised here!
    ...

So the error seems to occur because labels and target_names have different lengths. Metrics.report() calls classification_report() without a labels argument (apparently on purpose, leaving it as None), so labels is computed as unique_labels(y_true, y_pred).
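A minimal reproduction, assuming the inputs mirror the failing case (every sample labelled and predicted as 1, with target names in_dist/out_dist as in the traceback). The variable `error_message` is a hypothetical name used only to capture the exception text:

```python
from sklearn.metrics import classification_report

# Hypothetical input mirroring the failing case: only class 1 (out_dist) occurs.
y_true = [1, 1, 1, 1]
y_pred = [1, 1, 1, 1]

error_message = None
try:
    # Same call shape as Metrics.report(): target_names given, labels omitted.
    classification_report(
        y_true=y_true,
        y_pred=y_pred,
        target_names=["in_dist", "out_dist"],
        output_dict=True,
    )
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```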

The docstring of unique_labels() gives these examples:

    Examples
    --------
    >>> from sklearn.utils.multiclass import unique_labels
    >>> unique_labels([3, 5, 5, 5, 7, 7])
    array([3, 5, 7])
    >>> unique_labels([1, 2, 3, 4], [2, 2, 3, 4])
    array([1, 2, 3, 4])
    >>> unique_labels([1, 2, 10], [5, 11])
    array([ 1,  2,  5, 10, 11])

That is, unique_labels(y_true, y_pred) appears to return the sorted union of the labels found in y_true and y_pred.
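Applied to the failing case, the union contains a single label, so `len(labels)` is 1 while `target_names` has 2 entries. The inputs below are illustrative, not taken from the actual run:

```python
from sklearn.utils.multiclass import unique_labels

# Failing case: both arrays contain only label 1 (out_dist).
labels = unique_labels([1, 1, 1], [1, 1, 1])
print(labels)  # [1]

# For contrast: if label 0 appeared anywhere, both classes would be found.
mixed = unique_labels([0, 1, 1], [1, 1, 1])
print(mixed)  # [0 1]
```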

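The problem occurs when y_true and y_pred both contain only the single label 1, i.e. out_dist: labels then has length 1 while target_names has length 2. (Part of the cause may be that the model was not trained enough, but shouldn't training still complete even if every sample is classified as OOD?)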

Printing y_true and y_pred gives [1 1 1 ... 1 1 1] and [1 1 1 ... 1 1 1] respectively; both have the same length.
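The docstring of Metrics.report() says it also outputs a confusion matrix, but that part of the body is elided above, so whether it passes `labels` is unknown. Assuming it does not, sklearn's `confusion_matrix()` would likewise infer labels from the data and return a 1x1 matrix for this input; passing `labels=[0, 1]` keeps the 2x2 shape. Illustrative inputs only:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1]
y_pred = [1, 1, 1]

# Labels inferred from the data: only class 1 exists, so the matrix is 1x1.
inferred = confusion_matrix(y_true, y_pred)
print(inferred.shape)  # (1, 1)

# Labels fixed explicitly: rows/cols are [in_dist (0), out_dist (1)].
explicit = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(explicit.tolist())  # [[0, 0], [0, 3]]
```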

How can I fix this error? I tried to write up my own trial and error, so I apologize if it is a bit disorganized. Thanks again for sharing such a great framework.
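One possible workaround, sketched under the assumption that the integer labels are 0..n-1 in the same order as `target_names` (0 = in_dist, 1 = out_dist here): pass `labels` explicitly so it is never inferred from the data. `report_with_all_labels` is a hypothetical helper, not part of kochat:

```python
from sklearn.metrics import classification_report


def report_with_all_labels(y_true, y_pred, target_names):
    """Build a report with one entry per target name, even for absent classes.

    Hypothetical helper: assumes label i corresponds to target_names[i].
    """
    return classification_report(
        y_true=y_true,
        y_pred=y_pred,
        labels=list(range(len(target_names))),  # fixed, not inferred from data
        target_names=list(target_names),
        zero_division=0,  # absent classes score 0 instead of emitting a warning
        output_dict=True,
    )


report = report_with_all_labels([1, 1, 1], [1, 1, 1], ["in_dist", "out_dist"])
print(report["out_dist"])
print(report["in_dist"])
```

With labels given and one of them absent from the data, the version of classification_report() quoted above sets `micro_is_accuracy` to False, so the summary rows may contain 'micro avg' instead of 'accuracy'; code that reads the report afterwards should not assume a particular summary key. The same `labels` list can also be passed to `confusion_matrix()`.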

Additionally, I used ood_data.csv from the demo template as-is (about 12,000 rows), and restaurant is the only intent I provided. That file is shown below (about 80 rows).

question,label

주변에 음식점 알려줘,O S-RESTAURANT O
그럼 주변에 음식점 알려줘,O O S-RESTAURANT O
주변 음식점 뭐 있어,O S-RESTAURANT O O
그럼 주변에 음식점 뭐 있어,O O S-RESTAURANT O O
음식점 주변에 뭐 있을까,S-RESTAURANT O O O
그럼 음식점 주변에 뭐 있을까,O S-RESTAURANT O O O
음식 먹고 싶다,O O O
그럼 음식 먹고 싶다,O O O O
음식 먹고 싶어,O O O
그럼 음식 먹고 싶어,O O O O
음식점 가고 싶어,S-RESTAURANT O O
그럼 음식점 가고 싶어,O S-RESTAURANT O O
음식점 추천해줘,S-RESTAURANT O
그럼 음식점 추천해줘,O S-RESTAURANT O
주변에 식당 알려줘,O S-RESTAURANT O
그럼 주변에 식당 알려줘,O O S-RESTAURANT O
주변 식당 뭐 있어,O S-RESTAURANT O O
그럼 주변에 식당 뭐 있어,O O S-RESTAURANT O O
식당 주변에 뭐 있을까,S-RESTAURANT O O O
그럼 식당 주변에 뭐 있을까,O S-RESTAURANT O O O
식당 먹고 싶다,S-RESTAURANT O O
그럼 식당 먹고 싶다,O S-RESTAURANT O O
식당 먹고 싶어,S-RESTAURANT O O
그럼 식당 먹고 싶어,O S-RESTAURANT O O
식당 가고 싶어,S-RESTAURANT O O
그럼 식당 가고 싶어,O S-RESTAURANT O O
식당 추천해줘,S-RESTAURANT O
그럼 식당 추천해줘,O S-RESTAURANT O
주변에 먹거리 알려줘,O O O
그럼 주변에 먹거리 알려줘,O O O O
주변 먹거리 뭐 있어,O O O O
그럼 주변에 먹거리 뭐 있어,O O O O O
먹거리 주변에 뭐 있을까,O O O O
그럼 먹거리 주변에 뭐 있을까,O O O O O
먹거리 먹고 싶다,O O O
그럼 먹거리 먹고 싶다,O O O O
먹거리 먹고 싶어,O O O
그럼 먹거리 먹고 싶어,O O O O
먹거리 가고 싶어,O O O
그럼 먹거리 가고 싶어,O O O O
먹거리 추천해줘,O O
그럼 먹거리 추천해줘,O O O
주변에 맛집 알려줘,O S-RESTAURANT O
그럼 주변에 맛집 알려줘,O O S-RESTAURANT O
주변 맛집 뭐 있어,O S-RESTAURANT O O
그럼 주변에 맛집 뭐 있어,O O S-RESTAURANT O O
맛집 주변에 뭐 있을까,S-RESTAURANT O O O
그럼 맛집 주변에 뭐 있을까,O S-RESTAURANT O O O
맛있는거 먹고 싶다,S-RESTAURANT O O
그럼 맛있는거 먹고 싶다,O S-RESTAURANT O O
맛있는거 먹고 싶어,S-RESTAURANT O O
그럼 맛있는거 먹고 싶어,O S-RESTAURANT O O
맛집 가고 싶어,S-RESTAURANT O O
그럼 맛집 가고 싶어,O S-RESTAURANT O O
맛집 추천해줘,S-RESTAURANT O
그럼 맛집 추천해줘,O S-RESTAURANT O
주변에 배고픈데 알려줘,O O O
그럼 주변에 배고픈데 알려줘,O O O O
주변 배고픈데 뭐 있어,O O O O
그럼 주변에 배고픈데 뭐 있어,O O O O O
배고픈데 주변에 뭐 있을까,O O O O
그럼 배고픈데 주변에 뭐 있을까,O O O O O
배고픈데 먹고 싶다,O O O
그럼 배고픈데 먹고 싶다,O O O O
배고픈데 먹고 싶어,O O O
그럼 배고픈데 먹고 싶어,O O O O
배고파 가고 싶어,O O O
그럼 배고파 가고 싶어,O O O O
배고파 추천해줘,O O
그럼 배고파 추천해줘,O O O
주변에 밥 알려줘,O O O
그럼 주변에 밥 알려줘,O O O O
주변 밥 뭐 있어,O O O O
그럼 주변에 밥 뭐 있어,O O O O O
밥 주변에 뭐 있을까,O O O O
그럼 밥 주변에 뭐 있을까,O O O O O
밥 먹고 싶다,O O O
그럼 밥 먹고 싶다,O O O O
밥 먹고 싶어,O O O
그럼 밥 먹고 싶어,O O O O
밥 가고 싶어,O O O
그럼 밥 가고 싶어,O O O O
밥 추천해줘,O O
그럼 밥 추천해줘,O O O
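The intent file above appears to pair each whitespace-separated token in `question` with exactly one BIO tag in `label`. This does not explain the ValueError, but a quick consistency check can rule out misaligned rows before training. `misaligned_rows` and `SAMPLE` are hypothetical names; the sample rows are copied from the file above:

```python
import csv
import io

# A few rows copied from the restaurant intent file above.
SAMPLE = """question,label
주변에 음식점 알려줘,O S-RESTAURANT O
그럼 주변에 음식점 알려줘,O O S-RESTAURANT O
밥 추천해줘,O O
"""


def misaligned_rows(text):
    """Return rows whose token count differs from their tag count."""
    bad = []
    for row in csv.DictReader(io.StringIO(text)):
        tokens = row["question"].split()  # whitespace tokens
        tags = row["label"].split()       # one BIO tag per token expected
        if len(tokens) != len(tags):
            bad.append(row)
    return bad


print(misaligned_rows(SAMPLE))  # []
```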

hyunwoongko / kochat