Labeling Unlabeled Data

timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"

https://arxiv.org/abs/2001.07676

Apache License 2.0


Labeling Unlabeled Data #12

Closed by Mahhos 3 years ago

Mahhos commented 3 years ago

I made get_dev_examples() return the same data as get_unlabeled_examples(), which raised the following error. It seems that labels are expected for dev.csv. However, unlabeled.csv has no labels; its label column contains only a , instead.

File "/pet-master/pet/preprocessor.py", line 83, in get_input_features
    label = self.label_map[example.label] if example.label is not None else -100
KeyError: ','
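For reference, the failing line can be reproduced in isolation. The sketch below copies only the expression quoted in the traceback; the label set and the helper name to_label_id are made up for illustration and are not part of PET. It suggests that a label of None maps to -100, while any string that is not a known label, such as the placeholder ",", raises the KeyError.

```python
from typing import Dict, Optional

# Hypothetical label set; the real task's labels are not shown in this thread.
label_map: Dict[str, int] = {"0": 0, "1": 1}


def to_label_id(label: Optional[str]) -> int:
    # Same expression as the line quoted in the traceback.
    return label_map[label] if label is not None else -100


print(to_label_id("1"))   # a known label maps to its index
print(to_label_id(None))  # a missing label maps to -100

try:
    to_label_id(",")      # the placeholder is not a key in label_map
except KeyError as err:
    print("KeyError:", err)
```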

Mahhos commented 3 years ago

To clarify, my unlabeled.csv has one column with the text (text_a) and a second column containing only ,. I first tried leaving the , out of unlabeled.csv; however, the program needs to know that the 2nd column is reserved for labels, so I put , in every row of the 2nd column.

timoschick commented 3 years ago

Right, the current implementation expects labels for your development file in order to compute accuracy on the dev set. I don't know what your file unlabeled.csv and your corresponding TaskProcessor look like, but you can bypass this by simply assigning some (random) label to all examples. For example, you can define the method get_dev_examples() in your custom TaskProcessor similar to this:

def get_dev_examples(self, data_dir) -> List[InputExample]:
    examples = self.get_unlabeled_examples(data_dir)
    for example in examples:
        # Overwrite the placeholder with an arbitrary valid label.
        example.label = self.get_labels()[0]
    return examples
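A self-contained sketch of how this workaround behaves, runnable without PET installed. The InputExample dataclass and MyTaskProcessor below are stand-ins written for this illustration (only the fields and methods needed here), the label set is assumed, and the rows are hard-coded instead of being read from unlabeled.csv.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputExample:
    """Stand-in for PET's InputExample; only the fields used here."""
    guid: str
    text_a: str
    label: Optional[str] = None


class MyTaskProcessor:
    """Hypothetical processor; a real one would read unlabeled.csv from data_dir."""

    def get_labels(self) -> List[str]:
        return ["0", "1"]  # assumed label set

    def get_unlabeled_examples(self, data_dir: str) -> List[InputExample]:
        # Hard-coded rows in place of file I/O; "," mimics the placeholder column.
        texts = ["first text", "second text"]
        return [InputExample(guid=f"u-{i}", text_a=t, label=",")
                for i, t in enumerate(texts)]

    def get_dev_examples(self, data_dir: str) -> List[InputExample]:
        examples = self.get_unlabeled_examples(data_dir)
        for example in examples:
            # Overwrite the placeholder with an arbitrary valid label.
            example.label = self.get_labels()[0]
        return examples


processor = MyTaskProcessor()
label_map = {label: i for i, label in enumerate(processor.get_labels())}

# Every dev example now carries a label that exists in label_map.
dev_ids = [label_map[ex.label] for ex in processor.get_dev_examples("unused")]
print(dev_ids)
```

Because the dev labels are arbitrary placeholders here, any accuracy reported on this dev set says nothing about model quality.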