timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0

Unlabeled data - error in final/p0-i0 #15

Closed: kandalav closed this issue 3 years ago

kandalav commented 3 years ago

Amazing work!

I wanted to check on an error: labeling unlabeled data throws an error. My unlabeled data CSV is of the form text_a , text_b , "," and I am not sure what the issue could be.

timoschick commented 3 years ago

Hi @kandalav, could you provide some further details regarding the exact error message? Did you write a custom DataProcessor/PVP as described here?

kandalav commented 3 years ago

Yes, I have written the custom processor/PVP as described and am able to run all the iterations on multiple patterns. I am trying with 5 patterns and I see results from p0-p5, but the final step is breaking and I think it's a data formatting error.

It happens at this step: self.wrapper.tokenizer.encode_plus(

in get_input_ids
    raise ValueError(
ValueError: Input None is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.

I am processing the data to remove any small descriptions. I will write back if I am able to get this to work!
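
One way to look for the kind of data formatting problem suspected here is to scan the CSV for rows where a text field is missing or blank before the examples are built. The thread does not confirm that this is what caused the ValueError above, so treat the following as a minimal sketch under that assumption. The function name find_rows_with_missing_text and the column positions (0 for text_a, 1 for text_b) are hypothetical and not part of the repository.

```python
import csv
import io
from typing import List, Tuple


def find_rows_with_missing_text(csv_file, text_columns: Tuple[int, ...] = (0, 1)) -> List[int]:
    """Return 1-based row numbers where any of the given columns is missing or blank.

    Hypothetical helper: the column positions are an assumption about the CSV layout.
    """
    bad_rows = []
    for row_number, row in enumerate(csv.reader(csv_file), start=1):
        for col in text_columns:
            # A row that is too short, or whose field is only whitespace, is flagged.
            if col >= len(row) or not row[col].strip():
                bad_rows.append(row_number)
                break
    return bad_rows


# In-memory sample standing in for unlabeled.csv
sample = io.StringIO('first sentence,second sentence\nonly one field\n"",second only\n')
print(find_rows_with_missing_text(sample))  # prints [2, 3]
```

With a real file, the same function could be called on an open file handle, e.g. open("unlabeled.csv", newline="").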

kandalav commented 3 years ago

@timoschick - I was able to run this without errors.

I have train, test, dev and unlabeled.csv. I was expecting the final step to run on unlabeled.csv and give me the predictions, but it ran on the dev set. How can I set this to run on the unlabeled data to give new predictions?

timoschick commented 3 years ago

Hi @kandalav, I am not sure whether I understand your problem correctly. If you want predictions for the unlabeled data rather than for the development data, you can simply modify your DataProcessor's get_dev_examples function so that it reuses the unlabeled examples as dev examples:

def get_dev_examples(self, data_dir: str) -> List[InputExample]:
    return self.get_unlabeled_examples(data_dir)

You might then run into an issue because the script expects the dev examples to have labels. To circumvent this issue, you can just assign dummy labels to all dev examples, e.g.:

def get_dev_examples(self, data_dir: str) -> List[InputExample]:
    examples = self.get_unlabeled_examples(data_dir)
    for example in examples:
        example.label = self.get_labels()[0]
    return examples
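
To see the dummy-label approach run end to end outside the repository, here is a minimal self-contained sketch. The InputExample dataclass is a simplified stand-in for the repository's class, and MyTaskProcessor, its label list and its hard-coded examples are hypothetical. Only the get_dev_examples body follows the snippet above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputExample:
    """Simplified stand-in for the repository's InputExample (assumption)."""
    guid: str
    text_a: str
    text_b: Optional[str] = None
    label: Optional[str] = None


class MyTaskProcessor:
    """Hypothetical processor; only the methods relevant here are shown."""

    def get_labels(self) -> List[str]:
        return ["0", "1"]

    def get_unlabeled_examples(self, data_dir: str) -> List[InputExample]:
        # A real processor would read unlabeled.csv from data_dir;
        # two hard-coded examples keep this sketch self-contained.
        return [
            InputExample(guid="unlabeled-0", text_a="first text", text_b="second text"),
            InputExample(guid="unlabeled-1", text_a="another text"),
        ]

    def get_dev_examples(self, data_dir: str) -> List[InputExample]:
        # Reuse the unlabeled examples as dev examples and give each one
        # the first label as a placeholder so label-dependent code does not fail.
        examples = self.get_unlabeled_examples(data_dir)
        for example in examples:
            example.label = self.get_labels()[0]
        return examples


processor = MyTaskProcessor()
for example in processor.get_dev_examples("unused/data/dir"):
    print(example.guid, example.label)
```

Because the labels are placeholders, any evaluation metric computed on these "dev" examples is meaningless. Only the predictions themselves are of interest.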