Closed Mahhos closed 3 years ago
Hi @Mahhos, I'm on vacation this week, but I'll try to answer your question early next week.
As I understand the docs, after you have written your own PVP and task processor, you call the CLI with --task_name my-task, which is defined via TASK_NAME = "my-task" in the task processor. It seems to me that you have to import your custom code, in the form of the two example files in /examples, into the files https://github.com/timoschick/pet/blob/e20bc195455f817119a86b53284a105f2c906fbd/pet/pvp.py and https://github.com/timoschick/pet/blob/e20bc195455f817119a86b53284a105f2c906fbd/pet/tasks.py (and more?).
I ended up copying the classes to the respective files and can confirm it works.
Thanks to the transformers library, I already had my language model (roberta) installed on my machine, and training started right away 🚀 👏
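Copying the classes into pet/pvp.py and pet/tasks.py presumably works because the CLI resolves --task_name through registries of processors and PVPs. Below is a minimal, self-contained sketch of that registry pattern, not PET's actual code. The PROCESSORS and PVPS dictionaries are stand-ins (their names are assumed to mirror the registries in the two files above), the class bodies are reduced to the bare minimum, and resolve_task is a hypothetical helper introduced only for illustration.

```python
# Stand-in registries. In PET, the real ones are assumed to live in
# pet/tasks.py (processors) and pet/pvp.py (pattern-verbalizer pairs).
PROCESSORS = {}
PVPS = {}


class MyTaskDataProcessor:
    """Placeholder for a custom data processor."""
    TASK_NAME = "my-task"


class MyTaskPVP:
    """Placeholder for a custom pattern-verbalizer pair."""
    TASK_NAME = "my-task"
    VERBALIZER = {"0": ["Bad"], "1": ["Good"]}


# Registration: both classes are stored under the same task name.
PROCESSORS[MyTaskDataProcessor.TASK_NAME] = MyTaskDataProcessor
PVPS[MyTaskPVP.TASK_NAME] = MyTaskPVP


def resolve_task(task_name):
    """Hypothetical helper: look up both classes for a --task_name value."""
    if task_name not in PROCESSORS or task_name not in PVPS:
        raise ValueError(f"Task '{task_name}' is not registered")
    return PROCESSORS[task_name], PVPS[task_name]


processor_cls, pvp_cls = resolve_task("my-task")
print(processor_cls.__name__, pvp_cls.__name__)
```

If the registration lines never run (for example because the module that contains them is never imported), the lookup fails for that task name, which is one reason to place the classes directly in the files the CLI already imports.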
Thanks for your advice. I copied my classes, but the command still does not work. This is the command I am using. Is there anything I should change?
python3 cli.py --method pet --pattern_ids 0 1 --data_dir ./data --model_type albert --model_name_or_path albert-base-v2 --task_name my-task --output_dir ./OUTPUT_DIR/ --do_train --do_eval --pet_per_gpu_eval_batch_size 8 --pet_per_gpu_train_batch_size 2 --pet_gradient_accumulation_steps 8 --pet_max_steps 250 --pet_max_seq_length 256 --pet_repetitions 3 --sc_per_gpu_train_batch_size 2 --sc_per_gpu_unlabeled_batch_size 2 --sc_gradient_accumulation_steps 8 --sc_max_steps 5000 --sc_max_seq_length 256 --sc_repetitions 1
@Mahhos what's your error ?
No error. I run the command from the terminal and it does nothing, without printing any error.
@chris-aeviator I found the issue. I ran the same command with python instead of python3 and it worked. However, I am now getting an error:
File "E:\My Projects\pet-master-updated\pet\tasks.py", line 848, in _create_examples
    text_a = row[MyTaskDataProcessor.TEXT_A_COLUMN]
IndexError: list index out of range
I think this happens if you have unequal columns in your data. Try to do a very minimal example first. I have started with e.g.
pvp.py
VERBALIZER = {
"0": ["Bad"],
"1": ["Good"],
}
training.csv
0,This is bad,a bad string
1,this is good, a good string
dev.csv
0,something bad, a bad string
1,that's good, good string
unlabeled.csv
,this is supposed to be bad,baddish
,this looks good,goodish
(actually my field B is a category, currently always the same with all examples)
with around 20 training examples (split between training and dev) - I can get pretty good results with real world data by running
python3 cli.py \
--overwrite_output_dir --method pet \
--pattern_ids 0 \
--data_dir data \
--model_type roberta \
--model_name roberta-base \
--task_name my-task \
--output_dir /mnt/[…]/DevRepo/xxxxxxx-pet-model \
--do_eval \
--do_train
@chris-aeviator I guess there is something wrong with my unlabeled.csv file, since the program can successfully create features from my train.csv and dev.csv. The error was raised when it tried to create features from unlabeled.csv. My unlabeled.csv has only one column, which contains the text. I tell the program to treat column 0 as text_a. My unlabeled.csv does not have a column 1 containing gold labels.
2020-10-11 11:56:33,461 - INFO - tasks - Creating features from dataset file at ./data (num_examples=-1, set_type=unlabeled)
File "E:\My Projects\pet-master-updated\pet\tasks.py", line 848, in _create_examples
    text_a = row[MyTaskDataProcessor.TEXT_A_COLUMN]
IndexError: list index out of range
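An IndexError at row[...] means that at least one row has fewer fields than the column index being read. A quick way to find such rows before training is sketched below. This is a standalone check, not part of PET. It assumes a layout with the label in column 0 and the text in column 1, and find_short_rows is a hypothetical helper name.

```python
import csv
import io

LABEL_COLUMN = 0   # assumed layout: label first
TEXT_A_COLUMN = 1  # assumed layout: text second


def find_short_rows(lines, min_columns):
    """Return (line_number, row) pairs for rows with fewer than min_columns fields."""
    reader = csv.reader(lines)
    return [(i, row) for i, row in enumerate(reader, start=1)
            if len(row) < min_columns]


# Unlabeled rows that keep the empty label column: every row has two fields.
with_label_column = io.StringIO(",this is supposed to be bad\n,this looks good\n")
# Unlabeled rows without the label column: every row has only one field.
without_label_column = io.StringIO("this is supposed to be bad\nthis looks good\n")

print(find_short_rows(with_label_column, TEXT_A_COLUMN + 1))
print(find_short_rows(without_label_column, TEXT_A_COLUMN + 1))
```

To check a real file, pass an open file object instead of the in-memory strings, for example open("data/unlabeled.csv", newline="", encoding="utf-8"). Blank lines are also reported, since csv.reader yields an empty row for them.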
@Mahhos have you made sure that you keep an empty column in unlabeled.txt in the same position where you have your label in train.txt?
train: label,text_a
unlabeled: [empty],text_a
so the , is important
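If the unlabeled file was exported with only the text column, the empty label column can be added with a few lines of Python. This is a minimal sketch under the layout described above (label first, then text). add_empty_label_column is a hypothetical helper, and the in-memory strings stand in for real files.

```python
import csv
import io


def add_empty_label_column(src, dst):
    """Copy CSV rows from src to dst, prepending an empty label field to each."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines instead of writing a lone empty field
        writer.writerow([""] + row)


src = io.StringIO("this is supposed to be bad\nthis looks good\n")
dst = io.StringIO()
add_empty_label_column(src, dst)
print(dst.getvalue())
```

Each output row now starts with a comma, so the text sits in the same column as in the training file.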
@chris-aeviator thank you so much. That was a good point!
Hi. I want to train PET on a new task, for which I prepared custom_task_processor.py and custom_task_pvp.py. My question is: how should we tell the program to read our customized files (instead of the main files) and run the newly registered task? It seems that just running the commands under the PET Training and Evaluation section does not do this.