asleep opened this issue 3 years ago
Did you run preprocess_clinc150.py?
We used data_full and made the target domain (e.g. banking) few-shot during preprocessing. This yields an imbalanced dataset in which the target domain has, say, 10 examples per intent, while the remaining domains keep their full data. This happens at: https://github.com/google/example_extrapolation/blob/8bf472143952d019f4b7273b9236f7591c27feb9/preprocess_clinc150.py#L109
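In rough pseudocode, the downsampling looks like this (a minimal sketch; `BANKING_INTENTS`, `K`, and the assumed `data_full.json` layout are stand-ins for what the linked script actually computes):

```python
import json
import random
from collections import defaultdict

# Hypothetical target-domain intent set; the real intent->domain
# mapping is not stored inside data_full.json itself.
BANKING_INTENTS = {"transfer", "balance", "freeze_account", "bill_due"}

K = 10  # few-shot examples kept per target-domain intent

with open("data_full.json") as f:
    data = json.load(f)  # {"train": [[utterance, intent], ...], ...}

# Group training utterances by intent.
by_intent = defaultdict(list)
for utterance, intent in data["train"]:
    by_intent[intent].append(utterance)

random.seed(0)
fewshot_train = []
for intent, utterances in by_intent.items():
    if intent in BANKING_INTENTS:
        # Downsample the target domain to K examples per intent;
        # all other domains keep their full data.
        utterances = random.sample(utterances, K)
    fewshot_train.extend([u, intent] for u in utterances)
```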
How do you evaluate the correctness of the model for intent classification? I parse the output sentence up to the first EOS token, then compare the text before it with the ground-truth class value (so I effectively ignore anything generated after the EOS token). The ground-truth class values are the literal class names (not class ids).
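Concretely, my check looks roughly like this (a minimal sketch; the `"</s>"` EOS string is just an assumption, it's whatever my tokenizer emits):

```python
def predicted_label(generated: str, eos_token: str = "</s>") -> str:
    # Keep only the text before the first EOS token, if one appears.
    eos_pos = generated.find(eos_token)
    if eos_pos != -1:
        generated = generated[:eos_pos]
    return generated.strip()

def accuracy(outputs, gold_labels, eos_token="</s>"):
    # Gold labels are the literal class names, compared as strings.
    hits = sum(
        predicted_label(out, eos_token) == gold
        for out, gold in zip(outputs, gold_labels)
    )
    return hits / len(gold_labels)
```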
It'd be great if you could clarify this (ping: @luheng).
Thanks.
Hi,
I tried to verify your approach, but I didn't get such low F1 numbers for the few-shot setting. Looking at data_full, banking has the same number of training samples as all the other domains/intents. Did you in fact use imbalanced data?
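For reference, this is how I counted the samples (assuming `data_full.json` stores `[utterance, intent]` pairs under `"train"`):

```python
import json
from collections import Counter

with open("data_full.json") as f:
    data = json.load(f)

# Count training examples per intent; every intent shows the same
# count for me, which is why I'm confused about the imbalance.
counts = Counter(intent for _, intent in data["train"])
print(counts.most_common(5))
```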