prakharguptaz / Instructdial

Code for the paper InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
Apache License 2.0

Fix instruction_option sampling #10

Closed alon-albalak closed 2 years ago

alon-albalak commented 2 years ago

Sampling is currently broken for the instruction_option task: instead of a sampled subset, it returns the list of all possible samples for this task.

prakharguptaz commented 2 years ago

Unsure about this one. Will need to check. Please recheck from your side as well.

alon-albalak commented 2 years ago

Me too, it's hard to tell what the intention was here. instruction_option_sampledata and instruction_binary_sampledata are handled differently, even though I believe they are meant to be handled the same.

Line 610 shows that instruction_option_sampledata will gather more samples for each task. Line 613 shows that instruction_binary_sampledata will only use the samples from the last task.
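To make the difference concrete, here is a toy sketch of the two patterns as I read them. It is not the repository's code: the `tasks` dictionary and the loop are hypothetical stand-ins, and only the accumulate-versus-overwrite contrast is the point.

```python
# Toy illustration only; `tasks` and the loop below are hypothetical,
# not copied from the repository.
tasks = {"task_a": [1, 2, 3], "task_b": [4, 5], "task_c": [6]}

option_sampledata = []
binary_sampledata = []
for task_name, samples in tasks.items():
    # Pattern 1: the list grows with every task that is processed.
    option_sampledata.extend(samples)
    # Pattern 2: the list is replaced on every iteration, so only the
    # last task's samples survive the loop.
    binary_sampledata = list(samples)

print(len(option_sampledata), len(binary_sampledata))
```

With the first pattern the total grows with the number of tasks; with the second it is bounded by whatever the last task contributes.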

I think I understand now that my suggestion is also incorrect. The reason I found this is that the instruction_option task was creating over 100k samples, while instruction_binary had only 5k. 5k was what I set for both --instruction_option_size and --instruction_binary_size args.

I'll make another commit that implements what I believe is the desired behavior: sample --instruction_binary/option_sampledata from all tasks, but only up to a maximum of --instruction_binary/option_size.
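A minimal sketch of that behavior, under my own assumptions: `sample_capped`, `per_task_samples`, and `seed` are hypothetical names for illustration, and the actual commit may pool or shuffle differently.

```python
import random


def sample_capped(per_task_samples, max_size, seed=0):
    """Pool samples from all tasks, then keep at most max_size of them.

    Hypothetical helper: per_task_samples maps a task name to its list
    of samples; max_size plays the role of the --instruction_*_size value.
    """
    # Gather candidates from every task, not just the last one.
    pooled = [s for samples in per_task_samples.values() for s in samples]
    if len(pooled) <= max_size:
        return pooled
    # Draw max_size items without replacement; seeded for reproducibility.
    rng = random.Random(seed)
    return rng.sample(pooled, max_size)


per_task = {"task_a": list(range(100)), "task_b": list(range(100, 250))}
capped = sample_capped(per_task, max_size=50)
print(len(capped))
```

The size argument then bounds the total number of samples for the instruction task, regardless of how many tasks contribute.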

prakharguptaz commented 2 years ago

The instruction_binary task indeed needed fixing. In my experiments, I set instruction_option to 200, which led to around 2200 data points. With this new update, one can set the instruction_option task size to 2200 directly. Although I used 2200, one can try 5000 points too.