ldqvinh opened this issue 10 months ago
Thanks for taking an interest in the code! I'm not immediately sure what the issue could be, but some things to try are:

- `--min_length 10` and `--max_length 10`
- the `--autoregressive` flag
- `--n_cat_mlps 0` (no MLPs)
- `--n_epochs 500`

Note that you likely need to try a number of random seeds to get a model that successfully learns the task. To save time, we also used a "patience" of 25 (this is a possible argument to the `run_training` function, although you would need to modify `src/run.py` to make it a command-line flag).
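For reference, a minimal sketch of that change, assuming `src/run.py` defines its flags with `argparse` (the exact parser setup in the repo may differ):

```python
import argparse

# Sketch of exposing patience as a flag in src/run.py -- assumes the
# script already builds an argparse parser for its other flags.
parser = argparse.ArgumentParser()
parser.add_argument("--patience", type=int, default=25,
                    help="early-stopping patience forwarded to run_training")
args = parser.parse_args()

# ...then forward it where training is launched:
# run_training(..., patience=args.patience)
```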
Could you also share any more details about what you observe? Do you get "loss is NaN" right away, or only after some training?
Hi @danfriedman0,

I also had issues replicating the induction experiment. The command is the one suggested above, copied below. I used a modified script, `experiment_run_n.py`, which takes an additional `patience` argument and iterates through seeds whenever a training run exhausts its patience. Training returns a constant loss of 5.81e+29 from seed 0 all the way to 100. By the way, some other experiments seemed to work, such as "sort", "reverse", etc.
```bash
CUDA_VISIBLE_DEVICES=0 python experiment_run_n.py \
    --dataset "induction" \
    --vocab_size 10 \
    --dataset_size 20000 \
    --min_length 10 \
    --max_length 10 \
    --n_epochs 500 \
    --batch_size 512 \
    --patience 25 \
    --lr "5e-2" \
    --n_layers 2 \
    --n_heads_cat 1 \
    --n_heads_num 0 \
    --n_cat_mlps 0 \
    --n_num_mlps 0 \
    --one_hot_embed \
    --count_only \
    --autoregressive \
    --seed 0 \
    --save \
    --save_code \
    --output_dir "output/induction"
```
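For reference, a minimal sketch of a seed-iterating wrapper like the one described above; the `subprocess` call, the flag subset, and the exit-code success check are illustrative assumptions, not the actual contents of `experiment_run_n.py`:

```python
import subprocess
import sys

# Illustrative sketch of a seed-sweep wrapper around src/run.py.
# The success criterion (exit code 0) is an assumption; a real
# wrapper would track the training loss and patience instead.
base_cmd = [
    sys.executable, "src/run.py",
    "--dataset", "induction",
    "--n_epochs", "500",
    # ... remaining flags from the command above ...
]
for seed in range(101):  # seeds 0 through 100
    result = subprocess.run(base_cmd + ["--seed", str(seed)])
    if result.returncode == 0:
        print(f"seed {seed} succeeded")
        break
```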
Also, it would be great if you could share the configurations for replicating all the experiments in the paper, like those for "sort" and "conll_ner" in the README.md. Thanks!
Hi all, sorry for the trouble, and thanks for the additional detail.
I think I found the main problem: you need to set `--unembed_mask 0`. This flag is set to `1` by default, which prevents the model from predicting `pad` or `unk` as the output token, but the `unk` token is a valid prediction for this task. I have uploaded a script with a command that works for me (on around 20% of seeds).
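To see why this could surface as a NaN or exploding loss, here is a hypothetical illustration of what such an unembed mask typically does. This is not the repo's actual code; the `pad`/`unk` indices and the masking-with-`-inf` are assumptions:

```python
import torch
import torch.nn.functional as F

# Hypothetical special-token indices -- an assumption, not the repo's vocab.
PAD, UNK = 0, 1

logits = torch.randn(4, 10)            # (batch, vocab_size)
masked = logits.clone()
masked[:, [PAD, UNK]] = float("-inf")  # with the mask on, pad/unk can never be predicted

# In the induction task, unk is a legitimate target token, so the
# cross-entropy at those positions is infinite and gradients degenerate:
targets = torch.tensor([UNK, 3, 5, UNK])
print(F.cross_entropy(masked, targets))  # tensor(inf)
```

With `--unembed_mask 0`, the `unk` logit is left intact and the loss stays finite.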
@Wangcheng-Xu: The `scripts` directory contains the configurations used for the other experiments in the paper. Please let me know if you have any more questions.
Thank you! I have tested the fixed configuration for the induction task, and it works for me.
Thank you to everyone involved for identifying and resolving the issue. The updated configuration for the induction task is now functioning perfectly on my end as well.
Hello, thanks for your work. I attempted the in-context learning training command from the experiment details, but encountered a 'loss is NaN' error. Could you share the command you used? I'd appreciate it.