Closed by kuailehaha 3 months ago
Thanks for raising the issue. The misleading output messages and incorrect logic were due to a versioning issue with the working code uploaded at that time. We have uploaded a new version and corrected all of the issues; the code should now work as expected. We have also aligned all scores in our arXiv and camera-ready versions with the output of the current code. Sorry for the confusion.
For reproducibility, please see the Parameter Reference section in each sub-repo for details.
Description
I've encountered a critical issue in the optimizer configuration code: the branch conditions appear to be reversed or incorrectly labeled. This not only leads to potential misuse of optimizer parameters but also produces incorrect console messages that could mislead users of this library.
Issue Details
In the current implementation of GPT2/examples/NLG/optimizer_custom.py, the conditions for selecting optimizers are reversed or incorrectly implemented.
Problems
The first condition, meant for scaled_adamw, incorrectly initializes AdamW. The second condition, meant for adamw, incorrectly initializes AdamWr and passes a parameter rank=args.lora_dim, which is not applicable to the standard AdamW optimizer but suggests a specialized variant, perhaps related to the LoRA architecture.
Expected Behavior
Each branch should accurately reflect the optimizer being used, and the console output should match the actual optimizer configuration. The parameters and optimizer choice should align correctly with the user's input.
Request
I urge the maintainers to:
Additional Context
This misconfiguration is critical as it affects the reproducibility and integrity of the results presented in your experiments (refer to Table 1 from the attached image in this issue). Please provide an updated and correct configuration to avoid further confusion and to maintain the credibility of the research.
Thank you for addressing this issue promptly.