Closed fozziethebeat closed 1 month ago
My debugging tells me that the WandbCallback is doing something weird to the table such that you can't pickle the table after it runs. I hacked this code locally to do a deepcopy after the wandb.log and it broke just like this.
I'm guessing wandb did something such that the log event prevents pickling?
I've extracted the core flow that the DPO log statement triggers, and the combination does not seem to work:
import copy

import wandb

wandb.login()
run = wandb.init(
    project="wandb-debug",
)

# make a table
table = wandb.Table(
    columns=["Prompt", "Policy", "Ref Model"],
    rows=[
        ["prompt", "other prompt", "more prompt"],
    ],
)

# prove copying before logging works
copy.deepcopy({
    "game_log": table,
})

# log the table
wandb.log({"table": table})

# pickle failure here
copy.deepcopy({
    "game_log": table,
})
I tried this with a sample of wandb versions from 0.14.0 up to 0.17.4 and they all triggered this flow, so I'm assuming it's a feature of WandB.
Ultimately I think the wandb callback should probably remove the table before other callbacks try to do a deep copy.
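A minimal sketch of that idea, without any wandb dependency. `FakeLoggedTable` and `drop_uncopyable` are hypothetical names invented for illustration; the stand-in class holds a `threading.Lock` simply because locks are a well-known example of something `copy.deepcopy` rejects. What actually makes a logged `wandb.Table` uncopyable is not established in this thread.

```python
import copy
import threading


class FakeLoggedTable:
    """Hypothetical stand-in for a wandb.Table that has already been logged."""

    def __init__(self):
        # A lock cannot be deep-copied, so deepcopy of this object raises TypeError.
        self._handle = threading.Lock()


def drop_uncopyable(logs):
    """Return a new dict holding only the entries that survive copy.deepcopy."""
    safe = {}
    for key, value in logs.items():
        try:
            safe[key] = copy.deepcopy(value)
        except Exception:
            # Skip values (such as the stand-in table) that cannot be copied.
            continue
    return safe


logs = {"loss": 0.42, "game_log": FakeLoggedTable()}
safe_logs = drop_uncopyable(logs)
print(safe_logs)  # {'loss': 0.42}
```

The trade-off is that the table silently disappears from whatever consumes the filtered dict, so this only makes sense for consumers that never needed the table in the first place.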
Encountered same problem when doing DPO with generate_during_eval=True
Yes, this flow only triggers when using that setting.
Did you manage to find a solution, or a way to perform generation during evaluation? I tried various options for logging directly to wandb with a custom callback, but my process hangs under DeepSpeed when logging only on the main process.
I haven't found a good solution yet.
The problem came about due to this commit where they introduced a deepcopy. From the release history it should be present in transformers versions from https://github.com/huggingface/transformers/releases/tag/v4.40.0 onwards.
As noted above, I've reported this to Transformers with my proposed fix: don't pickle things with a deep copy.
Transformers is now fixed after this commit. Managed to figure out where Transformers accepted a PR that did a DeepCopy of WandB tables.
@fozziethebeat Sorry for bothering you, but maybe you observed the following: with generate_during_eval=True, the ref_model response is similar to that of the currently trained policy model. So it seems like ref_model is being updated, which should not be the case for DPO. This issue is very similar
Yeah that's the code path that triggers this. But it's fixed now with the latest version of transformers
This is a re-occurrence of #914.
Copy pasting my comments so they're here too:
I think this problem has resurfaced at some point. I'm running TRL indirectly through Axolotl and I'm seeing this line triggering the ProgressCallback.
Then pretty naturally when that callback does logs = copy.deepcopy(logs) the WandB table in the logs breaks things with the same failure.
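To make the failure mode concrete, here is a rough sketch that mimics it without transformers or wandb installed. Everything in it is an assumption for illustration: `FakeLoggedTable` stands in for a logged table, the two `on_log_*` functions are simplified imitations of a callback's log handler (not the real ProgressCallback), and `total_flos` is just an example key the handler strips.

```python
import copy
import threading


class FakeLoggedTable:
    """Hypothetical stand-in for a wandb.Table that has already been logged."""

    def __init__(self):
        self._handle = threading.Lock()  # locks cannot be deep-copied


def on_log_deepcopy(logs):
    """Imitates a handler that deep-copies the logs before editing them."""
    logs = copy.deepcopy(logs)
    logs.pop("total_flos", None)
    return logs


def on_log_shallow(logs):
    """Same handler with a shallow copy: values are shared, not cloned."""
    logs = dict(logs)
    logs.pop("total_flos", None)
    return logs


logs = {"loss": 0.42, "total_flos": 1e9, "game_log": FakeLoggedTable()}

try:
    on_log_deepcopy(logs)
    deepcopy_failed = False
except TypeError as err:
    deepcopy_failed = True
    print(f"deepcopy failed: {err}")

shown = on_log_shallow(logs)
print(sorted(shown))  # ['game_log', 'loss']
```

The shallow copy still protects the caller's dict from the `pop`, which is all this handler needs; it only stops being safe if the handler mutates the values themselves.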
The key parts of my stacktrace are:
Important versions of libraries are:
The only solution I've found is to drop wandb logging or delete the log line in the DPO Trainer.
I feel like the right fix is that when the DPO trainer calls self.log(...) it should not trigger the ProgressCallback, and it's kinda weird that it does.