Open projectavi opened 3 months ago
I have the same problem. Do you have any updates?
I ran into the same problem. There seems to be an issue with `map`: if the input batch has 1000 rows but the output has 1096 rows, the error pops up. One workaround is to write the mapping loop manually:
```python
from datasets import Dataset
from torch.utils.data import DataLoader
from tqdm import tqdm

dataloader = DataLoader(dataset, batch_size=1000)

# Accumulate the preprocessed columns manually instead of using dataset.map()
d = {"input_ids": [], "attention_mask": [], "start_locs": []}
for batch in tqdm(dataloader):
    p_batch = preproccess(batch)
    d["input_ids"].extend(p_batch["input_ids"])
    d["attention_mask"].extend(p_batch["attention_mask"])
    d["start_locs"].extend(p_batch["start_locs"])

dataset = Dataset.from_dict(d)
```
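For anyone hitting the same mismatch, here is a minimal, self-contained sketch of why the row counts diverge. `chunk_tokens` and `preprocess` below are hypothetical stand-ins (not from the repository) for a tokenizer step that splits long inputs into fixed-size windows, so one input row can yield several output rows:

```python
# Hypothetical illustration: a map-style function whose output has MORE
# rows than its input batch, which is what trips pyarrow inside .map().

def chunk_tokens(tokens, window=4):
    """Split one token list into fixed-size windows (last may be short)."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]

def preprocess(batch):
    """Batched map function: row count of the output is not fixed."""
    out = {"input_ids": [], "attention_mask": [], "start_locs": []}
    for tokens in batch["tokens"]:
        for chunk in chunk_tokens(tokens):
            out["input_ids"].append(chunk)
            out["attention_mask"].append([1] * len(chunk))
            out["start_locs"].append(0)
    return out

# Two input rows; the second is long enough to produce two chunks, so the
# output has three rows -- the same kind of 1000-in / 1096-out mismatch
# reported above.
batch = {"tokens": [[1, 2, 3], [4, 5, 6, 7, 8, 9]]}
result = preprocess(batch)
print(len(batch["tokens"]), len(result["input_ids"]))  # 2 3
```

The manual loop above sidesteps the problem because `extend` does not care how many rows each batch produces.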
When I try running `unlearn_harm.py` (after replacing `load_dataset`'s split with `"train"`, because the original split did not exist), I get the following error:

```
pyarrow.lib.ArrowInvalid: Column 5 named input_ids expected length 1000 but got length 1096
```
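This `ArrowInvalid` usually means the mapped function changed the number of rows while the original columns were kept, so pyarrow cannot align the new 1096-row columns with the old 1000-row ones. If that is the cause here, a commonly suggested fix (assuming the Hugging Face `datasets` API; `dataset` and `preprocess` are placeholders for the objects in `unlearn_harm.py`) is to drop the original columns during `map`:

```python
# Fragment, not runnable on its own: `dataset` and `preprocess` are assumed.
dataset = dataset.map(
    preprocess,
    batched=True,
    batch_size=1000,
    remove_columns=dataset.column_names,  # drop old columns so row counts need not match
)
```

With `remove_columns`, the output table is built only from the columns the function returns, so a batch is free to grow or shrink.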