No, there is no extra preprocessing step; it looks like UTF-8 needs to be specified on your machine when the data files are opened. I will look into it. In the meantime, can you check:

```python
import sys, locale
sys.getdefaultencoding()       # always 'utf-8' on Python 3
locale.getpreferredencoding()  # the default that open() actually uses
```

and if the preferred encoding is not UTF-8, change it (for example by exporting `LC_ALL=C.UTF-8` and `LANG=C.UTF-8` in the container before training).
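For reference, the shape of the change I mean in `utils.py` (untested, and assuming the data files themselves are UTF-8-encoded):

```python
from pathlib import Path

def get_char_lens(data_file):
    # Pass the encoding explicitly instead of inheriting the
    # locale-dependent default, which is ASCII in your container.
    with Path(data_file).open(encoding="utf-8") as f:
        return [len(line) for line in f]
```

The UnicodeEncodeError during validation is presumably the write side of the same problem, so the file that `validation_epoch_end` writes generated text to would need `encoding="utf-8"` as well.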
This seemed to be an issue with the way I downloaded the project: using git clone fixed it, and it only happened when downloading directly from GitHub. Thanks!
Hi, I am trying to run this project as described in the README. I completed the installation and tried to run a config, but every config I try stops with a UnicodeEncodeError. Each traceback is slightly different; e2e_clean is the only config that makes it to training, but it also crashes with a UnicodeEncodeError after Epoch 0.
Here's a couple of tracebacks as examples. For webnlg17:

```
Traceback (most recent call last):
  File "finetune.py", line 932, in <module>
    model = main(args)
  File "finetune.py", line 902, in main
    logger=logger,
  File "/workspace/ControlPrefixes-main/src/datatotext/lightning_base.py", line 634, in generic_train
    trainer.fit(model)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
    self.dispatch()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
    self.accelerator.start_training(self)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
    self._results = trainer.run_train()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 607, in run_train
    self.run_sanity_check(self.lightning_module)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 864, in run_sanity_check
    _, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 742, in run_evaluation
    deprecated_eval_results = self.evaluation_loop.evaluation_epoch_end()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/evaluation_loop.py", line 189, in evaluation_epoch_end
    deprecated_results = self.__run_eval_epoch_end(self.num_dataloaders)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/evaluation_loop.py", line 227, in __run_eval_epoch_end
    eval_results = model.validation_epoch_end(eval_results)
  File "finetune.py", line 345, in validation_epoch_end
    convert_text(s) + "\n" for s in output_batch["target"]
UnicodeEncodeError: 'ascii' codec can't encode character '\xe1' in position 9: ordinal not in range(128)
```
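(For reference, '\xe1' is just 'á', U+00E1, so any accented character in the generated targets triggers this; it reproduces outside the repo in a few lines:)

```python
# 'Málaga' contains 'á' (U+00E1); encoding it through the ASCII codec
# fails with the same error as the traceback above.
try:
    "Málaga".encode("ascii")
except UnicodeEncodeError as e:
    print(e)
```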
And for DART:

```
Traceback (most recent call last):
  File "finetune.py", line 932, in <module>
    model = main(args)
  File "finetune.py", line 902, in main
    logger=logger,
  File "/workspace/ControlPrefixes-main/src/datatotext/lightning_base.py", line 634, in generic_train
    trainer.fit(model)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
    self.call_setup_hook(model)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 1066, in call_setup_hook
    model.setup(stage_name)
  File "/workspace/ControlPrefixes-main/src/datatotext/lightning_base.py", line 286, in setup
    "train", self.hparams.train_batch_size, shuffle=True
  File "finetune.py", line 610, in get_dataloader
    dataset = self.get_dataset(type_path)
  File "finetune.py", line 603, in get_dataset
    **self.dataset_kwargs,
  File "/workspace/ControlPrefixes-main/src/datatotext/utils.py", line 610, in __init__
    self.src_lens = self.get_char_lens(self.src_file)
  File "/workspace/ControlPrefixes-main/src/datatotext/utils.py", line 633, in get_char_lens
    return [len(x) for x in Path(data_file).open().readlines()]
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 6422: ordinal not in range(128)
```
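(And byte 0xc2 here is the lead byte of a two-byte UTF-8 sequence, for example a non-breaking space, so the data file is most likely valid UTF-8 that the ASCII default codec simply cannot read:)

```python
data = b"\xc2\xa0"           # UTF-8 bytes for U+00A0, a non-breaking space
print(data.decode("utf-8"))  # decodes cleanly with the right codec
try:
    data.decode("ascii")     # fails exactly like the traceback above
except UnicodeDecodeError as e:
    print(e)
```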
Every config does this at some point; let me know if you need more information. I have tried moving the data around, unzipping it differently, and rolling pytorch-lightning back and forward between versions, but nothing seems to work. Is there some undocumented data-processing step that needs to be done before training?

Thanks, CH