Traceback (most recent call last):
  File "/data/miniconda3/envs/env-3.8.8/lib/python3.8/site-packages/datasets/builder.py", line 1874, in _prepare_split_single
    writer.write_table(table)
  File "/data/miniconda3/envs/env-3.8.8/lib/python3.8/site-packages/datasets/arrow_writer.py", line 567, in write_table
    pa_table = pa_table.combine_chunks()
  File "pyarrow/table.pxi", line 3315, in pyarrow.lib.Table.combine_chunks
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_uie_pretrain.py", line 509, in <module>
    main()
  File "run_uie_pretrain.py", line 148, in main
    datasets = load_dataset(
  File "/data/miniconda3/envs/env-3.8.8/lib/python3.8/site-packages/datasets/load.py", line 1782, in load_dataset
    builder_instance.download_and_prepare(
  File "/data/miniconda3/envs/env-3.8.8/lib/python3.8/site-packages/datasets/builder.py", line 872, in download_and_prepare
    self._download_and_prepare(
  File "/data/miniconda3/envs/env-3.8.8/lib/python3.8/site-packages/datasets/builder.py", line 967, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/data/miniconda3/envs/env-3.8.8/lib/python3.8/site-packages/datasets/builder.py", line 1749, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/data/miniconda3/envs/env-3.8.8/lib/python3.8/site-packages/datasets/builder.py", line 1892, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
Hello 陆博, and thank you very much for open-sourcing the code of the UIE model!
When the program loads the constructed pretraining data, it reports the error shown above.
When the dataset contains about 5 million examples, the error above is raised, whereas with 1 million examples the program runs normally. Judging from the error message, the failure seems to be caused by the dataset being too large, and memory is not exhausted when it happens.
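For context, the data is loaded through datasets.load_dataset in run_uie_pretrain.py (line 148 in the traceback above). Below is a minimal sketch of that call together with a workaround I am experimenting with; the file path and column names are placeholders rather than the real schema, and the idea is only a guess based on the fact that pyarrow's plain string arrays use 32-bit offsets, which can overflow when very large chunks are combined.

from datasets import load_dataset, Features, Value

# Placeholder path for the constructed pretraining data (not the actual file name).
data_files = {"train": "pretrain_data/train.json"}

# Simplified version of the failing call in run_uie_pretrain.py:
# datasets = load_dataset("json", data_files=data_files)

# Candidate workaround: declare the long text columns as large_string, which uses
# 64-bit offsets instead of 32-bit ones. "text" and "record" are assumed column
# names, not the actual schema of the pretraining data.
features = Features({
    "text": Value("large_string"),
    "record": Value("large_string"),
})
datasets = load_dataset("json", data_files=data_files, features=features)

Splitting the 5-million-example file into several smaller files passed together via data_files might also sidestep the overflow, but I have not confirmed either approach yet.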
I therefore have a few questions I would like to ask you: