modelscope / AdaSeq

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models
Apache License 2.0

[Question] How to solve [datasets.builder.DatasetGenerationError: An error occurred while generating the dataset] #35

Open Shawnzheng011019 opened 10 months ago

Shawnzheng011019 commented 10 months ago

What is your question?

Traceback (most recent call last):
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1618, in _prepare_split_single
    writer = writer_class(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\arrow_writer.py", line 334, in __init__
    self.stream = self._fs.open(fs_token_paths[2][0], "wb")
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\spec.py", line 1309, in open
    f = self._open(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 180, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 298, in __init__
    self._open()
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\fsspec\implementations\local.py", line 303, in _open
    self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/shawn/.cache/huggingface/datasets/named_entity_recognition_dataset_builder/default-c270794ce0d23d06/0.0.0/db737b9bb893f20fb03d04403a30bf7c033256c212b7e9f0ebc6e9c958535c51.incomplete/named_entity_recognition_dataset_builder-train-00000-00000-of-NNNNN.arrow'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\shawn\anaconda3\envs\pytorch\Scripts\adaseq.exe\__main__.py", line 7, in <module>
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\main.py", line 13, in run
    main(prog='adaseq')
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\__init__.py", line 29, in main
    args.func(args)
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\train.py", line 84, in train_model_from_args
    train_model(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\train.py", line 156, in train_model
    trainer = build_trainer_from_partial_objects(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\commands\train.py", line 185, in build_trainer_from_partial_objects
    dm = DatasetManager.from_config(task=config.task, **config.dataset)
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\adaseq\data\dataset_manager.py", line 182, in from_config
    hfdataset = hf_load_dataset(path, name=name, **kwargs)
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\load.py", line 1797, in load_dataset
    builder_instance.download_and_prepare(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 909, in download_and_prepare
    self._download_and_prepare(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1670, in _download_and_prepare
    super()._download_and_prepare(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1004, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1508, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "C:\Users\shawn\anaconda3\envs\pytorch\lib\site-packages\datasets\builder.py", line 1665, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
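One possible cause that nobody in this thread confirms: the file named in the FileNotFoundError sits under a very deep cache directory, and on Windows a path of 260 characters or more cannot be opened unless long-path support is enabled. The sketch below only measures the reported path (copied here without line-wrap whitespace) against that legacy limit. The helper name `exceeds_windows_max_path` is hypothetical, and whether long paths were enabled on the reporter's machine is unknown.

```python
# Legacy Windows path limit (MAX_PATH): 260 characters, counting the
# terminating NUL, so a path of 260 or more visible characters does not fit.
WINDOWS_MAX_PATH = 260


def exceeds_windows_max_path(path: str, limit: int = WINDOWS_MAX_PATH) -> bool:
    """Return True if `path` is too long for the legacy Windows limit.

    Hypothetical helper for illustration; it only compares lengths and
    does not query the operating system's long-path setting.
    """
    return len(path) >= limit


# Path reported in the FileNotFoundError, rebuilt from its components.
failing_path = "/".join([
    "C:/Users/shawn/.cache/huggingface/datasets",
    "named_entity_recognition_dataset_builder",
    "default-c270794ce0d23d06",
    "0.0.0",
    "db737b9bb893f20fb03d04403a30bf7c033256c212b7e9f0ebc6e9c958535c51.incomplete",
    "named_entity_recognition_dataset_builder-train-00000-00000-of-NNNNN.arrow",
])

# Print the length and whether it crosses the legacy limit.
print(len(failing_path), exceeds_windows_max_path(failing_path))
```

If the check reports that the path is too long, two things that may be worth trying are enabling long-path support in Windows or pointing the `HF_DATASETS_CACHE` environment variable at a shorter directory before running AdaSeq. Neither has been verified against this issue.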

What have you tried?

I set an HTTP proxy and confirmed that I could connect to YouTube.

Code (if necessary)

No response

What's your environment?

Shawnzheng011019 commented 10 months ago

The environment was set up automatically from the requirements.txt file.

ykallan commented 9 months ago

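I ran into the same problem. It looks like the handling logic may be at fault when AdaSeq loads the dataset. The dataset format setting

    data_type: json_spans

may be what is causing trouble.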

PPPP-kaqiu commented 6 months ago

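This happens because the dataset cannot be found, or because the dataset is not in a standard format that can be parsed. You can rewrite the data loading, following the loading code used for toy msra.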

houyuchao commented 6 months ago

@PPPP-kaqiu Did you rewrite it? Could you share it?

lichen146 commented 4 months ago

@Shawnzheng011019 Did you manage to solve this?

PPPP-kaqiu commented 4 months ago

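Write the data loading script strictly in the HF dataset format. In the YAML, the data loading entry then only needs to point to the folder that contains the data.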
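As a rough illustration of that advice, here is a minimal sketch of the core of such a loading script: a generator that yields `(key, example)` pairs, which is the shape the `_generate_examples` method of a `datasets.GeneratorBasedBuilder` subclass is expected to produce. Everything specific here is an assumption: the function name `generate_examples`, the two-column CoNLL-style input file, and the field names `id`, `tokens` and `labels`. The fields AdaSeq actually expects should be taken from the toy msra loading code mentioned earlier.

```python
from typing import Dict, Iterator, List, Tuple

Example = Dict[str, object]


def generate_examples(filepath: str) -> Iterator[Tuple[int, Example]]:
    """Yield (key, example) pairs from a CoNLL-style file.

    Assumed (hypothetical) input: one "token label" pair per line,
    with a blank line between sentences.
    """
    tokens: List[str] = []
    labels: List[str] = []
    key = 0
    with open(filepath, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                # A blank line closes the current sentence, if there is one.
                if tokens:
                    yield key, {"id": str(key), "tokens": tokens, "labels": labels}
                    key += 1
                    tokens, labels = [], []
                continue
            parts = line.split()
            tokens.append(parts[0])   # first column: the token
            labels.append(parts[-1])  # last column: the tag
    # Emit the final sentence when the file does not end with a blank line.
    if tokens:
        yield key, {"id": str(key), "tokens": tokens, "labels": labels}
```

In a full loading script this logic would sit inside the builder class next to `_info` and `_split_generators`. Keeping it as a plain function first makes it easy to check the parsing on a small file before wiring it into the builder.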

lichen146 commented 4 months ago

@PPPP-kaqiu Could we connect on WeChat? I would like to ask for your advice. WX: Xugeyuan923

houyuchao commented 1 month ago

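> Write the data loading script strictly in the HF dataset format. In the YAML, the data loading entry then only needs to point to the folder that contains the data.

Hello, could you share how you solved it?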