modelscope / AdaSeq

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models
Apache License 2.0

[Question] Running the quick-start example python scripts/train.py -c examples/bert_crf/configs/resume.yaml fails with "An error occurred while generating the dataset" #47

Open · Gsq6161 opened this issue 1 month ago

Gsq6161 commented 1 month ago

What is your question?

I'm a complete beginner. I deployed AdaSeq locally and followed the steps in the repository, but execution fails at this point in the datasets library:

    except Exception as e:
        # Ignore the writer's error for no examples written to the file if this error was caused by the error in _generate_examples before the first example was yielded
        if isinstance(e, SchemaInferenceError) and e.__context__ is not None:
            e = e.__context__
        raise DatasetGenerationError("An error occurred while generating the dataset") from e

How can I fix this?
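The except block above wraps the original exception with raise ... from e, so the DatasetGenerationError message says nothing about what actually went wrong; the real error is whatever ends up in __cause__ (in this issue, the FileNotFoundError in the first traceback). A minimal Python sketch of that chaining, assuming a hypothetical stand-in class DatasetGenerationError and a hypothetical helper root_cause (neither is the datasets API):

```python
import os
import tempfile


class DatasetGenerationError(Exception):
    """Hypothetical stand-in for datasets.exceptions.DatasetGenerationError."""


def prepare_split(path: str) -> None:
    """Hypothetical stand-in for the writer setup in _prepare_split_single."""
    try:
        # Like the Arrow writer in the traceback, open the output shard for writing.
        with open(path, "wb"):
            pass
    except Exception as e:
        # Same wrapping as in the snippet above: the original error becomes __cause__.
        raise DatasetGenerationError("An error occurred while generating the dataset") from e


def root_cause(exc: BaseException) -> BaseException:
    """Walk __cause__ (explicit 'from') or __context__ (implicit) to the innermost exception."""
    seen = set()
    while id(exc) not in seen:  # guard against pathological cycles
        seen.add(id(exc))
        nxt = exc.__cause__ or exc.__context__
        if nxt is None:
            break
        exc = nxt
    return exc


if __name__ == "__main__":
    # A path whose parent directory does not exist, so open(..., "wb") fails.
    missing = os.path.join(tempfile.mkdtemp(), "missing_dir", "shard.arrow")
    try:
        prepare_split(missing)
    except DatasetGenerationError as err:
        cause = root_cause(err)
        print(f"{type(err).__name__} was caused by {type(cause).__name__}: {cause}")
```

In practice, when DatasetGenerationError shows up, the actionable error is in the first traceback, the one that ends just before "The above exception was the direct cause of the following exception".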

What have you tried?

Downgrading both torch and datasets did not help.

Code (if necessary)

    (adaseq) PS C:\Users\Acer\Desktop\AdaSeq-master> python scripts/train.py -c examples/bert_crf/configs/resume.yaml
    2024-09-27 21:32:46,554 - modelscope - WARNING - The reference has been Deprecated in modelscope v1.4.0+, please use from modelscope.msdatasets.dataset_cls.custom_datasets import TorchCustomDataset
    2024-09-27 21:32:47,201 - INFO - adaseq.data.dataset_manager - Will use a custom loading script: E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\data\dataset_builders\named_entity_recognition_dataset_builder.py
    Downloading data: 135kB [00:00, 2.86MB/s]
    Downloading data: 1.09MB [00:00, 10.4MB/s]
    Downloading data: 120kB [00:00, 2.56MB/s]
    Generating test split: 0 examples [00:00, ? examples/s]
    Traceback (most recent call last):
      File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1739, in _prepare_split_single
        writer = writer_class(
      File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\arrow_writer.py", line 338, in __init__
        self.stream = self._fs.open(path, "wb")
      File "E:\Anaconda\envs\adaseq\lib\site-packages\fsspec\spec.py", line 1303, in open
        f = self._open(
      File "E:\Anaconda\envs\adaseq\lib\site-packages\fsspec\implementations\local.py", line 191, in _open
        return LocalFileOpener(path, mode, fs=self, **kwargs)
      File "E:\Anaconda\envs\adaseq\lib\site-packages\fsspec\implementations\local.py", line 355, in __init__
        self._open()
      File "E:\Anaconda\envs\adaseq\lib\site-packages\fsspec\implementations\local.py", line 360, in _open
        self.f = open(self.path, mode=self.mode)
    FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/Acer/.cache/huggingface/datasets/named_entity_recognition_dataset_builder/default-84b1c02799fb57ba/0.0.0/db737b9bb893f20fb03d04403a30bf7c033256c212b7e9f0ebc6e9c958535c51.incomplete/named_entity_recognition_dataset_builder-test-00000-00000-of-NNNNN.arrow'
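The path in this FileNotFoundError is unusually long. One plausible explanation, not confirmed anywhere in this thread, is the legacy Windows MAX_PATH limit: unless long-path support is enabled, Win32 file APIs reject paths of 260 characters or more, because MAX_PATH counts the terminating NUL. A quick Python check, assuming the space that appears inside the 64-character hash in the pasted log is a copy/paste artifact; MAX_PATH, CACHE_FILE and exceeds_max_path are names introduced here for illustration:

```python
# Hypothesis check: is the cache path from the traceback longer than the
# legacy Windows MAX_PATH limit? (An assumption, not a confirmed diagnosis.)

MAX_PATH = 260  # includes the terminating NUL, so usable paths are < 260 chars

# Path copied from the FileNotFoundError; the stray space inside the hash is
# assumed to be a paste artifact and has been removed.
CACHE_FILE = (
    "C:/Users/Acer/.cache/huggingface/datasets/"
    "named_entity_recognition_dataset_builder/default-84b1c02799fb57ba/0.0.0/"
    "db737b9bb893f20fb03d04403a30bf7c033256c212b7e9f0ebc6e9c958535c51.incomplete/"
    "named_entity_recognition_dataset_builder-test-00000-00000-of-NNNNN.arrow"
)


def exceeds_max_path(path: str, limit: int = MAX_PATH) -> bool:
    """True if `path` is too long for APIs bound by the legacy limit."""
    return len(path) >= limit


if __name__ == "__main__":
    print(len(CACHE_FILE), exceeds_max_path(CACHE_FILE))
```

The path comes out at 262 characters, just over the limit. If that is the cause, a shorter cache location should help, for example setting the HF_DATASETS_CACHE environment variable (which the datasets library reads) to something like D:\hf before running scripts/train.py, or enabling Win32 long paths; neither remedy has been verified in this issue.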

The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "C:\Users\Acer\Desktop\AdaSeq-master\scripts\train.py", line 39, in <module>
        train_model_from_args(args)
      File "E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\commands\train.py", line 84, in train_model_from_args
        train_model(
      File "E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\commands\train.py", line 156, in train_model
        trainer = build_trainer_from_partial_objects(
      File "E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\commands\train.py", line 185, in build_trainer_from_partial_objects
        dm = DatasetManager.from_config(task=config.task, **config.dataset)
      File "E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\data\dataset_manager.py", line 182, in from_config
        hfdataset = hf_load_dataset(path, name=name, **kwargs)
      File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\load.py", line 2628, in load_dataset
        builder_instance.download_and_prepare(
      File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1029, in download_and_prepare
        self._download_and_prepare(
      File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1791, in _download_and_prepare
        super()._download_and_prepare(
      File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1124, in _download_and_prepare
        self._prepare_split(split_generator, **prepare_split_kwargs)
      File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1629, in _prepare_split
        for job_id, done, content in self._prepare_split_single(
      File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1786, in _prepare_split_single
        raise DatasetGenerationError("An error occurred while generating the dataset") from e
    datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset

What's your environment?

Code of Conduct

lengyanglph commented 4 days ago

I have the same problem. 'C:/Users/Acer/.cache/huggingface/datasets/named_entity_recognition_dataset_builder/default-84b1c02799fb57ba/0.0.0/db737b9bb893f20fb03d04403a30bf7c033256c212b7e9f0ebc6e9c958535c51.incomplete/named_entity_recognition_dataset_builder-test-00000-00000-of-NNNNN.arrow' is the local cache. The .incomplete marker means the cache file has not been generated yet, so reading this nonexistent file raises the error...
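The .incomplete suffix comes from a write-then-rename pattern: the cache is built inside a temporary directory and renamed to its final name only once generation succeeds. Note that in the first traceback the failing call is open(path, "wb") in arrow_writer.py, so the error occurs while creating the shard file for writing rather than while reading it back. A simplified Python illustration of the pattern; build_cache is a hypothetical helper, not the datasets implementation:

```python
import os
import shutil
import tempfile


def build_cache(final_dir: str, shard_name: str, payload: bytes) -> str:
    """Hypothetical helper: write a shard into '<final_dir>.incomplete', then rename."""
    incomplete_dir = final_dir + ".incomplete"
    os.makedirs(incomplete_dir, exist_ok=True)
    try:
        # The shard is opened for WRITING; a FileNotFoundError here happens
        # while creating the file, before any cached data is read back.
        with open(os.path.join(incomplete_dir, shard_name), "wb") as f:
            f.write(payload)
    except Exception:
        # On failure, drop the half-built directory so no partial cache survives.
        shutil.rmtree(incomplete_dir, ignore_errors=True)
        raise
    # Only a fully written cache receives its final name.
    os.replace(incomplete_dir, final_dir)
    return os.path.join(final_dir, shard_name)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    print(build_cache(os.path.join(root, "cache"), "test-00000.arrow", b"data"))
```

The rename at the end is what lets the final directory name act as a "cache is complete" signal: a crash mid-write never produces a directory that looks finished.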