zjukg / KoPA

[Paper][ACM MM 2024] Making Large Language Models Perform Better in Knowledge Graph Completion
MIT License

dataset_info.json file does not exist #10

Closed Guo-Chenxu closed 11 months ago

Guo-Chenxu commented 11 months ago

Hello, I ran into another problem while reproducing the code. The error says the file dataset_info.json does not exist, but I cannot find anywhere in the code that generates this file, nor any way to obtain it.

Training Alpaca-LoRA model with params:
    base_model: models/alpaca-7b-wdiff
    data_path: data/UMLS-train.json
    output_dir: out/finetune_kopa
    batch_size: 12
    micro_batch_size: 12
    num_epochs: 3
    learning_rate: 0.0003
    cutoff_len: 512
    val_set_size: 0
    lora_r: 32
    num_prefix: 1
    lora_alpha: 16
    lora_dropout: 0.05
    lora_target_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']
    train_on_inputs: True
    add_eos_token: False
    group_by_length: False
    wandb_project:
    wandb_run_name:
    wandb_watch:
    wandb_log_model:
    resume_from_checkpoint: False
    prompt template: alpaca
    kge model: data/UMLS-rotate.pth

Loading checkpoint shards: 100%|██████████| 3/3 [00:38<00:00, 12.76s/it]
/home/guochenxu/pythonProjects/KoPA-main/process_kge.py:8: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  ent_embs = torch.tensor(kge_model["ent_embeddings.weight"]).cpu()
/home/guochenxu/pythonProjects/KoPA-main/process_kge.py:9: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  rel_embs = torch.tensor(kge_model["rel_embeddings.weight"]).cpu()
1024 512
Adapter Trained From Scratch
Traceback (most recent call last):
  File "/home/guochenxu/pythonProjects/KoPA-main/finetune_kopa.py", line 279, in <module>
    fire.Fire(train)
  File "/home/guochenxu/anaconda310/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/guochenxu/anaconda310/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/guochenxu/anaconda310/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/guochenxu/pythonProjects/KoPA-main/finetune_kopa.py", line 182, in train
    data = load_dataset("json")
  File "/home/guochenxu/anaconda310/lib/python3.10/site-packages/datasets/load.py", line 1759, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/guochenxu/anaconda310/lib/python3.10/site-packages/datasets/load.py", line 1522, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/home/guochenxu/anaconda310/lib/python3.10/site-packages/datasets/builder.py", line 363, in __init__
    self.info = DatasetInfo.from_directory(self._cache_dir)
  File "/home/guochenxu/anaconda310/lib/python3.10/site-packages/datasets/info.py", line 358, in from_directory
    with fs.open(path_join(dataset_info_dir, config.DATASET_INFO_FILENAME), "r", encoding="utf-8") as f:
  File "/home/guochenxu/anaconda310/lib/python3.10/site-packages/fsspec/spec.py", line 1295, in open
    self.open(
  File "/home/guochenxu/anaconda310/lib/python3.10/site-packages/fsspec/spec.py", line 1307, in open
    f = self._open(
  File "/home/guochenxu/anaconda310/lib/python3.10/site-packages/fsspec/implementations/local.py", line 180, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
  File "/home/guochenxu/anaconda310/lib/python3.10/site-packages/fsspec/implementations/local.py", line 302, in __init__
    self._open()
  File "/home/guochenxu/anaconda310/lib/python3.10/site-packages/fsspec/implementations/local.py", line 307, in _open
    self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: '/home/guochenxu/.cache/huggingface/datasets/json/default-ae25584a5d8560de/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/dataset_info.json'
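Side note: a FileNotFoundError pointing into `~/.cache/huggingface/datasets` like the one above is often caused by stale or partially written cache metadata rather than by the project code itself. A minimal, stdlib-only sketch of clearing the cached `json` builder so `load_dataset` can rebuild it (the cache location is an assumption taken from the error path; `clear_json_builder_cache` is a helper name I made up):

```python
import shutil
from pathlib import Path


def clear_json_builder_cache(root: Path) -> bool:
    """Remove the cached 'json' dataset-builder directory under `root`.

    Returns True if a cache directory was found and removed, False otherwise.
    """
    target = root / "json"
    if target.exists():
        shutil.rmtree(target)
        return True
    return False


# Default HF datasets cache location (assumption based on the error message).
hf_cache = Path.home() / ".cache" / "huggingface" / "datasets"
```

After clearing, re-running the training command forces `datasets` to regenerate the cache, including dataset_info.json.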

My model was built following the stanford_alpaca instructions:

[screenshot]

The command I ran is as follows:

export WANDB_DISABLED=true
wandb offline
CUDA_VISIBLE_DEVICES=0 nohup python finetune_kopa.py \
    --base_model 'models/alpaca-7b' \
    --data_path 'data/UMLS-train.json' \
    --output_dir 'out/finetune_kopa' \
    --num_epochs 3 \
    --lora_r 32 \
    --learning_rate 3e-4 \
    --batch_size 12 \
    --micro_batch_size 12 \
    --num_prefix 1 \
    --kge_model 'data/UMLS-rotate.pth' \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' > log.txt &
Zhang-Each commented 11 months ago

Hello, we did not encounter this kind of error in our experiments, and there should be no file named dataset_info.json anywhere in our code. It may be caused by a version mismatch in the datasets and transformers libraries; in our experiments we used datasets 2.10.1 and transformers 4.28.0.
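For reference, the versions mentioned above can be pinned explicitly, e.g. as a requirements.txt fragment (this file is not part of the repository; it is just one way to reproduce the stated environment):

```
datasets==2.10.1
transformers==4.28.0
```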

Guo-Chenxu commented 11 months ago

> Hello, we did not encounter this kind of error in our experiments, and there should be no file named dataset_info.json anywhere in our code. It may be caused by a version mismatch in the datasets and transformers libraries; in our experiments we used datasets 2.10.1 and transformers 4.28.0.

The library versions seem to be fine as well /(ㄒoㄒ)/~~

[screenshot]
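For anyone else double-checking their environment, here is a small stdlib-only sketch for printing the installed versions (the `pkg_version` helper is my own name, not part of the project):

```python
from importlib.metadata import PackageNotFoundError, version


def pkg_version(name: str):
    """Return the installed version string of `name`, or None if not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


for pkg in ("datasets", "transformers"):
    print(pkg, pkg_version(pkg))
```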

Zhang-Each commented 11 months ago

Sorry, we never ran into a similar problem during our experiments. If we find a solution later, we will let you know right away.

seienn commented 10 months ago

I ran into this problem too. Has it been solved?