urchade / GLiNER

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
https://arxiv.org/abs/2311.08526
Apache License 2.0

Fine Tuning not working #156

Closed: gptob closed this issue 4 months ago

gptob commented 4 months ago

Hello, I'm trying the code for a fine-tuning task. Right now I am running the exact example finetune.ipynb with sample_data.json, but this error occurs even if I don't compile the model:

    0%| | 0/24 [07:23<?, ?it/s]
    0%| | 0/24 [00:00<?, ?it/s]

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    Cell In[21], line 28
          1 training_args = TrainingArguments(
          2     output_dir="models",
          3     learning_rate=5e-6,
       (...)
         17     report_to="none",
         18 )
         20 trainer = Trainer(
         21     model=model,
         22     args=training_args,
       (...)
         26     data_collator=data_collator,
         27 )
    ---> 28 trainer.train()

    File ~/Library/Python/3.9/lib/python/site-packages/transformers/trainer.py:1885, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
       1883     hf_hub_utils.enable_progress_bars()
       1884 else:
    -> 1885     return inner_training_loop(
       1886         args=args,
       1887         resume_from_checkpoint=resume_from_checkpoint,
       1888         trial=trial,
       1889         ignore_keys_for_eval=ignore_keys_for_eval,
       1890     )

    File ~/Library/Python/3.9/lib/python/site-packages/transformers/trainer.py:2178, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
       2175     rng_to_sync = True
       2177 step = -1
    -> 2178 for step, inputs in enumerate(epoch_iterator):
       2179     total_batched_samples += 1
       2181     if self.args.include_num_input_tokens_seen:

    File ~/Library/Python/3.9/lib/python/site-packages/accelerate/data_loader.py:454, in DataLoaderShard.__iter__(self)
        452 # We iterate one batch ahead to check when we are at the end
        453 try:
    --> 454     current_batch = next(dataloader_iter)
        455 except StopIteration:
        456     yield

    File ~/Library/Python/3.9/lib/python/site-packages/torch/utils/data/dataloader.py:631, in _BaseDataLoaderIter.__next__(self)
        628 if self._sampler_iter is None:
        629     # TODO(https://github.com/pytorch/pytorch/issues/76750)
        630     self._reset()  # type: ignore[call-arg]
    --> 631 data = self._next_data()
        632 self._num_yielded += 1
        633 if self._dataset_kind == _DatasetKind.Iterable and \
        634         self._IterableDataset_len_called is not None and \
        635         self._num_yielded > self._IterableDataset_len_called:

    File ~/Library/Python/3.9/lib/python/site-packages/torch/utils/data/dataloader.py:1346, in _MultiProcessingDataLoaderIter._next_data(self)
       1344 else:
       1345     del self._task_info[idx]
    -> 1346     return self._process_data(data)

    File ~/Library/Python/3.9/lib/python/site-packages/torch/utils/data/dataloader.py:1372, in _MultiProcessingDataLoaderIter._process_data(self, data)
       1370 self._try_put_index()
       1371 if isinstance(data, ExceptionWrapper):
    -> 1372     data.reraise()
       1373 return data

    File ~/Library/Python/3.9/lib/python/site-packages/torch/_utils.py:705, in ExceptionWrapper.reraise(self)
        701 except TypeError:
        702     # If the exception takes multiple arguments, don't try to
        703     # instantiate since we don't know how to
        704     raise RuntimeError(msg) from None
    --> 705 raise exception

    KeyError: Caught KeyError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/Users/gianpaolotobia/Library/Python/3.9/lib/python/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
        data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
      File "/Users/gianpaolotobia/Library/Python/3.9/lib/python/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
        return self.collate_fn(data)
      File "/Users/gianpaolotobia/Library/Python/3.9/lib/python/site-packages/transformers/trainer_utils.py", line 809, in __call__
        return self.data_collator(features)
      File "/Users/gianpaolotobia/Library/Python/3.9/lib/python/site-packages/gliner/data_processing/collator.py", line 20, in __call__
        raw_batch = self.data_processor.collate_raw_batch(input_x)
      File "/Users/gianpaolotobia/Library/Python/3.9/lib/python/site-packages/gliner/data_processing/processor.py", line 171, in collate_raw_batch
        class_to_ids, id_to_classes = self.batch_generate_class_mappings(batch_list, negatives)
      File "/Users/gianpaolotobia/Library/Python/3.9/lib/python/site-packages/gliner/data_processing/processor.py", line 147, in batch_generate_class_mappings
        negatives = self.get_negatives(batch_list, 100)
      File "/Users/gianpaolotobia/Library/Python/3.9/lib/python/site-packages/gliner/data_processing/processor.py", line 74, in get_negatives
        types = set([el[-1] for el in b['ner']])
    KeyError: 'ner'

urchade commented 4 months ago

It looks like the data is not in the correct format.
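A quick way to test that hypothesis is to scan the training records before handing them to the Trainer. The sketch below is only an illustration, not part of GLiNER: the helper name `find_malformed_records` is hypothetical, and the record layout it checks (a dict with `tokenized_text` as a list of tokens and `ner` as a list of `[start, end, label]` spans with an inclusive end index) is an assumption. The traceback confirms only that the collator reads `b['ner']` and takes the last element of each span as the label.

```python
from typing import Any, List, Tuple


def find_malformed_records(records: List[Any]) -> List[Tuple[int, str]]:
    """Return (index, reason) pairs for records that look malformed.

    Assumed layout (hypothetical, see the lead-in): each record is a dict with
    "tokenized_text" (list of tokens) and "ner" (list of [start, end, label]).
    """
    problems: List[Tuple[int, str]] = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append((i, "record is not a dict"))
            continue

        # A missing "ner" key is exactly what raises KeyError: 'ner' in the collator.
        for key in ("tokenized_text", "ner"):
            if key not in rec:
                problems.append((i, f"missing key {key!r}"))

        tokens = rec.get("tokenized_text", [])
        spans = rec.get("ner", [])
        if not isinstance(tokens, list) or not isinstance(spans, list):
            problems.append((i, "tokenized_text and ner must both be lists"))
            continue

        for span in spans:
            if not (isinstance(span, (list, tuple)) and len(span) == 3):
                problems.append((i, f"span {span!r} is not [start, end, label]"))
                continue
            start, end, _label = span
            # Assumption: start and end are token indices, end inclusive.
            in_range = (
                isinstance(start, int)
                and isinstance(end, int)
                and 0 <= start <= end < len(tokens)
            )
            if not in_range:
                problems.append((i, f"span {span!r} is out of range"))
    return problems


# Tiny in-memory stand-in for the real data file.
sample = [
    {"tokenized_text": ["Paris", "is", "nice"], "ner": [[0, 0, "city"]]},
    {"tokenized_text": ["No", "labels", "here"]},  # no "ner" key
]
problems = find_malformed_records(sample)
print(problems)
```

To run it against the real file, load the records with `json.load` from whatever path the notebook uses and pass the resulting list to the helper. An empty result would suggest the problem lies in how the dataset is wrapped or in the installed library version rather than in the file itself.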

gptob commented 4 months ago

I used the example sample_data.json. Is it not in the correct format?

urchade commented 4 months ago

OK, that is strange. Can you follow the steps in this link? https://github.com/urchade/GLiNER/tree/training

It uses an earlier version of GLiNER.

urchade commented 4 months ago

I have also made a Colab notebook here: https://drive.google.com/file/d/1TEWQg5YPq6BZAHlQYfAbEKJJJEhxwcYO/view?usp=sharing

gptob commented 4 months ago

Now it's working.