urchade / GLiNER

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
https://arxiv.org/abs/2311.08526
Apache License 2.0

RuntimeError: The input size 0, plus negative padding 0 and 0 resulted in a negative output size, which is invalid. Check dimension 1 of your input. #188

Open KameniAlexNea opened 1 week ago

KameniAlexNea commented 1 week ago

Hello folks,

I am trying to fine-tune GLiNER on a custom dataset (LMR/LMD), and after some steps I hit this error:

  File "~/gliner_finetuing.py", line 79, in <module>
    main()
  File "~/gliner_finetuing.py", line 75, in main
    trainer.train()
  File "~/.venv/lib/python3.9/site-packages/transformers/trainer.py", line 1938, in train
    return inner_training_loop(
  File "~/.venv/lib/python3.9/site-packages/transformers/trainer.py", line 2236, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "~/.venv/lib/python3.9/site-packages/accelerate/data_loader.py", line 568, in __iter__
    next_batch = next(dataloader_iter)
  File "~/.venv/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "~/.venv/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 673, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "~/.venv/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
    return self.collate_fn(data)
  File "~/.venv/lib/python3.9/site-packages/gliner/data_processing/collator.py", line 31, in __call__
    model_input = self.data_processor.collate_fn(raw_batch, prepare_labels=self.prepare_labels)
  File "~/.venv/lib/python3.9/site-packages/gliner/data_processing/processor.py", line 203, in collate_fn
    model_input_batch = self.tokenize_and_prepare_labels(batch, prepare_labels, *args, **kwargs)
  File "~/.venv/lib/python3.9/site-packages/gliner/data_processing/processor.py", line 335, in tokenize_and_prepare_labels
    labels = self.create_labels(batch)
  File "~/.venv/lib/python3.9/site-packages/gliner/data_processing/processor.py", line 326, in create_labels
    labels_batch = pad_2d_tensor(labels_batch)
  File "~/.venv/lib/python3.9/site-packages/gliner/data_processing/utils.py", line 25, in pad_2d_tensor
    padded_tensor = torch.nn.functional.pad(tensor, (0, col_padding, 0, row_padding),
  File "~/.venv/lib/python3.9/site-packages/torch/nn/functional.py", line 4552, in pad
    return torch._C._nn.pad(input, pad, mode, value)
RuntimeError: The input size 0, plus negative padding 0 and 0 resulted in a negative output size, which is invalid. Check dimension 1 of your input.

Note that this error occurred after around 2 epochs, so it shouldn't be related to the data.
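
For anyone hitting the same traceback: the error is raised inside `pad_2d_tensor` on a tensor whose dimension 1 has size 0, so a cheap first step is to rule degenerate training examples in or out. Below is a minimal sketch, assuming the dataset is a list of dicts with `tokenized_text` (a list of tokens) and `ner` (a list of `[start, end, label]` spans with inclusive token indices). `find_suspect_examples` is a hypothetical helper, not part of GLiNER, and whether a flagged example actually triggers this error is an assumption to verify against your GLiNER version.

```python
from typing import Any, Dict, List


def find_suspect_examples(dataset: List[Dict[str, Any]]) -> List[int]:
    """Return indices of examples with no tokens, no spans, or out-of-range spans.

    Assumes each example looks like:
        {"tokenized_text": ["Paris", "is", "nice"], "ner": [[0, 0, "location"]]}
    """
    suspects = []
    for index, example in enumerate(dataset):
        tokens = example.get("tokenized_text") or []
        spans = example.get("ner") or []

        # Empty token lists or empty span lists are candidates for zero-sized label tensors.
        if not tokens or not spans:
            suspects.append(index)
            continue

        # Spans are assumed to be [start, end, label] with inclusive token indices.
        if any(not (0 <= start <= end < len(tokens)) for start, end, _label in spans):
            suspects.append(index)
    return suspects


if __name__ == "__main__":
    sample = [
        {"tokenized_text": ["Paris", "is", "nice"], "ner": [[0, 0, "location"]]},
        {"tokenized_text": [], "ner": []},
        {"tokenized_text": ["Hello"], "ner": []},
        {"tokenized_text": ["Hi", "there"], "ner": [[1, 2, "greeting"]]},
    ]
    print(find_suspect_examples(sample))
```

If the returned list is empty, the data is probably not the cause and the linked fix is the better place to look; if it is not, filtering those indices out before building the trainer is one thing to try.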

KameniAlexNea commented 6 days ago

This comment solved the issue:

https://github.com/urchade/GLiNER/issues/139#issuecomment-2206289412