kpe / bert-for-tf2

A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.
https://github.com/kpe/bert-for-tf2
MIT License

Max vocabulary issue? #59

Closed nectario closed 4 years ago

nectario commented 4 years ago

I am getting the error below and am having a hard time fixing it. It looks like an input token id exceeds the model's vocabulary size. How do I prevent this when using a pretrained model?


Epoch 1/1000
2020-04-21 21:10:10.123738: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-21 21:10:10.535980: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: indices[99,4] = 45642 is not in [0, 30522)
  [[{{node model_1/TimeDistributedSegment/model/bert/embeddings/word_embeddings/embedding_lookup}}]]
  [[model_1/TimeDistributedSegment/model/bert/embeddings/word_embeddings/embedding_lookup/_28]]
2020-04-21 21:10:10.551579: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: indices[99,4] = 45642 is not in [0, 30522)
  [[{{node model_1/TimeDistributedSegment/model/bert/embeddings/word_embeddings/embedding_lookup}}]]
2020-04-21 21:10:11.269619: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
2020-04-21 21:10:11.274666: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1259] Profiler found 1 GPUs
2020-04-21 21:10:11.287877: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_101.dll
1/96 [..............................] - ETA: 20:58
WARNING:tensorflow:Early stopping conditioned on metric val_loss which is not available. Available metrics are:
2020-04-21 21:10:12.077116: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1378] CUPTI activity buffer flushed
2020-04-21 21:10:12.080944: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:88] GpuTracer has collected 211 callback api events and 211 activity events.
Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 767, in on_epoch
    yield epoch_logs
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 632, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\eager\function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\eager\function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\eager\function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\eager\function.py", line 545, in call
    ctx=ctx)
  File "C:\Program Files\Python37\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: indices[99,4] = 45642 is not in [0, 30522)
    [[node model_1/TimeDistributedSegment/model/bert/embeddings/word_embeddings/embedding_lookup (defined at C:\Program Files\Python37\lib\site-packages\bert\embeddings.py:208) ]]
  (1) Invalid argument: indices[99,4] = 45642 is not in [0, 30522)
    [[node model_1/TimeDistributedSegment/model/bert/embeddings/word_embeddings/embedding_lookup (defined at C:\Program Files\Python37\lib\site-packages\bert\embeddings.py:208) ]]
    [[model_1/TimeDistributedSegment/model/bert/embeddings/word_embeddings/embedding_lookup/_28]]

kpe commented 4 years ago

@nectario - check your input pipeline and how it was generated, i.e. whether you used the proper tokenizer: bert.bert_tokenization for a BERT model and bert.albert_tokenization for an ALBERT model.
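
One way to confirm that the input pipeline is the culprit is to scan the token id arrays for values outside the range the embedding layer accepts before calling fit. The snippet below is only a minimal sketch using NumPy. The helper name find_out_of_range_ids and the sample batch are hypothetical and not part of bert-for-tf2; the vocabulary size 30522 and the id 45642 are taken from the error message above.

```python
import numpy as np


def find_out_of_range_ids(token_ids, vocab_size):
    """Return (position, value) pairs for every id outside [0, vocab_size).

    Hypothetical helper for illustration; not part of bert-for-tf2.
    """
    ids = np.asarray(token_ids)
    # Positions where the id would fall outside the embedding table.
    bad_positions = np.argwhere((ids < 0) | (ids >= vocab_size))
    return [
        (tuple(int(i) for i in pos), int(ids[tuple(pos)]))
        for pos in bad_positions
    ]


# Upper bound reported in the error: valid ids are in [0, 30522).
VOCAB_SIZE = 30522

# Made-up batch of token ids; the second row contains the id from the error.
batch = np.array([
    [101, 2023, 102],
    [101, 45642, 102],
])

print(find_out_of_range_ids(batch, VOCAB_SIZE))
# [((1, 1), 45642)]
```

If this reports any positions, the ids were most likely produced with a vocabulary file or tokenizer that does not match the pretrained checkpoint, and regenerating the inputs with the matching tokenizer should bring every id back into range.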