For the first issue, you are training a new model from scratch versus fine-tuning one that has been pretrained on way more data. It's completely normal that the latter wins. As for the second one, I'm not sure you can directly use the `tokenizer.pad` method as a collation function.
Note that since you are copying the error messages, you should expand the intermediate frames so we can see where the error comes from.
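For illustration only, here is a rough sketch (not necessarily what the notebook intends) of passing a dedicated collator instead of the bare `tokenizer.pad` method; `mytokenizer`, `encoded_dataset` and `tokenizer_columns` are the names used later in this thread, and `DataCollatorWithPadding` is a callable that pads lists of samples into batches:

```python
from transformers import DataCollatorWithPadding

# the collator object is itself callable on a list of samples and pads them into a batch
data_collator = DataCollatorWithPadding(tokenizer=mytokenizer, return_tensors="tf")

tf_train_dataset = encoded_dataset["train"].to_tf_dataset(
    columns=tokenizer_columns,
    label_cols=["label"],
    shuffle=True,
    batch_size=16,
    collate_fn=data_collator,
)
```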
Thanks @sgugger, could you please clarify what you mean by:

> As for the second one, I'm not sure you can directly use the `tokenizer.pad` method as a collation function.
The call
tf_train_dataset = encoded_dataset["train"].to_tf_dataset(
    columns=tokenizer_columns,
    label_cols=["label"],
    shuffle=True,
    batch_size=16,
    collate_fn=mytokenizer.pad,
)

comes directly from the official tf2 notebook https://github.com/huggingface/notebooks/blob/new_tf_notebooks/examples/text_classification-tf.ipynb
expanded error here, thanks!
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-d01ad7112f932f9c.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-de5efda680a1f856.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-0f3c1e00b7f03ba8.arrow
Sentence: hide new secretions from the parental units
{'input_ids': [[2, 11384, 1363, 3215, 1325, 1218, 1125, 10341, 1139, 3464, 3], [2, 4023, 1491, 15755, 16, 1520, 4610, 1128, 13221, 802, 3], [2, 1187, 13755, 1327, 2845, 1142, 18920, 802, 4245, 3168, 7806, 1542, 2569, 3796, 3], [2, 3419, 22353, 13782, 1145, 3802, 1125, 1913, 2493, 3], [2, 1161, 1125, 6802, 11823, 17, 1137, 17, 1125, 17, 1233, 3765, 802, 1305, 18029, 802, 1125, 21157, 1843, 14645, 1280, 1427, 3]], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}
Columns added by tokenizer: ['attention_mask', 'input_ids', 'token_type_ids']
ClassLabel(num_classes=2, names=['negative', 'positive'], names_file=None, id=None)
---------------------------------------------------------------------------
VisibleDeprecationWarning Traceback (most recent call last)
<ipython-input-56-ddb32272e3ba> in <module>()
47 shuffle=True,
48 batch_size=16,
---> 49 collate_fn=mytokenizer.pad,
50 )
51 tf_validation_dataset = encoded_dataset[validation_key].to_tf_dataset(
9 frames
/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py in to_tf_dataset(self, columns, batch_size, shuffle, drop_remainder, collate_fn, collate_fn_args, label_cols, dummy_labels, prefetch)
349 return [tf.convert_to_tensor(arr) for arr in out_batch]
350
--> 351 test_batch = np_get_batch(np.arange(batch_size))
352
353 @tf.function(input_signature=[tf.TensorSpec(None, tf.int64)])
/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py in np_get_batch(indices)
323
324 def np_get_batch(indices):
--> 325 batch = dataset[indices]
326 out_batch = []
327 if collate_fn is not None:
/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py in __getitem__(self, key)
1780 format_columns=self._format_columns,
1781 output_all_columns=self._output_all_columns,
-> 1782 format_kwargs=self._format_kwargs,
1783 )
1784
/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py in _getitem(self, key, format_type, format_columns, output_all_columns, format_kwargs)
1769 pa_subtable = query_table(self._data, key, indices=self._indices if self._indices is not None else None)
1770 formatted_output = format_table(
-> 1771 pa_subtable, key, formatter=formatter, format_columns=format_columns, output_all_columns=output_all_columns
1772 )
1773 return formatted_output
/usr/local/lib/python3.7/dist-packages/datasets/formatting/formatting.py in format_table(table, key, formatter, format_columns, output_all_columns)
420 else:
421 pa_table_to_format = pa_table.drop(col for col in pa_table.column_names if col not in format_columns)
--> 422 formatted_output = formatter(pa_table_to_format, query_type=query_type)
423 if output_all_columns:
424 if isinstance(formatted_output, MutableMapping):
/usr/local/lib/python3.7/dist-packages/datasets/formatting/formatting.py in __call__(self, pa_table, query_type)
196 return self.format_column(pa_table)
197 elif query_type == "batch":
--> 198 return self.format_batch(pa_table)
199
200 def format_row(self, pa_table: pa.Table) -> RowFormat:
/usr/local/lib/python3.7/dist-packages/datasets/formatting/formatting.py in format_batch(self, pa_table)
241
242 def format_batch(self, pa_table: pa.Table) -> dict:
--> 243 return self.numpy_arrow_extractor(**self.np_array_kwargs).extract_batch(pa_table)
244
245
/usr/local/lib/python3.7/dist-packages/datasets/formatting/formatting.py in extract_batch(self, pa_table)
152
153 def extract_batch(self, pa_table: pa.Table) -> dict:
--> 154 return {col: self._arrow_array_to_numpy(pa_table[col]) for col in pa_table.column_names}
155
156 def _arrow_array_to_numpy(self, pa_array: pa.Array) -> np.ndarray:
/usr/local/lib/python3.7/dist-packages/datasets/formatting/formatting.py in <dictcomp>(.0)
152
153 def extract_batch(self, pa_table: pa.Table) -> dict:
--> 154 return {col: self._arrow_array_to_numpy(pa_table[col]) for col in pa_table.column_names}
155
156 def _arrow_array_to_numpy(self, pa_array: pa.Array) -> np.ndarray:
/usr/local/lib/python3.7/dist-packages/datasets/formatting/formatting.py in _arrow_array_to_numpy(self, pa_array)
165 # cast to list of arrays or we end up with a np.array with dtype object
166 array: List[np.ndarray] = pa_array.to_numpy(zero_copy_only=zero_copy_only).tolist()
--> 167 return np.array(array, copy=False, **self.np_array_kwargs)
168
169
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
I'm sure @Rocketknight1 will know what's going on here :-)
waiting for @Rocketknight1 then! Thanks
@Rocketknight1 @sgugger interestingly, running the same notebook today (with a fresh pip install, that is) returns a different error.
Not sure what the issue is this time... Any ideas? Thanks!
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Sentence: hide new secretions from the parental units
{'input_ids': [[2, 11384, 1363, 3215, 1325, 1218, 1125, 10341, 1139, 3464, 3], [2, 4023, 1491, 15755, 16, 1520, 4610, 1128, 13221, 798, 3], [2, 1187, 13755, 1327, 2845, 1142, 18920, 798, 4245, 3168, 7806, 1542, 2569, 3796, 3], [2, 3419, 22351, 13782, 1145, 3802, 1125, 1913, 2493, 3], [2, 1161, 1125, 6802, 11823, 17, 1137, 17, 1125, 17, 1233, 3765, 798, 1305, 18030, 798, 1125, 21156, 1843, 14645, 1280, 1427, 3]], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}
100%
68/68 [00:04<00:00, 20.16ba/s]
100%
1/1 [00:00<00:00, 10.70ba/s]
100%
2/2 [00:00<00:00, 13.42ba/s]
Columns added by tokenizer: ['token_type_ids', 'input_ids', 'attention_mask']
ClassLabel(num_classes=2, names=['negative', 'positive'], names_file=None, id=None)
/usr/local/lib/python3.7/dist-packages/datasets/formatting/formatting.py:167: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return np.array(array, copy=False, **self.np_array_kwargs)
404 Client Error: Not Found for url: https://huggingface.co/%3Ctransformers.models.bert.modeling_tf_bert.TFBertForMaskedLM%20object%20at%200x7f1f29039850%3E/resolve/main/config.json
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
553 use_auth_token=use_auth_token,
--> 554 user_agent=user_agent,
555 )
6 frames
/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, use_auth_token, local_files_only)
1409 use_auth_token=use_auth_token,
-> 1410 local_files_only=local_files_only,
1411 )
/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, use_auth_token, local_files_only)
1573 r = requests.head(url, headers=headers, allow_redirects=False, proxies=proxies, timeout=etag_timeout)
-> 1574 r.raise_for_status()
1575 etag = r.headers.get("X-Linked-Etag") or r.headers.get("ETag")
/usr/local/lib/python3.7/dist-packages/requests/models.py in raise_for_status(self)
940 if http_error_msg:
--> 941 raise HTTPError(http_error_msg, response=self)
942
HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/%3Ctransformers.models.bert.modeling_tf_bert.TFBertForMaskedLM%20object%20at%200x7f1f29039850%3E/resolve/main/config.json
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
<ipython-input-6-ddb32272e3ba> in <module>()
73
74 model = TFAutoModelForSequenceClassification.from_pretrained(
---> 75 model, num_labels=num_labels
76 )
77
/usr/local/lib/python3.7/dist-packages/transformers/models/auto/auto_factory.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
395 if not isinstance(config, PretrainedConfig):
396 config, kwargs = AutoConfig.from_pretrained(
--> 397 pretrained_model_name_or_path, return_unused_kwargs=True, **kwargs
398 )
399 if hasattr(config, "auto_map") and cls.__name__ in config.auto_map:
/usr/local/lib/python3.7/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
525 """
526 kwargs["_from_auto"] = True
--> 527 config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
528 if "model_type" in config_dict:
529 config_class = CONFIG_MAPPING[config_dict["model_type"]]
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
568 msg += f"- or '{revision}' is a valid git identifier (branch name, a tag name, or a commit id) that exists for this model name as listed on its model page on 'https://huggingface.co/models'\n\n"
569
--> 570 raise EnvironmentError(msg)
571
572 except json.JSONDecodeError:
OSError: Can't load config for '<transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM object at 0x7f1f29039850>'. Make sure that:
- '<transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM object at 0x7f1f29039850>' is a correct model identifier listed on 'https://huggingface.co/models'
- or '<transformers.models.bert.modeling_tf_bert.TFBertForMaskedLM object at 0x7f1f29039850>' is the correct path to a directory containing a config.json file
Hi @randomgambit, sorry for the lengthy delay in replying again! I'm still making changes to some of the lower-level parts of the library, so these notebooks haven't been fully finalized yet.
The `VisibleDeprecationWarning` in your first post is something that will hopefully be fixed by upcoming changes to `datasets`, but for now you can just ignore it.
The error you're getting in your final post is, I think, caused by you overwriting the variable `model` in your code. The `from_pretrained()` method expects a string like `bert-base-cased`, but it seems like you've created an actual TF model with that variable name. If you pass an actual model object to `from_pretrained()` it'll get very confused - so make sure that whatever argument you're passing there is a string and not something else!
Thanks @Rocketknight1, super useful as usual. So what you are saying is that I should have saved my tokenizer `mytokenizer` and my language model `model` using `save_pretrained()`, and then I need to load the model with a classification head using `TFAutoModelForSequenceClassification`, right?
model.save_pretrained('mymodel')
mytokenizer.save_pretrained('mytokenizer')
model = TFAutoModelForSequenceClassification.from_pretrained(
'mymodel', num_labels=num_labels
)
This seems to work. I will try to adapt the code so that both the tokenization and the language model are performed on the dataset actually used in the classification task (`dataset = load_dataset("glue", "sst2")`). Do you mind having a look when I'm done? This will be a super useful notebook for everyone.
Thanks!
@Rocketknight1 @sgugger I can confirm the new TF notebook works beautifully! Thanks! Just a follow-up though: I tried to fine-tune a `longformer` model and everything works smoothly until the `model.fit` call, where I get a cryptic message.
This is the model I use:
task = "sst2"
model_checkpoint = "allenai/longformer-large-4096"
batch_size = 16
and then you can run the default notebook https://github.com/huggingface/notebooks/blob/master/examples/text_classification-tf.ipynb until you reach the end
model.fit(
tf_train_dataset,
validation_data=tf_validation_dataset,
epochs=3)
Epoch 1/3
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-28-4075d9d9fb81> in <module>()
3 tf_train_dataset,
4 validation_data=tf_validation_dataset,
----> 5 epochs=3)
9 frames
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1182 _r=1):
1183 callbacks.on_train_batch_begin(step)
-> 1184 tmp_logs = self.train_function(iterator)
1185 if data_handler.should_sync:
1186 context.async_wait()
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
883
884 with OptionalXlaContext(self._jit_compile):
--> 885 result = self._call(*args, **kwds)
886
887 new_tracing_count = self.experimental_get_tracing_count()
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
922 # In this case we have not created variables on the first call. So we can
923 # run the first trace but we should fail if variables are created.
--> 924 results = self._stateful_fn(*args, **kwds)
925 if self._created_variables and not ALLOW_DYNAMIC_VARIABLE_CREATION:
926 raise ValueError("Creating variables on a non-first call to a function"
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)
3036 with self._lock:
3037 (graph_function,
-> 3038 filtered_flat_args) = self._maybe_define_function(args, kwargs)
3039 return graph_function._call_flat(
3040 filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
3458 call_context_key in self._function_cache.missed):
3459 return self._define_function_with_shape_relaxation(
-> 3460 args, kwargs, flat_args, filtered_flat_args, cache_key_context)
3461
3462 self._function_cache.missed.add(call_context_key)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _define_function_with_shape_relaxation(self, args, kwargs, flat_args, filtered_flat_args, cache_key_context)
3380
3381 graph_function = self._create_graph_function(
-> 3382 args, kwargs, override_flat_arg_shapes=relaxed_arg_shapes)
3383 self._function_cache.arg_relaxed[rank_only_cache_key] = graph_function
3384
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
3306 arg_names=arg_names,
3307 override_flat_arg_shapes=override_flat_arg_shapes,
-> 3308 capture_by_value=self._capture_by_value),
3309 self._function_attributes,
3310 function_spec=self.function_spec,
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes, acd_record_initial_resource_uses)
1005 _, original_func = tf_decorator.unwrap(python_func)
1006
-> 1007 func_outputs = python_func(*func_args, **func_kwargs)
1008
1009 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
666 # the function a weak reference to itself to avoid a reference cycle.
667 with OptionalXlaContext(compile_with_xla):
--> 668 out = weak_wrapped_fn().__wrapped__(*args, **kwds)
669 return out
670
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
992 except Exception as e: # pylint:disable=broad-except
993 if hasattr(e, "ag_error_metadata"):
--> 994 raise e.ag_error_metadata.to_exception(e)
995 else:
996 raise
TypeError: in user code:
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py:853 train_function *
return step_function(self, iterator)
/usr/local/lib/python3.7/dist-packages/transformers/models/longformer/modeling_tf_longformer.py:2408 call *
inputs["global_attention_mask"] = tf.tensor_scatter_nd_update(
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:206 wrapper **
return target(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/array_ops.py:5755 tensor_scatter_nd_update
tensor=tensor, indices=indices, updates=updates, name=name)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/gen_array_ops.py:11311 tensor_scatter_update
updates=updates, name=name)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/op_def_library.py:558 _apply_op_helper
inferred_from[input_arg.type_attr]))
TypeError: Input 'updates' of 'TensorScatterUpdate' Op has type int32 that does not match type int64 of argument 'tensor'.
Maybe there is something specific to `longformer` that does not work well with the current notebook? What do you all think?
Thanks!
@Rocketknight1 I know you are busy (and I cannot thank you enough for the magnificent TF notebooks!) but I wanted to let you know that I have also tried with `allenai/longformer-base-4096` and I am getting the same `int64` error. Please let me know if I can do anything to help you out.
Thanks!
Hi @Rocketknight1 I hope all is well!
I now wonder if `longformer` can be trained at all with this notebook. Indeed, I read that:
This notebook is built to run on any of the tasks in the list above, with any model checkpoint from the Model Hub as long as that model has a version with a classification head.
If so, could you please tell me which TF notebook I need to adapt to make it work? Thanks!!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Have you found any solution @randomgambit? Running into this myself.
I'll try passing in zeros cast to `int32` to the `global_attention_mask` param of `fit` and see if that helps. The `tf.zeros_like` used by `transformers` to generate the mask (when none is passed in by the user) must default to `int64`?
@randomgambit try the opposite of what I said above. You need to cast your `input_ids` to `tf.int32`. Something like this should work:
input_ids = tf.convert_to_tensor([tf.convert_to_tensor(row, dtype=tf.int32)
for row in input_ids], dtype=tf.int32)
It would probably work via equivalent `numpy` methods, but I haven't tried that yet. The default dtype for `tf.zeros_like` is `tf.int32` (transformers makes `global_attention_mask` using `tf.zeros_like` for you if you don't pass it in). You could probably also create the `global_attention_mask` yourself as dtype `tf.int64`. Point being, I think they all just need to be the same type.
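Something like this is what I mean by the numpy route (an untested sketch; `input_ids` here is assumed to be the padded list of token-id rows coming out of the tokenizer):

```python
import numpy as np

# same idea as the tf.convert_to_tensor snippet above: cast the padded
# token ids to 32-bit ints before they reach the model
input_ids = np.asarray(input_ids, dtype=np.int32)
```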
we can probably close this @Rocketknight1
thanks @jmwoloso, I initially didn't see your message. I am hoping @Rocketknight1 can just confirm all is good before closing... Thanks!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Ran into the same problem. I am totally lost.
Here is what I did
`import numpy as np my_dict = {'text': ["random text 1", "random text 2", "random text 3"], 'label': [0, 0, 1]}
from datasets import Dataset
dataset = Dataset.from_dict(my_dict)`
` from transformers import LongformerTokenizer, TFLongformerForSequenceClassification tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096')
def tokenize_function(examples):
r=tokenizer(examples["text"], padding="max_length", truncation=True)
r['input_ids']= [tf.convert_to_tensor(row, dtype=tf.int32)
for row in r['input_ids']]
r['attention_mask']= [tf.convert_to_tensor(row, dtype=tf.int32)
for row in r['attention_mask']]
return r
tokenized_datasets = dataset.map(tokenize_function, batched=True)
small_train_dataset = tokenized_datasets.shuffle(seed=42)
from transformers import DefaultDataCollator
data_collator = DefaultDataCollator(return_tensors="tf")
tf_train_dataset = small_train_dataset.to_tf_dataset( columns=["attention_mask", "input_ids", "token_type_ids"], label_cols=["labels"], shuffle=True, collate_fn=data_collator, batch_size=8, )
model.fit(tf_train_dataset, batch_size=1)
`
@randomgambit and @jmwoloso any ideas?
@ichenjia There were a few errors mentioned throughout this thread. Which one are you seeing?
Thank you. It’s the last error related to int32 and int64
@ichenjia Did you try my solution of casting your `input_ids` to `tf.int32`?
> @ichenjia Did you try my solution of casting your `input_ids` to `tf.int32`?
Thank you. Here is what I did per the earlier tip from this thread:

r['input_ids'] = [tf.convert_to_tensor(row, dtype=tf.int32) for row in r['input_ids']]
r['attention_mask'] = [tf.convert_to_tensor(row, dtype=tf.int32) for row in r['attention_mask']]

In the tokenizer function mapped to the dataset, I still got that int32 error. Did I do something wrong?
@jmwoloso After reading the source code of `Dataset`, I think the problem is in the `to_tf_dataset` function, which calls `_get_output_signature` (LN 290-303):
if np.issubdtype(np_arrays[0].dtype, np.integer) or np_arrays[0].dtype == bool:
    tf_dtype = tf.int64
    np_dtype = np.int64
elif np.issubdtype(np_arrays[0].dtype, np.number):
    tf_dtype = tf.float32
    np_dtype = np.float32
elif np_arrays[0].dtype.kind == "U":  # Unicode strings
    np_dtype = np.unicode_
    tf_dtype = tf.string
else:
    raise RuntimeError(
        f"Unrecognized array dtype {np_arrays[0].dtype}. \n"
        "Nested types and image/audio types are not supported yet."
    )
It forces `tf.int64` instead of `tf.int32`. It doesn't look like we have any control over it from outside the API.
There are always more layers, it seems @ichenjia :) I think we definitely have some control, or at least a way to hack it to prove the theory (thanks, Python!). Could you try something like the code below as a temporary workaround to see if it solves it?
I haven't looked at the source extensively, but maybe as a permanent fix we could add some dtype checking in _get_output_signature
of the dataset in order to preserve what is passed in, but I'd defer to the HF crew on what, if anything, could/should be done assuming this hack works.
But until then, maybe this will help. We can try overriding that private method. (Also, to get the markdown formatting to show as a script, enclose your code with 3 backticks instead of 1).
*Edit was to fix formatting
import types
import numpy as np
def _get_output_signature(
dataset: "Dataset",
collate_fn: Callable,
collate_fn_args: dict,
cols_to_retain: Optional[List[str]] = None,
batch_size: Optional[int] = None,
num_test_batches: int = 10,
):
"""Private method used by `to_tf_dataset()` to find the shapes and dtypes of samples from this dataset
after being passed through the collate_fn. Tensorflow needs an exact signature for tf.numpy_function, so
the only way to do this is to run test batches - the collator may add or rename columns, so we can't figure
it out just by inspecting the dataset.
Args:
dataset (:obj:`Dataset`): Dataset to load samples from.
collate_fn(:obj:`bool`): Shuffle the dataset order when loading. Recommended True for training, False for
validation/evaluation.
collate_fn(:obj:`Callable`): A function or callable object (such as a `DataCollator`) that will collate
lists of samples into a batch.
collate_fn_args (:obj:`Dict`): A `dict` of keyword arguments to be passed to the
`collate_fn`.
batch_size (:obj:`int`, optional): The size of batches loaded from the dataset. Used for shape inference.
Can be None, which indicates that batch sizes can be variable.
Returns:
:obj:`dict`: Dict mapping column names to tf.Tensorspec objects
:obj:`dict`: Dict mapping column names to np.dtype objects
"""
if config.TF_AVAILABLE:
import tensorflow as tf
else:
raise ImportError("Called a Tensorflow-specific function but Tensorflow is not installed.")
if len(dataset) == 0:
raise ValueError("Unable to get the output signature because the dataset is empty.")
if batch_size is None:
test_batch_size = min(len(dataset), 8)
else:
batch_size = min(len(dataset), batch_size)
test_batch_size = batch_size
test_batches = []
for _ in range(num_test_batches):
indices = sample(range(len(dataset)), test_batch_size)
test_batch = dataset[indices]
if cols_to_retain is not None:
test_batch = {
key: value
for key, value in test_batch.items()
if key in cols_to_retain or key in ("label_ids", "label")
}
test_batch = [{key: value[i] for key, value in test_batch.items()} for i in range(test_batch_size)]
test_batch = collate_fn(test_batch, **collate_fn_args)
test_batches.append(test_batch)
tf_columns_to_signatures = {}
np_columns_to_dtypes = {}
for column in test_batches[0].keys():
raw_arrays = [batch[column] for batch in test_batches]
# In case the collate_fn returns something strange
np_arrays = []
for array in raw_arrays:
if isinstance(array, np.ndarray):
np_arrays.append(array)
elif isinstance(array, tf.Tensor):
np_arrays.append(array.numpy())
else:
np_arrays.append(np.array(array))
if np.issubdtype(np_arrays[0].dtype, np.integer) or np_arrays[0].dtype == bool:
tf_dtype = tf.int32 # formerly tf.int64
np_dtype = np.int32 # formerly tf.int64
elif np.issubdtype(np_arrays[0].dtype, np.number):
tf_dtype = tf.float32
np_dtype = np.float32
elif np_arrays[0].dtype.kind == "U": # Unicode strings
np_dtype = np.unicode_
tf_dtype = tf.string
else:
raise RuntimeError(
f"Unrecognized array dtype {np_arrays[0].dtype}. \n"
"Nested types and image/audio types are not supported yet."
)
shapes = [array.shape for array in np_arrays]
static_shape = []
for dim in range(len(shapes[0])):
sizes = set([shape[dim] for shape in shapes])
if dim == 0:
static_shape.append(batch_size)
continue
if len(sizes) == 1: # This dimension looks constant
static_shape.append(sizes.pop())
else: # Use None for variable dimensions
static_shape.append(None)
tf_columns_to_signatures[column] = tf.TensorSpec(shape=static_shape, dtype=tf_dtype)
np_columns_to_dtypes[column] = np_dtype
return tf_columns_to_signatures, np_columns_to_dtypes
my_dict = {'text': ["random text 1", "random text 2", "random text 3"],
'label': [0, 0, 1]}
from datasets import Dataset
dataset = Dataset.from_dict(my_dict)
from transformers import LongformerTokenizer, TFLongformerForSequenceClassification
tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096')
def tokenize_function(examples):
r=tokenizer(examples["text"], padding="max_length", truncation=True)
r['input_ids']= [tf.convert_to_tensor(row, dtype=tf.int32)
for row in r['input_ids']]
r['attention_mask']= [tf.convert_to_tensor(row, dtype=tf.int32)
for row in r['attention_mask']]
return r
tokenized_datasets = dataset.map(tokenize_function, batched=True)
small_train_dataset = tokenized_datasets.shuffle(seed=42)
from transformers import DefaultDataCollator
data_collator = DefaultDataCollator(return_tensors="tf")
# override the private method on the datasets.Dataset *before* building the tf.data.Dataset
small_train_dataset._get_output_signature = types.MethodType(_get_output_signature, small_train_dataset)
tf_train_dataset = small_train_dataset.to_tf_dataset(
columns=["attention_mask", "input_ids", "token_type_ids"],
label_cols=["labels"],
shuffle=True,
collate_fn=data_collator,
batch_size=8,
)
model.fit(tf_train_dataset, batch_size=1)
Hi @jmwoloso @ichenjia, sorry for only seeing this now! Just to clarify, are you encountering difficulties passing `tf.int64` values to `TFLongformer`? You're correct that the `to_tf_dataset` and `prepare_tf_dataset` methods cast all int outputs to `tf.int64`, but this is because our policy is that our models should always accept `tf.int64` for any integer tensor inputs. If you're encountering issues with that, it's more likely a bug in LongFormer than in `to_tf_dataset`!
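If you want to double-check what `to_tf_dataset` actually produced, a quick sketch (assuming `tf_train_dataset` was built as in the snippets above):

```python
# element_spec shows the dtype/shape TensorFlow will use for each column;
# integer columns should appear as tf.int64, per the policy described above
print(tf_train_dataset.element_spec)
```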
Hi @Rocketknight1 thanks for the reply. That all makes sense. This thread has kind of morphed, but I believe you solved the original issue which dealt with trying to pass ragged tensors to the model.
The next issue that came up from that was that the `TensorScatterUpdate` op in TF expects `tf.int32` inputs (according to the traceback) but was getting `tf.int64`. That originates in the `modeling_tf_longformer.py` module when the `global_attention_mask` is created.
I can take a look and see if there is anything to be done in that longformer file, but this seems like a lower-level TF op issue to me. But you are the TF scape-GOAT around here, so I'll defer to your guidance/wisdom :)
Hi @jmwoloso, the code for TFLongformer was indeed using lots of `tf.int32`, which it shouldn't. Our tests weren't picking that up for some reason - I'll have to investigate that later. For now, can you try the PR and let me know if it fixes your issues? You can install from the PR branch with `pip install --upgrade git+https://github.com/huggingface/transformers.git@fix_tflongformer_int_dtype`
Thanks @Rocketknight1! @ichenjia see if that solves your issue.
> Hi @jmwoloso, the code for TFLongformer was indeed using lots of `tf.int32`, which it shouldn't. Our tests weren't picking that up for some reason - I'll have to investigate that later. For now, can you try the PR and let me know if it fixes your issues? You can install from the PR branch with `pip install --upgrade git+https://github.com/huggingface/transformers.git@fix_tflongformer_int_dtype`
Thank you @Rocketknight1 and @jmwoloso for the clear explanation; your check-in does solve the int32 issue. However, I think it may have brought in another issue.
My understanding is that the global_attention_mask is calculated at run-time instead of being provided, which is also marked as Optional in the API.
So when I call
model.fit(tf_train_dataset, batch_size=1)
The following line was called:
longformer/modeling_tf_longformer.py:2391 call * global_attention_mask = tf.cast(global_attention_mask, tf.int64)
and the following error occurred:

python3.8/site-packages/tensorflow/python/framework/tensor_util.py:445 make_tensor_proto raise ValueError("None values not supported.")
ValueError: None values not supported.
I am guessing `global_attention_mask` was forcibly cast even though `None` was provided. Is that a correct understanding?
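For what it's worth, a tiny sketch that seems consistent with that guess - in eager mode, handing a plain `None` to `tf.cast` fails during tensor conversion with a similar `ValueError`:

```python
import tensorflow as tf

# Mimics what the Longformer call() would do when no global_attention_mask
# is supplied: casting None fails inside convert_to_tensor.
tf.cast(None, tf.int64)  # ValueError: Attempt to convert a value (None) ...
```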
@ichenjia can you try explicitly passing in the `global_attention_mask`? I believe it ends up just being constructed on the fly with `tf.zeros_like`, so maybe you could try that to get yourself unstuck?
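Something like this is what I have in mind - just a sketch, building a mask of zeros with the same shape as `attention_mask` inside the tokenize function used earlier in the thread:

```python
import numpy as np

def tokenize_function(examples):
    r = tokenizer(examples["text"], padding="max_length", truncation=True)
    # explicitly build a global attention mask of zeros, matching the shape of
    # the attention mask, so the model doesn't have to construct it itself
    r["global_attention_mask"] = [np.zeros_like(mask).tolist() for mask in r["attention_mask"]]
    return r
```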
> @ichenjia can you try explicitly passing in the `global_attention_mask`? I believe it ends up just being constructed on the fly with `tf.zeros_like`, so maybe you could try that to get yourself unstuck?
Thank you @jmwoloso
I manually created a global attention mask in the tokenizer function:
from transformers import LongformerTokenizer, TFLongformerForSequenceClassification
import tensorflow as tf
import pickle
import numpy as np
from transformers import DefaultDataCollator
tf.data.experimental.enable_debug_mode()
#tf.config.experimental_run_functions_eagerly(True)
tf.config.run_functions_eagerly(True)
import numpy as np
my_dict = {'text': ["random text 1", "randome text 2", "beautiful randome text 3"],
'label': [0,0,1]}
from datasets import Dataset
dataset = Dataset.from_dict(my_dict)
tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096')
def tokenize_function(examples):
r=tokenizer(examples["text"], padding="max_length", truncation=True)
global_attention_masks=[[1]*len(r['attention_mask'][0])]*len(r['attention_mask'])
r['global_attention_mask']=global_attention_masks
return r
tokenized_datasets = dataset.map(tokenize_function, batched=True)
data_collator = DefaultDataCollator(return_tensors="tf")
tf_train_dataset = tokenized_datasets.to_tf_dataset(
columns=["attention_mask", "input_ids", "token_type_ids", 'global_attention_mask'],
label_cols=["labels"],
shuffle=True,
collate_fn=data_collator,
batch_size=1
)
tf.data.experimental.enable_debug_mode()
tf.config.experimental_run_functions_eagerly(True)
model = TFLongformerForSequenceClassification.from_pretrained('allenai/longformer-base-4096', num_labels=2)
model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=tf.metrics.SparseCategoricalAccuracy(),
)
model.fit(tf_train_dataset, batch_size=1)
It immediately produced an OOM error
ResourceExhaustedError: OOM when allocating tensor with shape[12,16,196864] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:StridedSlice] name: tf_longformer_for_sequence_classification/longformer/encoder/layer_._5/attention/self/strided_slice/
I have a Titan RTX with 24GB of VRAM on that GPU. How much memory does this need? Am I doing something wrong again?
Ahhh... `Longformer` is pretty chunky, that's for sure. Have you tried `BigBird` (`google/bigbird-roberta-base`) by chance @ichenjia? That doesn't solve this particular issue, but while we look into fixing it, I'm assuming your need is to handle longer sequence lengths than the typical BERT-like models are pre-trained on.
> Ahhh... `Longformer` is pretty chunky, that's for sure. Have you tried `BigBird` (`google/bigbird-roberta-base`) by chance @ichenjia?
Thanks! I have not tried it because it only supports Torch, not TF, right? You are talking about https://huggingface.co/docs/transformers/v4.21.3/en/model_doc/big_bird#transformers.BigBirdForSequenceClassification, right?
Yeah, you're right... I assumed the TF flavor of BigBird would have been the easiest lift to implement, but maybe not. Can you revert @Rocketknight1's PR and run it again, and post the entire output/traceback so I can take a look @ichenjia?
EDIT: I mean use his PR again and try running your script without explicitly making and passing in the `global_attention_mask`, and post the output/traceback here; I can probably get you a fix.
Thank you for trying to get to the bottom of it. Here is the code I ran:
from transformers import LongformerTokenizer, TFLongformerForSequenceClassification
import tensorflow as tf
import pickle
import numpy as np
from transformers import DefaultDataCollator
tf.data.experimental.enable_debug_mode()
tf.config.run_functions_eagerly(True)
import numpy as np
my_dict = {'text': ["random text 1", "randome text 2", "beautiful randome text 3"],
'label': [0,0,1]}
from datasets import Dataset
dataset = Dataset.from_dict(my_dict)
tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096')
def tokenize_function(examples):
r=tokenizer(examples["text"], padding="max_length", truncation=True)
#global_attention_masks=[[1]*len(r['attention_mask'][0])]*len(r['attention_mask'])
#r['global_attention_mask']=global_attention_masks
return r
tokenized_datasets = dataset.map(tokenize_function, batched=True)
data_collator = DefaultDataCollator(return_tensors="tf")
tf_train_dataset = tokenized_datasets.to_tf_dataset(
columns=["attention_mask", "input_ids", "token_type_ids", 'global_attention_mask'],
label_cols=["labels"],
shuffle=True,
collate_fn=data_collator,
batch_size=1
)
model = TFLongformerForSequenceClassification.from_pretrained('allenai/longformer-base-4096', num_labels=2)
model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=tf.metrics.SparseCategoricalAccuracy(),
)
model.fit(tf_train_dataset, batch_size=1)
and here is the traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-fccdbd4c6c6d> in <module>
5 metrics=tf.metrics.SparseCategoricalAccuracy(),
6 )
----> 7 model.fit(tf_train_dataset, batch_size=1)
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1182 _r=1):
1183 callbacks.on_train_batch_begin(step)
-> 1184 tmp_logs = self.train_function(iterator)
1185 if data_handler.should_sync:
1186 context.async_wait()
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/keras/engine/training.py in train_function(iterator)
851 def train_function(iterator):
852 """Runs a training execution with one step."""
--> 853 return step_function(self, iterator)
854
855 else:
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/keras/engine/training.py in step_function(model, iterator)
840
841 data = next(iterator)
--> 842 outputs = model.distribute_strategy.run(run_step, args=(data,))
843 outputs = reduce_per_replica(
844 outputs, self.distribute_strategy, reduction='first')
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py in run(***failed resolving arguments***)
1284 fn = autograph.tf_convert(
1285 fn, autograph_ctx.control_status_ctx(), convert_by_default=False)
-> 1286 return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
1287
1288 def reduce(self, reduce_op, value, axis):
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py in call_for_each_replica(self, fn, args, kwargs)
2847 kwargs = {}
2848 with self._container_strategy().scope():
-> 2849 return self._call_for_each_replica(fn, args, kwargs)
2850
2851 def _call_for_each_replica(self, fn, args, kwargs):
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py in _call_for_each_replica(self, fn, args, kwargs)
3630 def _call_for_each_replica(self, fn, args, kwargs):
3631 with ReplicaContext(self._container_strategy(), replica_id_in_sync_group=0):
-> 3632 return fn(*args, **kwargs)
3633
3634 def _reduce_to(self, reduce_op, value, destinations, options):
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py in wrapper(*args, **kwargs)
595 def wrapper(*args, **kwargs):
596 with ag_ctx.ControlStatusCtx(status=ag_ctx.Status.UNSPECIFIED):
--> 597 return func(*args, **kwargs)
598
599 if inspect.isfunction(func) or inspect.ismethod(func):
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/keras/engine/training.py in run_step(data)
833
834 def run_step(data):
--> 835 outputs = model.train_step(data)
836 # Ensure counter is updated only if `train_step` succeeds.
837 with tf.control_dependencies(_minimum_control_deps(outputs)):
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/transformers/modeling_tf_utils.py in train_step(self, data)
1390 # Run forward pass.
1391 with tf.GradientTape() as tape:
-> 1392 y_pred = self(x, training=True)
1393 if self._using_dummy_loss:
1394 loss = self.compiled_loss(y_pred.loss, y_pred.loss, sample_weight, regularization_losses=self.losses)
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
1035 with autocast_variable.enable_auto_cast_variables(
1036 self._compute_dtype_object):
-> 1037 outputs = call_fn(inputs, *args, **kwargs)
1038
1039 if self._activity_regularizer:
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/transformers/modeling_tf_utils.py in run_call_with_unpacked_inputs(self, *args, **kwargs)
405
406 unpacked_inputs = input_processing(func, config, **fn_args_and_kwargs)
--> 407 return func(self, **unpacked_inputs)
408
409 # Keras enforces the first layer argument to be passed, and checks it through `inspect.getfullargspec()`. This
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/transformers/models/longformer/modeling_tf_longformer.py in call(self, input_ids, attention_mask, head_mask, token_type_ids, position_ids, global_attention_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict, labels, training)
2389 global_attention_mask = tf.convert_to_tensor(global_attention_mask, dtype=tf.int64)
2390 else:
-> 2391 global_attention_mask = tf.cast(global_attention_mask, tf.int64)
2392
2393 if global_attention_mask is None and input_ids is not None:
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
204 """Call target, and fall back on dispatchers if there is a TypeError."""
205 try:
--> 206 return target(*args, **kwargs)
207 except (TypeError, ValueError):
208 # Note: convert_to_eager_tensor currently raises a ValueError, not a
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py in cast(x, dtype, name)
986 # allows some conversions that cast() can't do, e.g. casting numbers to
987 # strings.
--> 988 x = ops.convert_to_tensor(x, name="x")
989 if x.dtype.base_dtype != base_type:
990 x = gen_math_ops.cast(x, base_type, name=name)
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py in wrapped(*args, **kwargs)
161 with Trace(trace_name, **trace_kwargs):
162 return func(*args, **kwargs)
--> 163 return func(*args, **kwargs)
164
165 return wrapped
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
1564
1565 if ret is None:
-> 1566 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
1567
1568 if ret is NotImplemented:
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
344 as_ref=False):
345 _ = as_ref
--> 346 return constant(v, dtype=dtype, name=name)
347
348
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name)
269 ValueError: if called on a symbolic tensor.
270 """
--> 271 return _constant_impl(value, dtype, shape, name, verify_shape=False,
272 allow_broadcast=True)
273
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
281 with trace.Trace("tf.constant"):
282 return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
--> 283 return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
284
285 g = ops.get_default_graph()
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
306 def _constant_eager_impl(ctx, value, dtype, shape, verify_shape):
307 """Creates a constant on the current device."""
--> 308 t = convert_to_eager_tensor(value, ctx, dtype)
309 if shape is None:
310 return t
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
104 dtype = dtypes.as_dtype(dtype).as_datatype_enum
105 ctx.ensure_initialized()
--> 106 return ops.EagerTensor(value, ctx.device_name, dtype)
107
108
ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor.
I don't think that's gonna fix the OOM error, right?
Yeah that won't fix that OOM error, but I wanted to see the full stack to help track down what we can do to adjust the base PR to get you unblocked. I'm not at my comp right now but will take a look tomorrow and see how we can adjust to make it work.
Hi all, I made a bunch of edits and hopefully things should work more smoothly now! Let me know if the problems remain.
Thanks @Rocketknight1, much appreciated!
Can you try it again @ichenjia?
Sorry, I was busy yesterday. Here is what I did:
pip install --upgrade git+https://github.com/huggingface/transformers.git@fix_tflongformer_int_dtype
Then ran the same code and still got the error. Did I install from the right branch?
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/transformers/models/longformer/modeling_tf_longformer.py in call(self, input_ids, attention_mask, head_mask, token_type_ids, position_ids, global_attention_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict, labels, training)
2389 global_attention_mask = tf.convert_to_tensor(global_attention_mask, dtype=tf.int64)
2390 else:
-> 2391 global_attention_mask = tf.cast(global_attention_mask, tf.int64)
2392
2393 if global_attention_mask is None and input_ids is not None:
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
204 """Call target, and fall back on dispatchers if there is a TypeError."""
205 try:
--> 206 return target(*args, **kwargs)
207 except (TypeError, ValueError):
208 # Note: convert_to_eager_tensor currently raises a ValueError, not a
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py in cast(x, dtype, name)
986 # allows some conversions that cast() can't do, e.g. casting numbers to
987 # strings.
--> 988 x = ops.convert_to_tensor(x, name="x")
989 if x.dtype.base_dtype != base_type:
990 x = gen_math_ops.cast(x, base_type, name=name)
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py in wrapped(*args, **kwargs)
161 with Trace(trace_name, **trace_kwargs):
162 return func(*args, **kwargs)
--> 163 return func(*args, **kwargs)
164
165 return wrapped
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
1564
1565 if ret is None:
-> 1566 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
1567
1568 if ret is NotImplemented:
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
344 as_ref=False):
345 _ = as_ref
--> 346 return constant(v, dtype=dtype, name=name)
347
348
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name)
269 ValueError: if called on a symbolic tensor.
270 """
--> 271 return _constant_impl(value, dtype, shape, name, verify_shape=False,
272 allow_broadcast=True)
273
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
281 with trace.Trace("tf.constant"):
282 return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
--> 283 return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
284
285 g = ops.get_default_graph()
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
306 def _constant_eager_impl(ctx, value, dtype, shape, verify_shape):
307 """Creates a constant on the current device."""
--> 308 t = convert_to_eager_tensor(value, ctx, dtype)
309 if shape is None:
310 return t
~/anaconda3/envs/tf_gpu/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
104 dtype = dtypes.as_dtype(dtype).as_datatype_enum
105 ctx.ensure_initialized()
--> 106 return ops.EagerTensor(value, ctx.device_name, dtype)
107
108
ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor.
Hi @ichenjia, the command you ran looks correct, but the traceback you pasted refers to an old version of the code (`global_attention_mask = tf.cast(global_attention_mask, tf.int64)` is not on line 2391 anymore). Can you try `pip uninstall transformers`, then rerun the command above, and then restart any Jupyter notebook servers you're running, to make sure you're using the PR branch?
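A small sanity check (just a sketch) to confirm the PR branch is what actually gets imported after reinstalling:

```python
import inspect
import transformers
from transformers.models.longformer import modeling_tf_longformer

print(transformers.__version__)                       # should show a dev version from the branch
print(inspect.getsourcefile(modeling_tf_longformer))  # path of the module that is actually loaded
```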
Hey all - I'm going to merge the PR with the fix so that it can be included in the next release of `transformers` this week. However, if you have further problems, please reopen the issue and let me know!
Hello there!
First of all, I cannot thank @Rocketknight1 enough for the amazing work he has been doing to create `tensorflow` versions of the notebooks. On my side, I have spent some time and money (Colab Pro) trying to tie the notebooks together to create a full classifier from scratch with the following steps. Unfortunately, I run into two issues. You can use the fully working notebook pasted below.
First issue: by training my own tokenizer I actually get a perplexity (225) that is way worse than the example shown in https://github.com/huggingface/notebooks/blob/new_tf_notebooks/examples/language_modeling-tf.ipynb (which fine-tunes a pretrained checkpoint). This is puzzling, as the tokenizer should be fine-tuned to the data used in the original tf2 notebook!
Second, there seems to be some Python issue when I try to fine-tune the language model I obtained above with a text classification head. Granted, the tokenizer and the underlying language model have been trained on another dataset (the wikipedia dataset from the previous two tf2 notebooks, that is). See https://github.com/huggingface/notebooks/blob/new_tf_notebooks/examples/text_classification-tf.ipynb. However, I should at least get some valid output! Here the model is complaining about some collate function. Could you please have a look @sgugger @LysandreJik @Rocketknight1 when you can? I would be very happy to contribute this notebook to the Hugging Face community (although most of the credit goes to @Rocketknight1). There is a great demand for building language models and NLP tasks from scratch.
Thanks!!!!
Code below
get the most recent versions
train tokenizer from scratch
causal language model from scratch using my own tokenizer `mytokenizer`
and fine-tune a classification task
and now try to classify text
What do you think? Happy to help if I can. Thanks!!