intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

[Nano] `RuntimeError: Inter op parallelism cannot be modified after initialization` when importing `Model` #5898

Open Oscilloscope98 opened 2 years ago

Oscilloscope98 commented 2 years ago

The problem occurs when importing `Model` from `bigdl.nano.tf.keras` after TensorFlow has already been used (e.g., to create datasets).

Example problematic code:

import tensorflow as tf
import tensorflow_datasets as tfds

def create_datasets(img_size, batch_size):
    (train_ds, test_ds), info = tfds.load('imagenette/320px-v2',
                                          data_dir='/tmp/data',
                                          split=['train', 'validation'],
                                          with_info=True,
                                          as_supervised=True)

    num_classes = info.features['label'].num_classes

    def preprocessing(img, label):
        return tf.image.resize(img, (img_size, img_size)), \
               tf.one_hot(label, num_classes)

    train_ds = train_ds.repeat().map(preprocessing).batch(batch_size)
    test_ds = test_ds.map(preprocessing).batch(batch_size)
    return train_ds, test_ds, info

train_ds, test_ds, ds_info = create_datasets(img_size=224, batch_size=32)

from bigdl.nano.tf.keras import Model # <= error occurs here

Error messages:

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_31179/1368016327.py in <module>
     22 train_ds, test_ds, ds_info = create_datasets(img_size=224, batch_size=32)
     23 
---> 24 from bigdl.nano.tf.keras import Model # <= error occurs here

~/miniconda3/envs/temp-tf/lib/python3.7/site-packages/bigdl/nano/tf/__init__.py in <module>
     20     import tensorflow as tf
     21     if "NANO_TF_INTER_OP" in os.environ:
---> 22         tf.config.threading.set_inter_op_parallelism_threads(int(os.environ["NANO_TF_INTER_OP"]))
     23     else:
     24         warnings.warn("NANO_TF_INTER_OP not found the in os.environ, "

~/miniconda3/envs/temp-tf/lib/python3.7/site-packages/tensorflow/python/framework/config.py in set_inter_op_parallelism_threads(num_threads)
    146     num_threads: Number of parallel threads
    147   """
--> 148   context.context().inter_op_parallelism_threads = num_threads
    149 
    150 

~/miniconda3/envs/temp-tf/lib/python3.7/site-packages/tensorflow/python/eager/context.py in inter_op_parallelism_threads(self, num_threads)
   1749     if self._context_handle is not None:
   1750       raise RuntimeError(
-> 1751           "Inter op parallelism cannot be modified after initialization.")
   1752 
   1753     self._inter_op_parallelism_threads = num_threads

RuntimeError: Inter op parallelism cannot be modified after initialization.

Environment:

bigdl-nano                   2.1.0b20220918
intel_tensorflow             2.7.0
tensorflow-datasets          4.6.0
tensorflow-estimator         2.7.0
tensorflow-io-gcs-filesystem 0.27.0
tensorflow-metadata          1.10.0

Similar problems happened when importing bigdl.nano.tf.keras.layers.Embedding, bigdl.nano.tf.optimizers.SparseAdam, etc.

Please refer here for more information: https://github.com/intel-analytics/BigDL/pull/5836#issuecomment-1254200961
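For reference, the root cause shown in the traceback can be reproduced with plain TensorFlow, independent of Nano (a minimal sketch):

import tensorflow as tf

tf.constant(1)  # any eager op initializes the eager context

# Once the eager context exists, thread settings can no longer be changed:
tf.config.threading.set_inter_op_parallelism_threads(2)
# -> RuntimeError: Inter op parallelism cannot be modified after initialization.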

yangw1234 commented 2 years ago

Created a TensorFlow issue here: https://github.com/tensorflow/tensorflow/issues/57812

yangw1234 commented 2 years ago

Added this to the known issues here: https://github.com/intel-analytics/BigDL/pull/5923

yangw1234 commented 2 years ago

Hi @Oscilloscope98, could you help verify whether https://github.com/intel-analytics/BigDL/pull/5923/files fixes the problem? I found a way to reset the eager session context.

[screenshot of the proposed change omitted]
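The screenshot above shows the actual change; as a rough guess at the general approach (an assumption, not the content of #5923), TensorFlow's private eager API allows resetting the context along these lines:

# Rough sketch (assumption; private, version-dependent API): reset the eager context
# so that inter-op thread settings can be applied again afterwards.
from tensorflow.python.eager import context

context._reset_context()  # discard the already-initialized eager context

import tensorflow as tf
tf.config.threading.set_inter_op_parallelism_threads(2)  # succeeds after the reset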

Oscilloscope98 commented 2 years ago

@yangw1234 With the changes in #5923 added, the `RuntimeError: Inter op parallelism cannot be modified after initialization.` disappears. However, a new error occurs for the same code. Example code:

import tensorflow as tf
import tensorflow_datasets as tfds

def create_datasets(img_size, batch_size):
    (train_ds, test_ds), info = tfds.load('imagenette/320px-v2',
                                          data_dir='/tmp/data',
                                          split=['train', 'validation'],
                                          with_info=True,
                                          as_supervised=True)

    num_classes = info.features['label'].num_classes

    def preprocessing(img, label):
        return tf.image.resize(img, (img_size, img_size)), \
               tf.one_hot(label, num_classes)

    train_ds = train_ds.repeat().map(preprocessing).batch(batch_size)
    test_ds = test_ds.map(preprocessing).batch(batch_size)
    return train_ds, test_ds, info

train_ds, test_ds, ds_info = create_datasets(img_size=224, batch_size=32)

from bigdl.nano.tf.keras import Model # <= error occurs here

Error:

Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f073d813d40>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
    context.remove_function(self.name)
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
    context().remove_function(name)
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
    pywrap_tfe.TFE_ContextRemoveFunction(self._handle, name)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_flat_map_read_one_file_20'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f073d813d40>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
    context.remove_function(self.name)
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
    context().remove_function(name)
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
    pywrap_tfe.TFE_ContextRemoveFunction(self._handle, name)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_flat_map_read_one_file_83'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f073d813d40>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_interleave_classfunctools.partial_189'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f073d813d40>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_map_parse_and_decode_218'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f073d813d40>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_map_lookup_nest_226'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f073d813d40>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_map_preprocessing_309'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f073d813d40>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_interleave_classfunctools.partial_252'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f073d813d40>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_map_parse_and_decode_281'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f073d813d40>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_map_lookup_nest_289'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f073d813d40>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_map_preprocessing_328'.

Although the above exceptions are thrown this time, the following training process can still run successfully:

import tensorflow as tf
import tensorflow_datasets as tfds

def create_datasets(img_size, batch_size):
    (train_ds, test_ds), info = tfds.load('imagenette/320px-v2',
                                          data_dir='/tmp/data',
                                          split=['train', 'validation'],
                                          with_info=True,
                                          as_supervised=True)

    num_classes = info.features['label'].num_classes

    def preprocessing(img, label):
        return tf.image.resize(img, (img_size, img_size)), \
               tf.one_hot(label, num_classes)

    train_ds = train_ds.repeat().map(preprocessing).batch(batch_size)
    test_ds = test_ds.map(preprocessing).batch(batch_size)
    return train_ds, test_ds, info

train_ds, test_ds, ds_info = create_datasets(img_size=224, batch_size=32)

from bigdl.nano.tf.keras import Model # <= our Model is imported here with the above exceptions,
# but the following code executes successfully

from tensorflow.keras import layers
from tensorflow.keras.applications import ResNet50

def define_model_inputs_outputs(num_classes, img_size):
    inputs = tf.keras.layers.Input(shape=(img_size, img_size, 3))
    x = tf.cast(inputs, tf.float32)
    x = tf.keras.applications.resnet50.preprocess_input(x)
    backbone = ResNet50(weights='imagenet')
    backbone.trainable = False
    x = backbone(x)
    x = layers.Dense(512, activation='relu')(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return inputs, outputs

inputs, outputs = define_model_inputs_outputs(num_classes=ds_info.features['label'].num_classes, 
                                              img_size=224)

model = Model(inputs=inputs, outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])

model.fit(train_ds,
          epochs=1,
          steps_per_epoch=(ds_info.splits['train'].num_examples // 32),
          num_processes=2)

Full running log:

2022-09-26 14:25:49.056577: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-26 14:25:49.056980: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f4663d05cb0>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
    context.remove_function(self.name)
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
    context().remove_function(name)
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
    pywrap_tfe.TFE_ContextRemoveFunction(self._handle, name)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_flat_map_read_one_file_20'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f4663d05cb0>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
    context.remove_function(self.name)
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
    context().remove_function(name)
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
    pywrap_tfe.TFE_ContextRemoveFunction(self._handle, name)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_flat_map_read_one_file_83'.
2022-09-26 14:25:59.803221: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/keras/engine/functional.py:1410: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
  layer_config = serialize_layer_fn(layer)
/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/keras/saving/saved_model/layer_serialization.py:112: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
  return generic_utils.serialize_keras_object(obj)
2022-09-26 14:26:11.930268: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-26 14:26:11.933234: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job worker -> {0 -> localhost:55938, 1 -> localhost:49198}
2022-09-26 14:26:11.933382: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:427] Started server with target: grpc://localhost:55938
2022-09-26 14:26:11.987982: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-26 14:26:11.990847: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job worker -> {0 -> localhost:55938, 1 -> localhost:49198}
2022-09-26 14:26:11.990989: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:427] Started server with target: grpc://localhost:49198
2022-09-26 14:26:22.556611: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:537] The `assert_cardinality` transformation is currently not handled by the auto-shard rewrite and will be removed.
2022-09-26 14:26:22.558011: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:537] The `assert_cardinality` transformation is currently not handled by the auto-shard rewrite and will be removed.
2022-09-26 14:26:22.619641: W tensorflow/core/framework/dataset.cc:744] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
2022-09-26 14:26:22.619763: W tensorflow/core/framework/dataset.cc:744] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
295/295 [==============================] - 325s 1s/step - loss: 0.5200 - accuracy: 0.9636
2022-09-26 14:31:55.091407: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2022-09-26 14:31:55.199643: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/keras/engine/functional.py:1410: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
  layer_config = serialize_layer_fn(layer)
/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/keras/engine/functional.py:1410: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
  layer_config = serialize_layer_fn(layer)
/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/keras/saving/saved_model/layer_serialization.py:112: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
  return generic_utils.serialize_keras_object(obj)
/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/keras/saving/saved_model/layer_serialization.py:112: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
  return generic_utils.serialize_keras_object(obj)
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f4663d05cb0>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_interleave_classfunctools.partial_189'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f4663d05cb0>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_map_parse_and_decode_218'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f4663d05cb0>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_map_lookup_nest_226'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f4663d05cb0>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_map_preprocessing_309'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f4663d05cb0>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_interleave_classfunctools.partial_252'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f4663d05cb0>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_map_parse_and_decode_281'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f4663d05cb0>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_map_lookup_nest_289'.
Exception ignored in: <function _EagerDefinedFunctionDeleter.__del__ at 0x7f4663d05cb0>
Traceback (most recent call last):
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 414, in __del__
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 2584, in remove_function
  File "/home/yuwen/miniconda3/envs/temp2/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 1287, in remove_function
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to remove non-existent function '__inference_Dataset_map_preprocessing_328'.
Oscilloscope98 commented 2 years ago

@yangw1234 When testing the following code for Embedding and SparseAdam, I also ran into an error. Example code:

import re
import string
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.layers import TextVectorization

def create_datasets():
    (raw_train_ds, raw_val_ds, raw_test_ds), info = tfds.load(
        "imdb_reviews",
        data_dir="/tmp/data",
        split=['train[:80%]', 'train[80%:]', 'test'],
        as_supervised=True,
        batch_size=32,
        with_info=True
    )

    def custom_standardization(input_data):
        lowercase = tf.strings.lower(input_data)
        stripped_html = tf.strings.regex_replace(lowercase, "<br />", " ")
        return tf.strings.regex_replace(
            stripped_html, f"[{re.escape(string.punctuation)}]", ""
        )

    vectorize_layer = TextVectorization(
        standardize=custom_standardization,
        max_tokens=20000,
        output_mode="int",
        output_sequence_length=500,
    )

    text_ds = raw_train_ds.map(lambda x, y: x)
    vectorize_layer.adapt(text_ds)

    def vectorize_text(text, label):
        text = tf.expand_dims(text, -1)
        return vectorize_layer(text), label

    # vectorize the data
    train_ds = raw_train_ds.map(vectorize_text)
    val_ds = raw_val_ds.map(vectorize_text)
    test_ds = raw_test_ds.map(vectorize_text)

    return train_ds, val_ds, test_ds

train_ds, val_ds, test_ds = create_datasets()

inputs = tf.keras.Input(shape=(None,), dtype="int64")

from bigdl.nano.tf.keras.layers import Embedding # import Embedding here
x = Embedding(input_dim=20000, output_dim=128)(inputs)

from tensorflow.keras import layers
from bigdl.nano.tf.keras import Model # import Model here

def make_backbone():
    inputs = tf.keras.Input(shape=(None, 128))
    x = layers.Dropout(0.5)(inputs)
    x = layers.Conv1D(128, 7, padding="valid", activation="relu", strides=3)(x)
    x = layers.Conv1D(128, 7, padding="valid", activation="relu", strides=3)(x)
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    predictions = layers.Dense(1, activation="sigmoid", name="predictions")(x)

    model = Model(inputs, predictions)
    return model

from bigdl.nano.tf.optimizers import SparseAdam #import SparseAdam here
predictions = make_backbone()(x)
model = Model(inputs, predictions)

model.compile(loss="binary_crossentropy", optimizer=SparseAdam(), metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=1) # <= error occurs here
model.evaluate(test_ds)

Error:

2022-09-26 15:07:11.929913: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at lookup_table_op.cc:911 : NOT_FOUND: Resource localhost/1576/N10tensorflow6lookup15LookupInterfaceE does not exist.
Traceback (most recent call last):
  File "test/test2.py", line 74, in <module>
    model.fit(train_ds, validation_data=val_ds, epochs=1)
  File "/home/yuwen/BigDL/python/nano/src/bigdl/nano/tf/keras/training_utils.py", line 118, in fit
    return self.fit_old(**fit_kwargs)
  File "/home/yuwen/miniconda3/envs/temp3/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/yuwen/miniconda3/envs/temp3/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.NotFoundError:  Resource localhost/1576/N10tensorflow6lookup15LookupInterfaceE does not exist.
         [[{{node text_vectorization/string_lookup/None_Lookup/LookupTableFindV2}}]]
         [[IteratorGetNext]] [Op:__inference_train_function_2897]

which prevents the fit function from running successfully.

yangw1234 commented 2 years ago

Maybe we should ask users to add `import bigdl.nano.tf` at the top of their main file.
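A minimal sketch of that suggestion (assuming the user's script otherwise stays the same as the examples above):

import bigdl.nano.tf  # must come before any TensorFlow usage, so the inter-op
                      # thread setting is applied before the eager context is initialized

import tensorflow as tf
import tensorflow_datasets as tfds

# ... build datasets and models as in the examples above ...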