tensorflow / privacy

Library for training machine learning models with privacy for training data
Apache License 2.0

Incompatible with hub.KerasLayer #543

Closed (hguan6 closed this 7 months ago)

hguan6 commented 7 months ago

Platform: Linux
TensorFlow version: 2.8.1
TensorFlow Privacy version: 0.8.0

My code:

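import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import layers
from tensorflow_privacy import DPSequential

embed_url = "https://tfhub.dev/google/Wiki-words-500/2"
# `trainable` is set elsewhere in the script (not shown here)
w2v_layer = hub.KerasLayer(
    embed_url, input_shape=[], dtype=tf.string, trainable=trainable
)

# Keras Sequential variant that clips per-example gradients and adds noise
model = DPSequential(l2_norm_clip=1.0, noise_multiplier=1.1)
model.add(w2v_layer)
model.add(layers.Dense(16, activation="relu"))
model.add(layers.Dense(1, activation="sigmoid"))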

The code works perfectly well when I use model = tf.keras.Sequential() instead of model = DPSequential(l2_norm_clip=1.0, noise_multiplier=1.1).

When I used DPSequential and trained on the GPU, I got this error:

2023-11-27 11:35:21.698262: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-27 11:35:22.224200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20658 MB memory:  -> device: 0, name: NVIDIA A10, pci bus id: 0000:ca:00.0, compute capability: 8.6
2023-11-27 11:35:22.591087: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 2018750000 exceeds 10% of free system memory.
Epoch 1/5
WARNING:tensorflow:Using a while_loop for converting StringSplit
WARNING:tensorflow:Using a while_loop for converting StringSplit
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting StringToHashBucketFast
WARNING:tensorflow:Using a while_loop for converting StringToHashBucketFast
WARNING:tensorflow:Using a while_loop for converting SparseFillEmptyRows
WARNING:tensorflow:Using a while_loop for converting SparseFillEmptyRows
WARNING:tensorflow:Using a while_loop for converting Unique
WARNING:tensorflow:Using a while_loop for converting Unique
WARNING:tensorflow:Using a while_loop for converting ResourceGather
WARNING:tensorflow:Using a while_loop for converting ResourceGather
WARNING:tensorflow:Using a while_loop for converting SparseSegmentSqrtN
WARNING:tensorflow:Using a while_loop for converting SparseSegmentSqrtN
WARNING:tensorflow:Using a while_loop for converting StringSplit
WARNING:tensorflow:Using a while_loop for converting StringSplit
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting StringToHashBucketFast
WARNING:tensorflow:Using a while_loop for converting StringToHashBucketFast
WARNING:tensorflow:Using a while_loop for converting SparseFillEmptyRows
WARNING:tensorflow:Using a while_loop for converting SparseFillEmptyRows
WARNING:tensorflow:Using a while_loop for converting Unique
WARNING:tensorflow:Using a while_loop for converting Unique
WARNING:tensorflow:Using a while_loop for converting ResourceGather
WARNING:tensorflow:Using a while_loop for converting ResourceGather
WARNING:tensorflow:Using a while_loop for converting SparseSegmentSqrtN
WARNING:tensorflow:Using a while_loop for converting SparseSegmentSqrtN
2023-11-27 11:35:28.468623: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at xla_ops.cc:241 : INVALID_ARGUMENT: Trying to access resource hash_table_/tmp/tmpPJ1wRH/tokens.txt_-2_-1_load_1_12 located in device /job:localhost/replica:0/task:0/device:CPU:0 from device /job:localhost/replica:0/task:0/device:GPU:0
Traceback (most recent call last):
  File "/home/local/ASUAD/hguan6/Model_Deduplication_Train_Text_Classification_Model/model_trainer_sentiment_task.py", line 158, in <module>
    main()
  File "/home/local/ASUAD/hguan6/Model_Deduplication_Train_Text_Classification_Model/model_trainer_sentiment_task.py", line 144, in main
    model.fit(
  File "/home/local/ASUAD/hguan6/miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/local/ASUAD/hguan6/miniconda3/envs/model_dedup/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected at node 'StatefulPartitionedCall' defined at (most recent call last):
    File "/home/local/ASUAD/hguan6/Model_Deduplication_Train_Text_Classification_Model/model_trainer_sentiment_task.py", line 158, in <module>
      main()
    File "/home/local/ASUAD/hguan6/Model_Deduplication_Train_Text_Classification_Model/model_trainer_sentiment_task.py", line 144, in main
      model.fit(
    File "/home/local/ASUAD/hguan6/miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/local/ASUAD/hguan6/miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/engine/training.py", line 1384, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/local/ASUAD/hguan6/miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/engine/training.py", line 1021, in train_function
      return step_function(self, iterator)
    File "/home/local/ASUAD/hguan6/miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/engine/training.py", line 1010, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/local/ASUAD/hguan6/miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/engine/training.py", line 1000, in run_step
      outputs = model.train_step(data)
Node: 'StatefulPartitionedCall'
Trying to access resource hash_table_/tmp/tmpPJ1wRH/tokens.txt_-2_-1_load_1_12 located in device /job:localhost/replica:0/task:0/device:CPU:0 from device /job:localhost/replica:0/task:0/device:GPU:0
         [[{{node StatefulPartitionedCall}}]] [Op:__inference_train_function_5501]

When I trained on the CPU instead, I got:

2023-11-27 11:39:31.506582: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2023-11-27 11:39:31.506623: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: en4217519l
2023-11-27 11:39:31.506631: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: en4217519l
2023-11-27 11:39:31.507183: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 515.43.4
2023-11-27 11:39:31.507210: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 515.43.4
2023-11-27 11:39:31.507218: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 515.43.4
2023-11-27 11:39:31.508100: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-27 11:39:31.602036: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 2018750000 exceeds 10% of free system memory.
Epoch 1/5
WARNING:tensorflow:Using a while_loop for converting StringSplit
WARNING:tensorflow:Using a while_loop for converting StringSplit
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting StringToHashBucketFast
WARNING:tensorflow:Using a while_loop for converting StringToHashBucketFast
WARNING:tensorflow:Using a while_loop for converting SparseFillEmptyRows
WARNING:tensorflow:Using a while_loop for converting SparseFillEmptyRows
WARNING:tensorflow:Using a while_loop for converting Unique
WARNING:tensorflow:Using a while_loop for converting Unique
WARNING:tensorflow:Using a while_loop for converting ResourceGather
WARNING:tensorflow:Using a while_loop for converting ResourceGather
WARNING:tensorflow:Using a while_loop for converting SparseSegmentSqrtN
WARNING:tensorflow:Using a while_loop for converting SparseSegmentSqrtN
WARNING:tensorflow:Using a while_loop for converting StringSplit
WARNING:tensorflow:Using a while_loop for converting StringSplit
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableFindV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting LookupTableSizeV2
WARNING:tensorflow:Using a while_loop for converting StringToHashBucketFast
WARNING:tensorflow:Using a while_loop for converting StringToHashBucketFast
WARNING:tensorflow:Using a while_loop for converting SparseFillEmptyRows
WARNING:tensorflow:Using a while_loop for converting SparseFillEmptyRows
WARNING:tensorflow:Using a while_loop for converting Unique
WARNING:tensorflow:Using a while_loop for converting Unique
WARNING:tensorflow:Using a while_loop for converting ResourceGather
WARNING:tensorflow:Using a while_loop for converting ResourceGather
WARNING:tensorflow:Using a while_loop for converting SparseSegmentSqrtN
WARNING:tensorflow:Using a while_loop for converting SparseSegmentSqrtN
2023-11-27 11:39:37.224042: I tensorflow/compiler/xla/service/service.cc:171] XLA service 0x7f53b01efdb0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-11-27 11:39:37.224100: I tensorflow/compiler/xla/service/service.cc:179]   StreamExecutor device (0): Host, Default Version
2023-11-27 11:39:37.637389: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at xla_ops.cc:248 : INVALID_ARGUMENT: Detected unsupported operations when trying to compile graph __inference_train_step_5448[] on XLA_CPU_JIT: _Arg (No registered '_Arg' OpKernel for XLA_CPU_JIT devices compatible with node {{node data}}
         (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_STRING, _output_shapes=[[32]], _user_specified_name="data", index=0){{node data}}
The op is created at: 
File "Model_Deduplication_Train_Text_Classification_Model/model_trainer_sentiment_task.py", line 158, in <module>
  main()
File "Model_Deduplication_Train_Text_Classification_Model/model_trainer_sentiment_task.py", line 144, in main
  model.fit(
File "miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
  return fn(*args, **kwargs)
File "miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/engine/training.py", line 1384, in fit
  tmp_logs = self.train_function(iterator)
File "miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/engine/training.py", line 1021, in train_function
  return step_function(self, iterator)
File "miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/engine/training.py", line 1010, in step_function
  outputs = model.distribute_strategy.run(run_step, args=(data,))
File "miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/engine/training.py", line 1000, in run_step
  outputs = model.train_step(data)
Traceback (most recent call last):
  File "/home/local/ASUAD/hguan6/Model_Deduplication_Train_Text_Classification_Model/model_trainer_sentiment_task.py", line 158, in <module>
    main()
  File "/home/local/ASUAD/hguan6/Model_Deduplication_Train_Text_Classification_Model/model_trainer_sentiment_task.py", line 144, in main
    model.fit(
  File "/home/local/ASUAD/hguan6/miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/local/ASUAD/hguan6/miniconda3/envs/model_dedup/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected unsupported operations when trying to compile graph __inference_train_step_5448[] on XLA_CPU_JIT: _Arg (No registered '_Arg' OpKernel for XLA_CPU_JIT devices compatible with node {{node data}}
         (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_STRING, _output_shapes=[[32]], _user_specified_name="data", index=0){{node data}}
The op is created at: 
File "Model_Deduplication_Train_Text_Classification_Model/model_trainer_sentiment_task.py", line 158, in <module>
  main()
File "Model_Deduplication_Train_Text_Classification_Model/model_trainer_sentiment_task.py", line 144, in main
  model.fit(
File "miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
  return fn(*args, **kwargs)
File "miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/engine/training.py", line 1384, in fit
  tmp_logs = self.train_function(iterator)
File "miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/engine/training.py", line 1021, in train_function
  return step_function(self, iterator)
File "miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/engine/training.py", line 1010, in step_function
  outputs = model.distribute_strategy.run(run_step, args=(data,))
File "miniconda3/envs/model_dedup/lib/python3.9/site-packages/keras/engine/training.py", line 1000, in run_step
  outputs = model.train_step(data)
         [[StatefulPartitionedCall]] [Op:__inference_train_function_5501]

Maybe this is because hub.KerasLayer bundles a preprocessing layer, an embedding layer, and an embedding lookup layer (https://www.kaggle.com/models/google/wiki-words/frameworks/tensorFlow2). I also tried other approaches, such as a normal Keras model with a DP optimizer and a custom training loop, but could not get those to work either. Is there a workaround for this hub layer?

hguan6 commented 7 months ago

I found that using the normal Keras model with DPKerasAdamOptimizer works well.