gretelai / gretel-blueprints

Public blueprints for data use cases

Create Synthetic Data blueprint fails on differential privacy training #40

Closed. MustafaS993 closed this issue 3 years ago.

MustafaS993 commented 3 years ago

I'm trying to train a model on my dataset using the Create Synthetic Data blueprint. In the config template I have set "dp": True.
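For context, the only change I made to the blueprint's config template was enabling the DP flag (excerpt; the other keys are left at the notebook defaults):

config_template = {
    # ... other blueprint defaults unchanged ...
    "dp": True,
}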

I get this error:

100%|██████████| 67453/67453 [00:01<00:00, 50095.59it/s]
WARNING dp_model.py: Experimental: Differentially private training enabled
WARNING dp_model.py: ******* Patching TensorFlow to utilize new Keras code paths, see: https://github.com/tensorflow/tensorflow/issues/44917 *******
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (64, None, 256)           5120000   
_________________________________________________________________
dropout (Dropout)            (64, None, 256)           0         
_________________________________________________________________
lstm (LSTM)                  (64, None, 256)           525312    
_________________________________________________________________
dropout_1 (Dropout)          (64, None, 256)           0         
_________________________________________________________________
lstm_1 (LSTM)                (64, None, 256)           525312    
_________________________________________________________________
dropout_2 (Dropout)          (64, None, 256)           0         
_________________________________________________________________
dense (Dense)                (64, None, 20000)         5140000   
=================================================================
Total params: 11,310,624
Trainable params: 11,310,624
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/100
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/gretel_synthetics/tensorflow/train.py in train_rnn(params)
    235                   callbacks=_callbacks,
--> 236                   validation_data=validation_dataset
    237                   )

15 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1182               callbacks.on_train_batch_begin(step)
-> 1183               tmp_logs = self.train_function(iterator)
   1184               if data_handler.should_sync:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    888       with OptionalXlaContext(self._jit_compile):
--> 889         result = self._call(*args, **kwds)
    890 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    932       initializers = []
--> 933       self._initialize(args, kwds, add_initializers_to=initializers)
    934     finally:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in _initialize(self, args, kwds, add_initializers_to)
    763         self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
--> 764             *args, **kwds))
    765 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   3049     with self._lock:
-> 3050       graph_function, _ = self._maybe_define_function(args, kwargs)
   3051     return graph_function

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
   3443           self._function_cache.missed.add(call_context_key)
-> 3444           graph_function = self._create_graph_function(args, kwargs)
   3445           self._function_cache.primary[cache_key] = graph_function

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3288             override_flat_arg_shapes=override_flat_arg_shapes,
-> 3289             capture_by_value=self._capture_by_value),
   3290         self._function_attributes,

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    998 
--> 999       func_outputs = python_func(*func_args, **func_kwargs)
   1000 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
    671         with OptionalXlaContext(compile_with_xla):
--> 672           out = weak_wrapped_fn().__wrapped__(*args, **kwds)
    673         return out

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    985             if hasattr(e, "ag_error_metadata"):
--> 986               raise e.ag_error_metadata.to_exception(e)
    987             else:

ValueError: in user code:

    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:855 train_function  *
        return step_function(self, iterator)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:845 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:1285 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:2833 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3608 _call_for_each_replica
        return fn(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:838 run_step  **
        outputs = model.train_step(data)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:799 train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:529 minimize
        loss, var_list=var_list, grad_loss=grad_loss, tape=tape)
    /usr/local/lib/python3.7/dist-packages/tensorflow_privacy/privacy/optimizers/dp_optimizer_keras.py:88 _compute_gradients
        tf.reshape(loss, [self._num_microbatches, -1]), axis=1)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py:206 wrapper
        return target(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/array_ops.py:195 reshape
        result = gen_array_ops.reshape(tensor, shape, name)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/gen_array_ops.py:8398 reshape
        "Reshape", tensor=tensor, shape=shape, name=name)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/op_def_library.py:750 _apply_op_helper
        attrs=attr_protos, op_def=op_def)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py:601 _create_op_internal
        compute_device)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py:3565 _create_op_internal
        op_def=op_def)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py:2042 __init__
        control_input_ops, op_def)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py:1883 _create_c_op
        raise ValueError(str(e))

    ValueError: Dimension size must be evenly divisible by 64 but is 1 for '{{node Reshape}} = Reshape[T=DT_FLOAT, Tshape=DT_INT32](loss/weighted_loss/value, Reshape/shape)' with input shapes: [], [2] and with input tensors computed as partial shapes: input[1] = [64,?].

From my understanding, the issue comes from the tensors being processed by the differential privacy algorithm. I can train without issue when "dp": False. It is also worth noting that I could train with "dp": True in the same blueprint notebook without issue 3 months ago.
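For reference, the failing reshape can be reproduced in isolation (a minimal sketch; the scalar stands in for the loss/weighted_loss/value tensor named in the traceback):

import tensorflow as tf

# The DP optimizer tries to split the batch loss into 64 microbatches,
# but the loss it receives here is a scalar (shape []), and a single
# value cannot be divided into 64 groups:
loss = tf.constant(1.0)
tf.reshape(loss, [64, -1])  # fails: size 1 is not evenly divisible by 64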

zredlined commented 3 years ago

@MustafaS993 that looks like a bug on our end. Both dp_microbatches and batch_size are set to 64 when dp=True, but dp_microbatches needs to be less than batch_size and divide evenly into it. We'll patch this in the next release. In the meantime, you can adjust your config.
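To see the constraint in isolation (a minimal sketch with illustrative values): when dp_microbatches divides batch_size evenly, the per-example losses reshape cleanly into microbatch groups:

import tensorflow as tf

batch_size = 64
dp_microbatches = 4  # must divide batch_size evenly

# Per-example losses for one batch, averaged per microbatch the way
# the DP optimizer does before clipping and adding noise:
per_example_loss = tf.random.uniform([batch_size])
microbatch_losses = tf.reduce_mean(
    tf.reshape(per_example_loss, [dp_microbatches, -1]), axis=1)
print(microbatch_losses.shape)  # (4,)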

Check out some examples here: https://gretel.ai/blog/practical-privacy-with-synthetic-data

Try using this config as a starting point for your DP settings. We are planning to add a differential privacy template as a default config soon.

config_template = {
    "checkpoint_dir": checkpoint_dir,  # assumed to be defined earlier in the notebook
    "vocab_size": 0,  # lower vocab size helps dp
    "epochs": 50,
    "early_stopping": True,
    "learning_rate": 0.001,
    "rnn_units": 256,
    "batch_size": 4,
    "predict_batch_size": 1,
    "dp": True,
    "dp_noise_multiplier": 0.001,
    "dp_l2_norm_clip": 10,
    "dp_microbatches": 1,  # must divide evenly into batch_size
    "overwrite": True
}
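If it helps, this is roughly how the blueprint notebook consumes the template (a sketch assuming the LocalConfig / train_rnn API the notebooks used at the time; the input path is a placeholder for your own annotated training data):

from gretel_synthetics.config import LocalConfig
from gretel_synthetics.train import train_rnn

# checkpoint_dir comes from the template above; input_data_path is
# a placeholder for your annotated training data file
config = LocalConfig(
    input_data_path="annotated_training_data.csv",
    **config_template,
)
train_rnn(config)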
MustafaS993 commented 3 years ago

@zredlined Thanks for the quick reply, your config template has got things running again! I will also check out the blog post for more information about the parameters.