sintel-dev / Orion

A machine learning library for detecting anomalies in signals.
https://sintel.dev/Orion/
MIT License

Increase in Training Time for TadGAN implemented in TensorFlow 2.x #293

Closed lcwong0928 closed 1 year ago

lcwong0928 commented 2 years ago

Description

There is a significant increase in training time per signal between TadGAN implemented in TensorFlow 2.x and TensorFlow 1.x. The main differences between the two environments are the method used to compute the gradient penalty loss and the use of tf.GradientTape() in place of compiling the model.

What I Did

The first model (TF 2.0-1) follows the Wasserstein GAN (WGAN) with Gradient Penalty (GP) tutorial, which uses tf.GradientTape() for the train_step and a second-order tf.GradientTape() for the gradient penalty loss. The second model (TF 2.0-2) compiles the model similarly to the TensorFlow 1.x version but still uses tf.GradientTape() for the gradient penalty loss.
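For reference, the gradient penalty computed with an inner tf.GradientTape() looks roughly like the following. This is a hedged, generic WGAN-GP sketch (the function name and signature are hypothetical, not Orion's actual implementation); in a full train_step an outer tape would then differentiate this penalty with respect to the critic weights, which is the second-order use described above.

```python
import tensorflow as tf

def gradient_penalty(critic, real, fake):
    """Generic WGAN-GP gradient penalty sketch (hypothetical helper).

    The inner tape differentiates the critic output w.r.t. random
    interpolates between real and fake samples; the penalty pushes
    the gradient norm toward 1.
    """
    batch_size = tf.shape(real)[0]
    # Random interpolation coefficients, one per sample.
    alpha = tf.random.uniform([batch_size, 1], 0.0, 1.0)
    interpolated = alpha * real + (1.0 - alpha) * fake
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        pred = critic(interpolated)
    grads = tape.gradient(pred, interpolated)
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=1))
    return tf.reduce_mean(tf.square(norm - 1.0))
```

When this helper is itself called inside an outer tf.GradientTape(), TensorFlow must build the second-order graph on every step, which is one plausible source of the slowdown being benchmarked here.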

The following table (attached as an image, not reproduced here) reports the average training time across all signals (on GPU).
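A per-signal timing comparison like the one reported above can be produced with a plain wall-clock timer; a minimal sketch (the `train_fn` callable is a hypothetical stand-in for fitting one model on one signal, not Orion's benchmarking code):

```python
import time

def average_training_time(train_fn, signals):
    """Average wall-clock training time per signal (hypothetical helper)."""
    times = []
    for signal in signals:
        start = time.perf_counter()
        train_fn(signal)  # fit the model on this signal
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```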

sarahmish commented 2 years ago

Thank you @lcwong0928 for the analysis!

Would it be possible to make a CPU comparison between TF1.0 and TF2.0-2?

lcwong0928 commented 2 years ago

Yes, will run a benchmark for the CPU version.

sarahmish commented 2 years ago

Quick comparison of memory consumption of TadGAN between TF1 and TF2 (2.3.4):

| # | TF1 | TF2 w/ GT |
|---|---:|---:|
| initial | 236444 | 271056 |
| 1 | 3746264 | 5616512 |
| 2 | 4137908 | 6289976 |
| 3 | 4284160 | 6709156 |
| 4 | 4465656 | 6897888 |
| 5 | 4487228 | 7031916 |
| 6 | 4506316 | 7189728 |
| 7 | 4562440 | 7333748 |
| 8 | 4731936 | 7428804 |
| 9 | 4731936 | 7446516 |
| 10 | 4736356 | 7554516 |
sarahmish commented 1 year ago

PR #281 was merged.