spmallick / learnopencv

Learn OpenCV : C++ and Python Examples
https://www.learnopencv.com/

VAE_Cartoon_Tensorflow Training #822

Open ReidTPowell opened 1 year ago

ReidTPowell commented 1 year ago

Training the VAE fails with the following error:

UnboundLocalError: local variable 'kl_loss' referenced before assignment

Any thoughts or suggestions on how to resolve this error would be greatly appreciated.

RohitDhankar commented 1 year ago

Apparently the loss variable is being accessed in code before it's getting initialised

ReidTPowell commented 1 year ago

Thanks for the prompt response, and sorry for the vague description of the error. The full traceback is:


UnboundLocalError                         Traceback (most recent call last)
Cell In[36], line 1
----> 1 train(normalized_ds, 30)

Cell In[34], line 8, in train(dataset, epochs)
      6 for image_batch in dataset:
      7     i += 1
----> 8     loss = train_step(image_batch)
      9     #loss_.append(loss)
     10
     11     #print("Loss",np.mean(loss_))
     12     seed = image_batch[:25]

File C:\Anaconda3\envs\tensorflow29\lib\site-packages\tensorflow\python\util\traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    151 except Exception as e:
    152     filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153     raise e.with_traceback(filtered_tb) from None
    154 finally:
    155     del filtered_tb

File ~\AppData\Local\Temp\__autograph_generated_file3_wyyupw.py:14, in outer_factory.<locals>.inner_factory.<locals>.tf__train_step(images)
     12 latent = ag__.converted_call(ag__.ld(final), ([ag__.ld(mean), ag__.ld(log_var)],), None, fscope)
     13 generated_images = ag__.converted_call(ag__.ld(dec), (ag__.ld(latent),), dict(training=True), fscope)
---> 14 loss = ag__.converted_call(ag__.ld(vae_loss), (ag__.ld(images), ag__.ld(generated_images), ag__.ld(mean), ag__.ld(log_var)), None, fscope)
     15 gradients_of_enc = ag__.converted_call(ag__.ld(encoder).gradient, (ag__.ld(loss), ag__.ld(enc).trainable_variables), None, fscope)
     16 gradients_of_dec = ag__.converted_call(ag__.ld(decoder).gradient, (ag__.ld(loss), ag__.ld(dec).trainable_variables), None, fscope)

File ~\AppData\Local\Temp\__autograph_generated_filem_niztwx.py:11, in outer_factory.<locals>.inner_factory.<locals>.tf__vae_loss(y_true, y_pred, mean, var)
      9 retval_ = ag__.UndefinedReturnValue()
     10 r_loss = ag__.converted_call(ag__.ld(mse_loss), (ag__.ld(y_true), ag__.ld(y_pred)), None, fscope)
---> 11 kl_loss = ag__.converted_call(ag__.ld(kl_loss), (ag__.ld(mean), ag__.ld(log_var)), None, fscope)
     12 try:
     13     do_return = True

UnboundLocalError: in user code:

File "C:\Users\rpowell\AppData\Local\Temp\ipykernel_793660\144731484.py", line 11, in train_step  *
    loss = vae_loss(images, generated_images, mean, log_var)
File "C:\Users\rpowell\AppData\Local\Temp\ipykernel_793660\2804652454.py", line 11, in vae_loss  *
    kl_loss = kl_loss(mean, log_var)

UnboundLocalError: local variable 'kl_loss' referenced before assignment

I did run the following block before training:

    def mse_loss(y_true, y_pred):
        r_loss = K.mean(K.square(y_true - y_pred), axis = [1,2,3])
        return 1000 * r_loss

    def kl_loss(mean, log_var):
        kl_loss = -0.5 * K.sum(1 + log_var - K.square(mean) - K.exp(log_var), axis = 1)
        return kl_loss

    def vae_loss(y_true, y_pred, mean, var):
        r_loss = mse_loss(y_true, y_pred)
        kl_loss = kl_loss(mean, log_var)
        return r_loss + kl_loss

I could comment out the kl_loss line in the vae_loss function to get it to run, so I think something is wrong with that function, but I am admittedly naive in this space.
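One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: inside vae_loss, the line `kl_loss = kl_loss(mean, log_var)` assigns to the name `kl_loss`, and Python then treats `kl_loss` as a local variable for the whole function body. The call on the right-hand side therefore reads that local before it has a value, which raises UnboundLocalError regardless of the TensorFlow version. The example below reproduces this in plain Python. The `math`-based loss bodies are stand-ins for the Keras backend `K` calls, and the names `vae_loss_broken` and `kl` are hypothetical, chosen only for illustration. It also names the fourth parameter `log_var`, since the posted signature says `var` while the body uses `log_var`.

```python
import math


def mse_loss(y_true, y_pred):
    # Stand-in for the K.mean(K.square(...)) version: MSE of one flattened sample.
    return 1000 * sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def kl_loss(mean, log_var):
    # Stand-in for the K.sum(...) version: KL term of one latent vector.
    return -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mean, log_var))


def vae_loss_broken(y_true, y_pred, mean, log_var):
    r_loss = mse_loss(y_true, y_pred)
    # The assignment makes `kl_loss` local to this function, so the call on
    # the right-hand side reads a local that has not been assigned yet.
    kl_loss = kl_loss(mean, log_var)
    return r_loss + kl_loss


def vae_loss(y_true, y_pred, mean, log_var):
    r_loss = mse_loss(y_true, y_pred)
    # A different local name leaves the module-level kl_loss function visible.
    kl = kl_loss(mean, log_var)
    return r_loss + kl


try:
    vae_loss_broken([0.0, 1.0], [0.0, 1.0], [0.0], [0.0])
    raised = False
except UnboundLocalError:
    raised = True

fixed_value = vae_loss([0.0, 1.0], [0.0, 1.0], [0.0], [0.0])
print(raised, fixed_value)
```

If this reading is right, renaming the local variable in the notebook's vae_loss (for example to `kl`) and re-running that cell before training should be enough. Commenting out the kl_loss line avoids the error only by dropping the KL term from the loss.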


RohitDhankar commented 1 year ago

Please correct me if I'm wrong; I may end up confusing you rather than providing a solution.

Presuming you are using this code: VAE_Cartoon_TensorFlow.ipynb (https://github.com/spmallick/learnopencv/blob/master/Variational-Autoencoder-TensorFlow/VAE_Cartoon_TensorFlow.ipynb)

Where are you getting this bit of code? It is probably from a temp file generated by Keras or tf.autograph, which suggests that the notebook cells were not all run in one continuous kernel session. So when the AutoGraph code needs the value of kl_loss, it is not getting it. AutoGraph docs: https://www.tensorflow.org/api_docs/python/tf/autograph

File ~\AppData\Local\Temp\__autograph_generated_filem_niztwx.py:11, in outer_factory.<locals>.inner_factory.<locals>.tf__vae_loss(y_true, y_pred, mean, var)
      9 retval_ = ag__.UndefinedReturnValue()
     10 r_loss = ag__.converted_call(ag__.ld(mse_loss), (ag__.ld(y_true), ag__.ld(y_pred)), None, fscope)
---> 11 kl_loss = ag__.converted_call(ag__.ld(kl_loss), (ag__.ld(mean), ag__.ld(log_var)), None, fscope)

ReidTPowell commented 1 year ago

I think that may be the issue. When installing the requirements, tf-nightly-gpu had been deprecated, so I ended up installing tensorflow==2.6.2. From the documentation in the AutoGraph link, it looks like those functions are in a different place after TF 2.0…
